GH-32123: [R] Expose azure blob filesystem #49553
marberts wants to merge 51 commits into apache:main from
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also:
The CI is failing because the Azure C++ SDK depends on libxml2.
It looks like the failing CI job comes from an R CMD check warning about a non-API call to a C function, which is due to recent changes in R-devel. I don't think this is caused by our PR. At this point, I'm fairly confident that we've implemented the Azure features correctly. The only thing I'm not sure about is the setup in
@jonkeane, @thisisnic, @assignUser: please let me or @Collinbrown95 know if any changes are needed to this PR.
Thanks for the PR @marberts! If you rebase on the main branch now, the non-API call issue should be resolved. Your best bet for the
This PR, where the GCS bindings were added, might help: #13404. Let us know if you have any questions!
We'll also need additional CI jobs to test this; the PR I linked above contains examples of the jobs we added there.
Great, thanks @thisisnic!
To do list.
@github-actions crossbow submit -g r
@github-actions crossbow submit -g r
Revision: 4da7280
Submitted crossbow builds: ursacomputing/crossbow @ actions-92325d6543
@github-actions crossbow submit -g r
Some of those CI failures come from issues that have already been fixed on main, so you'll need to rebase your branch.
The hypothesis is that the errors in the build logs are caused by incompatibilities between WIL (Windows Implementation Library) and MinGW GCC, so install them directly instead of trying to build from source.
It seems I made a mess of the rebase, although I'm not entirely sure how 😞.
Branch updated from c620119 to bb44e20.
Rationale for this change
This PR adds Azure support to the Arrow R package. The R package already supports AWS and GCS, the Arrow C++ library has supported Azure for a couple of years now, and Azure support is already available in pyarrow.
This would close #32123.
What changes are included in this PR?
A new class AzureFileSystem that's analogous to S3FileSystem/GcsFileSystem, along with a helper function az_container() that's analogous to s3_bucket()/gcs_bucket().
Updates to src/filesystem.cpp to interact with the machinery in arrow/filesystem/azurefs.h.
Updates to the configuration and build scripts to allow building with Azure support.
Updates to the vignettes on cloud storage, installation, and developer setup.
Are these changes tested?
Yes. See tests/testthat/test-azure.R.
Are there any user-facing changes?
Yes. There is a new function az_container(), which plays the same role as s3_bucket()/gcs_bucket(), along with an R6 class AzureFileSystem, which plays the same role as S3FileSystem and GcsFileSystem. There is also a function arrow_with_azure() that indicates whether Arrow was built with Azure support.