GH-46369: [C++] Add key-value pairs to the FileSystemFactory interface#50044
GH-46369: [C++] Add key-value pairs to the FileSystemFactory interface#50044raulcd wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends the C++ filesystem factory plumbing to accept an additional set of backend-specific key/value options (in addition to the filesystem URI), aligning with GH-46369’s goal of separating reusable URIs from user-specific configuration.
Changes:
- Extends
FileSystemFactoryto acceptoptionsand adds compatibility construction for existing factories. - Adds new
FileSystemFromUrioverloads that acceptoptions, and threads them through the factory registry call path. - Updates registered filesystem modules/tests (examplefs, S3, localfs tests) to compile with the new factory signature.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| cpp/src/arrow/testing/examplefs.cc | Updates example filesystem factory signature and forwards options to nested FileSystemFromUri call. |
| cpp/src/arrow/filesystem/s3fs.cc | Updates S3 registered factory signature to include options (currently unused). |
| cpp/src/arrow/filesystem/localfs_test.cc | Disambiguates nullptr factory registrations under the new overloaded FileSystemFactory constructors. |
| cpp/src/arrow/filesystem/filesystem.h | Introduces options-aware FileSystemFactory + new FileSystemFromUri overloads and documentation. |
| cpp/src/arrow/filesystem/filesystem.cc | Implements the new FileSystemFromUri overloads and forwards options into the registry-based factory dispatch. |
Comments suppressed due to low confidence (1)
cpp/src/arrow/filesystem/filesystem.cc:925
- The new
optionsargument is only forwarded to registered factories; built-in scheme handling (abfs/gcs/hdfs/mock) ignores it entirely. This makesFileSystemFromUri(..., options, ...)silently drop backend-specific options for these schemes. Consider either (a) plumbingoptionsinto the built-in scheme option parsing / constructors, or (b) rejecting non-emptyoptionsfor schemes that don’t support them yet to avoid surprising behavior.
Result<std::shared_ptr<FileSystem>> FileSystemFromUriReal(
const Uri& uri, const std::string& uri_string,
const std::vector<std::pair<std::string, std::any>>& options,
const io::IOContext& io_context, std::string* out_path) {
const auto scheme = uri.scheme();
{
ARROW_ASSIGN_OR_RAISE(
auto* factory,
FileSystemFactoryRegistry::GetInstance()->FactoryForScheme(scheme));
if (factory != nullptr) {
return factory->function(uri, options, io_context, out_path);
}
}
if (scheme == "abfs" || scheme == "abfss") {
#ifdef ARROW_AZURE
ARROW_ASSIGN_OR_RAISE(auto options, AzureOptions::FromUri(uri, out_path));
return AzureFileSystem::Make(options, io_context);
#else
return Status::NotImplemented(
"Got Azure Blob File System URI but Arrow compiled without Azure Blob File "
"System support");
#endif
}
if (scheme == "gs" || scheme == "gcs") {
#ifdef ARROW_GCS
ARROW_ASSIGN_OR_RAISE(auto options, GcsOptions::FromUri(uri, out_path));
return GcsFileSystem::Make(options, io_context);
#else
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Construct from a non-options aware factory function maintaing source compatibility | ||
| /// with existing factories. |
| /// \brief Create a new FileSystem by URI with extended backend-specific filesystem | ||
| /// options | ||
| /// | ||
| /// Recognized schemes are "file", "mock", "hdfs", "viewfs", "s3", | ||
| /// "gs" and "gcs". | ||
| /// | ||
| /// Support for other schemes can be added using RegisterFileSystemFactory. | ||
| /// | ||
| /// \param[in] uri the URI to give access to | ||
| /// \param[in] options a list of backend-specific filesystem options | ||
| /// Each option is a (name, value) pair. | ||
| /// The expected type is specific to the backend and | ||
| /// option name. | ||
| /// \param[out] out_path (optional) Path inside the filesystem. | ||
| /// \return out_fs FileSystem instance. | ||
| ARROW_EXPORT | ||
| Result<std::shared_ptr<FileSystem>> FileSystemFromUri( | ||
| const std::string& uri, const std::vector<std::pair<std::string, std::any>>& options, | ||
| std::string* out_path = NULLPTR); |
| Result<std::shared_ptr<FileSystem>> FileSystemFromUri( | ||
| const std::string& uri, const std::vector<std::pair<std::string, std::any>>& options, | ||
| std::string* out_path = NULLPTR); |
There was a problem hiding this comment.
I am unsure whether we want to be purely additive here and drop this for the case of previous external usages for FileSystemFromUri(const std::string&, {}, std::string*) or keep it with a documentation note.
| const std::vector<std::pair<std::string, std::any>>& options, | ||
| const io::IOContext& io_context, | ||
| std::string* out_path) -> Result<std::shared_ptr<fs::FileSystem>> { | ||
| RETURN_NOT_OK(EnsureS3Initialized()); |
| Result<std::shared_ptr<FileSystem>> FileSystemFromUri(const std::string& uri_string, | ||
| std::string* out_path) { | ||
| return FileSystemFromUri(uri_string, io::default_io_context(), out_path); | ||
| return FileSystemFromUri(uri_string, /*options=*/{}, io::default_io_context(), | ||
| out_path); | ||
| } | ||
|
|
||
| Result<std::shared_ptr<FileSystem>> FileSystemFromUri( | ||
| const std::string& uri_string, | ||
| const std::vector<std::pair<std::string, std::any>>& options, std::string* out_path) { | ||
| return FileSystemFromUri(uri_string, options, io::default_io_context(), out_path); | ||
| } | ||
|
|
||
| Result<std::shared_ptr<FileSystem>> FileSystemFromUri(const std::string& uri_string, | ||
| const io::IOContext& io_context, | ||
| std::string* out_path) { | ||
| return FileSystemFromUri(uri_string, /*options=*/{}, io_context, out_path); | ||
| } | ||
|
|
||
| Result<std::shared_ptr<FileSystem>> FileSystemFromUri( | ||
| const std::string& uri_string, | ||
| const std::vector<std::pair<std::string, std::any>>& options, | ||
| const io::IOContext& io_context, std::string* out_path) { | ||
| ARROW_ASSIGN_OR_RAISE(auto fsuri, ParseFileSystemUri(uri_string)); | ||
| return FileSystemFromUriReal(fsuri, uri_string, io_context, out_path); | ||
| return FileSystemFromUriReal(fsuri, uri_string, options, io_context, out_path); | ||
| } |
|
@pitrou @kou I've started working on this to hopefully move S3 filesystem out of libarrow and into its own shared library. For the scope of the S3 options (on this PR or a follow up one), would you mimic all S3Options as part of the new k/v pair or only a small subset, example initially only region and credentials? |
TBD
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
This PR includes breaking changes to public APIs. (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)
This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)