Skip to content

GH-46369: [C++] Add key-value pairs to the FileSystemFactory interface#50044

Draft
raulcd wants to merge 1 commit into
apache:mainfrom
raulcd:GH-46369
Draft

GH-46369: [C++] Add key-value pairs to the FileSystemFactory interface#50044
raulcd wants to merge 1 commit into
apache:mainfrom
raulcd:GH-46369

Conversation

@raulcd
Copy link
Copy Markdown
Member

@raulcd raulcd commented May 26, 2026

TBD

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

This PR includes breaking changes to public APIs. (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)

This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

Copilot AI review requested due to automatic review settings May 26, 2026 14:04
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the C++ filesystem factory plumbing to accept an additional set of backend-specific key/value options (in addition to the filesystem URI), aligning with GH-46369’s goal of separating reusable URIs from user-specific configuration.

Changes:

  • Extends FileSystemFactory to accept options and adds compatibility construction for existing factories.
  • Adds new FileSystemFromUri overloads that accept options, and threads them through the factory registry call path.
  • Updates registered filesystem modules/tests (examplefs, S3, localfs tests) to compile with the new factory signature.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
cpp/src/arrow/testing/examplefs.cc Updates example filesystem factory signature and forwards options to nested FileSystemFromUri call.
cpp/src/arrow/filesystem/s3fs.cc Updates S3 registered factory signature to include options (currently unused).
cpp/src/arrow/filesystem/localfs_test.cc Disambiguates nullptr factory registrations under the new overloaded FileSystemFactory constructors.
cpp/src/arrow/filesystem/filesystem.h Introduces options-aware FileSystemFactory + new FileSystemFromUri overloads and documentation.
cpp/src/arrow/filesystem/filesystem.cc Implements the new FileSystemFromUri overloads and forwards options into the registry-based factory dispatch.
Comments suppressed due to low confidence (1)

cpp/src/arrow/filesystem/filesystem.cc:925

  • The new options argument is only forwarded to registered factories; built-in scheme handling (abfs/gcs/hdfs/mock) ignores it entirely. This makes FileSystemFromUri(..., options, ...) silently drop backend-specific options for these schemes. Consider either (a) plumbing options into the built-in scheme option parsing / constructors, or (b) rejecting non-empty options for schemes that don’t support them yet to avoid surprising behavior.
Result<std::shared_ptr<FileSystem>> FileSystemFromUriReal(
    const Uri& uri, const std::string& uri_string,
    const std::vector<std::pair<std::string, std::any>>& options,
    const io::IOContext& io_context, std::string* out_path) {
  const auto scheme = uri.scheme();

  {
    ARROW_ASSIGN_OR_RAISE(
        auto* factory,
        FileSystemFactoryRegistry::GetInstance()->FactoryForScheme(scheme));
    if (factory != nullptr) {
      return factory->function(uri, options, io_context, out_path);
    }
  }

  if (scheme == "abfs" || scheme == "abfss") {
#ifdef ARROW_AZURE
    ARROW_ASSIGN_OR_RAISE(auto options, AzureOptions::FromUri(uri, out_path));
    return AzureFileSystem::Make(options, io_context);
#else
    return Status::NotImplemented(
        "Got Azure Blob File System URI but Arrow compiled without Azure Blob File "
        "System support");
#endif
  }
  if (scheme == "gs" || scheme == "gcs") {
#ifdef ARROW_GCS
    ARROW_ASSIGN_OR_RAISE(auto options, GcsOptions::FromUri(uri, out_path));
    return GcsFileSystem::Make(options, io_context);
#else

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +378 to +379
/// Construct from a non-options aware factory function maintaing source compatibility
/// with existing factories.
Comment on lines +575 to +593
/// \brief Create a new FileSystem by URI with extended backend-specific filesystem
/// options
///
/// Recognized schemes are "file", "mock", "hdfs", "viewfs", "s3",
/// "gs" and "gcs".
///
/// Support for other schemes can be added using RegisterFileSystemFactory.
///
/// \param[in] uri the URI to give access to
/// \param[in] options a list of backend-specific filesystem options
/// Each option is a (name, value) pair.
/// The expected type is specific to the backend and
/// option name.
/// \param[out] out_path (optional) Path inside the filesystem.
/// \return out_fs FileSystem instance.
ARROW_EXPORT
Result<std::shared_ptr<FileSystem>> FileSystemFromUri(
const std::string& uri, const std::vector<std::pair<std::string, std::any>>& options,
std::string* out_path = NULLPTR);
Comment on lines +591 to +593
Result<std::shared_ptr<FileSystem>> FileSystemFromUri(
const std::string& uri, const std::vector<std::pair<std::string, std::any>>& options,
std::string* out_path = NULLPTR);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure whether we want to be purely additive here and drop this for the case of previous external usages for FileSystemFromUri(const std::string&, {}, std::string*) or keep it with a documentation note.

const std::vector<std::pair<std::string, std::any>>& options,
const io::IOContext& io_context,
std::string* out_path) -> Result<std::shared_ptr<fs::FileSystem>> {
RETURN_NOT_OK(EnsureS3Initialized());
Comment on lines 963 to 987
Result<std::shared_ptr<FileSystem>> FileSystemFromUri(const std::string& uri_string,
std::string* out_path) {
return FileSystemFromUri(uri_string, io::default_io_context(), out_path);
return FileSystemFromUri(uri_string, /*options=*/{}, io::default_io_context(),
out_path);
}

Result<std::shared_ptr<FileSystem>> FileSystemFromUri(
const std::string& uri_string,
const std::vector<std::pair<std::string, std::any>>& options, std::string* out_path) {
return FileSystemFromUri(uri_string, options, io::default_io_context(), out_path);
}

Result<std::shared_ptr<FileSystem>> FileSystemFromUri(const std::string& uri_string,
const io::IOContext& io_context,
std::string* out_path) {
return FileSystemFromUri(uri_string, /*options=*/{}, io_context, out_path);
}

Result<std::shared_ptr<FileSystem>> FileSystemFromUri(
const std::string& uri_string,
const std::vector<std::pair<std::string, std::any>>& options,
const io::IOContext& io_context, std::string* out_path) {
ARROW_ASSIGN_OR_RAISE(auto fsuri, ParseFileSystemUri(uri_string));
return FileSystemFromUriReal(fsuri, uri_string, io_context, out_path);
return FileSystemFromUriReal(fsuri, uri_string, options, io_context, out_path);
}
@github-actions github-actions Bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels May 26, 2026
@raulcd
Copy link
Copy Markdown
Member Author

raulcd commented May 26, 2026

@pitrou @kou I've started working on this to hopefully move S3 filesystem out of libarrow and into its own shared library.
About the scope of this PR, would you push this plumbing PR in isolation (in which case I would add a couple of tests to validate options are passed), or should I add some S3 specific options as part of this PR?

For the scope of the S3 options (on this PR or a follow up one), would you mimic all S3Options as part of the new k/v pair or only a small subset, example initially only region and credentials?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants