diff --git a/docs/01_introduction/index.mdx b/docs/01_introduction/index.mdx
index 072d1f3d..ab1ae5a7 100644
--- a/docs/01_introduction/index.mdx
+++ b/docs/01_introduction/index.mdx
@@ -39,7 +39,7 @@ pip install apify-client
 ## Quick example
-Here's an example showing how to run an Actor and retrieve its results:
+The following example shows how to run an Actor and retrieve its results:
diff --git a/docs/01_introduction/quick-start.mdx b/docs/01_introduction/quick-start.mdx
index 94918cf5..d8feaa16 100644
--- a/docs/01_introduction/quick-start.mdx
+++ b/docs/01_introduction/quick-start.mdx
@@ -8,6 +8,8 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 import CodeBlock from '@theme/CodeBlock';
+import ApiLink from '@site/src/components/ApiLink';
+
 import AuthAsyncExample from '!!raw-loader!./code/02_auth_async.py';
 import AuthSyncExample from '!!raw-loader!./code/02_auth_sync.py';
 import InputAsyncExample from '!!raw-loader!./code/03_input_async.py';
@@ -44,9 +46,9 @@ The API token is used to authorize your requests to the Apify API. You can be ch
 ## Step 2: Run an Actor
-To start an Actor, call the [`apify_client.actor()`](/reference/class/ActorClient) method with the Actor's ID (e.g., `john-doe/my-cool-actor`). The Actor's ID is a combination of the Actor owner's username and the Actor name. You can run both your own Actors and Actors from [Apify Store](https://apify.com/store).
+To start an Actor, call the `apify_client.actor()` method with the Actor's ID (e.g., `john-doe/my-cool-actor`). The Actor's ID is a combination of the Actor owner's username and the Actor name. You can run both your own Actors and Actors from [Apify Store](https://apify.com/store).
-To define the Actor's input, pass a dictionary to the [`call()`](/reference/class/ActorClient#call) method that matches the Actor's [input schema](https://docs.apify.com/platform/actors/development/actor-definition/input-schema). The input can include URLs to scrape, search terms, or other configuration data.
+To define the Actor's input, pass a dictionary to the `call()` method that matches the Actor's [input schema](https://docs.apify.com/platform/actors/development/actor-definition/input-schema). The input can include URLs to scrape, search terms, or other configuration data.
@@ -63,7 +65,7 @@ To define the Actor's input, pass a dictionary to the [`call()`](/reference/clas
 ## Step 3: Get results from the dataset
-To get the results from the dataset, call the [`apify_client.dataset()`](/reference/class/DatasetClient) method with the dataset ID, then call [`list_items()`](/reference/class/DatasetClient#list_items) to retrieve the data. You can get the dataset ID from the Actor's run dictionary (represented by `defaultDatasetId`).
+To get the results from the dataset, call the `apify_client.dataset()` method with the dataset ID, then call `list_items()` to retrieve the data. You can get the dataset ID from the Actor's run dictionary (represented by `defaultDatasetId`).
diff --git a/docs/02_concepts/01_async_support.mdx b/docs/02_concepts/01_async_support.mdx
index ddc06f9b..2d5f0dcf 100644
--- a/docs/02_concepts/01_async_support.mdx
+++ b/docs/02_concepts/01_async_support.mdx
@@ -8,11 +8,13 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 import CodeBlock from '@theme/CodeBlock';
+import ApiLink from '@site/src/components/ApiLink';
+
 import AsyncSupportExample from '!!raw-loader!./code/01_async_support.py';
-The package provides an asynchronous version of the client, [`ApifyClientAsync`](/reference/class/ApifyClientAsync), which allows you to interact with the Apify API using Python's standard async/await syntax. This enables you to perform non-blocking operations, see the Python [asyncio documentation](https://docs.python.org/3/library/asyncio-task.html) for more information.
+The package provides an asynchronous version of the client, `ApifyClientAsync`, which allows you to interact with the Apify API using Python's standard async/await syntax. This enables you to perform non-blocking operations, see the Python [asyncio documentation](https://docs.python.org/3/library/asyncio-task.html) for more information.
 This is useful for applications that need to perform multiple API operations concurrently or integrate with other async frameworks.
-The following example demonstrates how to run an Actor asynchronously and stream its logs while it is running:
+The following example shows how to run an Actor asynchronously and stream its logs while it is running:
 {AsyncSupportExample}
diff --git a/docs/02_concepts/02_single_collection_clients.mdx b/docs/02_concepts/02_single_collection_clients.mdx
index fe1ddc91..31df8d24 100644
--- a/docs/02_concepts/02_single_collection_clients.mdx
+++ b/docs/02_concepts/02_single_collection_clients.mdx
@@ -8,15 +8,17 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 import CodeBlock from '@theme/CodeBlock';
+import ApiLink from '@site/src/components/ApiLink';
+
 import CollectionAsyncExample from '!!raw-loader!./code/02_collection_async.py';
 import CollectionSyncExample from '!!raw-loader!./code/02_collection_sync.py';
 import SingleAsyncExample from '!!raw-loader!./code/02_single_async.py';
 import SingleSyncExample from '!!raw-loader!./code/02_single_sync.py';
-The Apify client interface is designed to be consistent and intuitive across all of its components. When you call specific methods on the main client, you create specialized clients to manage individual API resources. There are two main types of clients:
+The Apify client provides two types of resource clients: single-resource clients for managing an individual resource, and collection clients for listing or creating resources.
-- [`ActorClient`](/reference/class/ActorClient) - Manages a single resource.
-- [`ActorCollectionClient`](/reference/class/ActorCollectionClient) - Manages a collection of resources.
+- `ActorClient` - Manages a single resource.
+- `ActorCollectionClient` - Manages a collection of resources.
diff --git a/docs/02_concepts/03_nested_clients.mdx b/docs/02_concepts/03_nested_clients.mdx
index 6df8a362..432dd52f 100644
--- a/docs/02_concepts/03_nested_clients.mdx
+++ b/docs/02_concepts/03_nested_clients.mdx
@@ -8,10 +8,12 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 import CodeBlock from '@theme/CodeBlock';
+import ApiLink from '@site/src/components/ApiLink';
+
 import NestedAsyncExample from '!!raw-loader!./code/03_nested_async.py';
 import NestedSyncExample from '!!raw-loader!./code/03_nested_sync.py';
-In some cases, the Apify client provides nested clients to simplify working with related collections. For example, you can easily manage the runs of a specific Actor without having to construct multiple endpoints or client instances manually.
+Nested clients let you access related resources directly from a parent resource client, without manually constructing new client instances.
@@ -26,4 +28,4 @@ In some cases, the Apify client provides nested clients to simplify working with
-This direct access to [Dataset](https://docs.apify.com/platform/storage/dataset) (and other storage resources) from the [`RunClient`](/reference/class/RunClient) is especially convenient when used alongside the [`ActorClient.last_run`](/reference/class/ActorClient#last_run) method.
+This direct access to [Dataset](https://docs.apify.com/platform/storage/dataset) (and other storage resources) from the `RunClient` is especially convenient when used alongside the `ActorClient.last_run` method.
diff --git a/docs/02_concepts/04_error_handling.mdx b/docs/02_concepts/04_error_handling.mdx
index 16a96fae..265551ad 100644
--- a/docs/02_concepts/04_error_handling.mdx
+++ b/docs/02_concepts/04_error_handling.mdx
@@ -8,10 +8,12 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 import CodeBlock from '@theme/CodeBlock';
+import ApiLink from '@site/src/components/ApiLink';
+
 import ErrorAsyncExample from '!!raw-loader!./code/04_error_async.py';
 import ErrorSyncExample from '!!raw-loader!./code/04_error_sync.py';
-When you use the Apify client, it automatically extracts all relevant data from the endpoint and returns it in the expected format. Date strings, for instance, are seamlessly converted to Python `datetime.datetime` objects. If an error occurs, the client raises an [`ApifyApiError`](/reference/class/ApifyApiError). This exception wraps the raw JSON errors returned by the API and provides additional context, making it easier to debug any issues that arise.
+When you use the Apify client, it automatically extracts all relevant data from the endpoint and returns it in the expected format. Date strings, for instance, are seamlessly converted to Python `datetime.datetime` objects. If an error occurs, the client raises an `ApifyApiError`. This exception wraps the raw JSON errors returned by the API and provides additional context, making it easier to debug any issues that arise.
diff --git a/docs/02_concepts/05_retries.mdx b/docs/02_concepts/05_retries.mdx
index 50ee6779..bc7d2245 100644
--- a/docs/02_concepts/05_retries.mdx
+++ b/docs/02_concepts/05_retries.mdx
@@ -8,6 +8,8 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 import CodeBlock from '@theme/CodeBlock';
+import ApiLink from '@site/src/components/ApiLink';
+
 import RetriesAsyncExample from '!!raw-loader!./code/05_retries_async.py';
 import RetriesSyncExample from '!!raw-loader!./code/05_retries_sync.py';
@@ -22,7 +24,7 @@ By default, the client retries a failed request up to 4 times. The retry interva
 - The first retry occurs after approximately 500 milliseconds.
 - The second retry occurs after approximately 1,000 milliseconds, and so on.
-You can customize this behavior using the following options in the [`ApifyClient`](/reference/class/ApifyClient) constructor:
+You can customize this behavior using the following options in the `ApifyClient` constructor:
 - `max_retries`: Defines the maximum number of retry attempts.
 - `min_delay_between_retries`: Sets the minimum delay between retries as a `timedelta`.
diff --git a/docs/02_concepts/07_convenience_methods.mdx b/docs/02_concepts/07_convenience_methods.mdx
index 988af550..323a447c 100644
--- a/docs/02_concepts/07_convenience_methods.mdx
+++ b/docs/02_concepts/07_convenience_methods.mdx
@@ -8,14 +8,16 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 import CodeBlock from '@theme/CodeBlock';
+import ApiLink from '@site/src/components/ApiLink';
+
 import CallAsyncExample from '!!raw-loader!./code/07_call_async.py';
 import CallSyncExample from '!!raw-loader!./code/07_call_sync.py';
 The Apify client provides several convenience methods to handle actions that the API alone cannot perform efficiently, such as waiting for an Actor run to finish without running into network timeouts. These methods simplify common tasks and enhance the usability of the client.
-- [`ActorClient.call`](/reference/class/ActorClient#call) - Starts an Actor and waits for it to finish, handling network timeouts internally. Waits indefinitely by default, or up to the specified `wait_duration`.
-- [`ActorClient.start`](/reference/class/ActorClient#start) - Starts an Actor and immediately returns the Run object without waiting for it to finish.
-- [`RunClient.wait_for_finish`](/reference/class/RunClient#wait_for_finish) - Waits for an already-started run to reach a terminal status.
+- `ActorClient.call` - Starts an Actor and waits for it to finish, handling network timeouts internally. Waits indefinitely by default, or up to the specified `wait_duration`.
+- `ActorClient.start` - Starts an Actor and immediately returns the Run object without waiting for it to finish.
+- `RunClient.wait_for_finish` - Waits for an already-started run to reach a terminal status.
 Additionally, storage-related resources offer flexible options for data retrieval:
@@ -34,3 +36,7 @@ Additionally, storage-related resources offer flexible options for data retrieva
+
+:::tip
+The `call()` method polls internally and may take a long time for long-running Actors. Use `start()` and `wait_for_finish()` separately if you need more control over the waiting behavior.
+:::
diff --git a/docs/02_concepts/08_pagination.mdx b/docs/02_concepts/08_pagination.mdx
index c79638f2..c0ee4685 100644
--- a/docs/02_concepts/08_pagination.mdx
+++ b/docs/02_concepts/08_pagination.mdx
@@ -8,10 +8,12 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 import CodeBlock from '@theme/CodeBlock';
+import ApiLink from '@site/src/components/ApiLink';
+
 import PaginationAsyncExample from '!!raw-loader!./code/08_pagination_async.py';
 import PaginationSyncExample from '!!raw-loader!./code/08_pagination_sync.py';
-Most methods named `list` or `list_something` in the Apify client return a [`ListPage`](/reference/class/ListPage) object. This object provides a consistent interface for working with paginated data and includes the following properties:
+Most methods named `list` or `list_something` in the Apify client return a `ListPage` object. This object provides a consistent interface for working with paginated data and includes the following properties:
 - `items` - The main results you're looking for.
 - `total` - The total number of items available.
@@ -21,7 +23,7 @@ Most methods named `list` or `list_something` in the Apify client return a [`Lis
 Some methods, such as `list_keys` or `list_head`, paginate differently. Regardless, the primary results are always stored under the items property, and the limit property can be used to control the number of results returned.
-The following example demonstrates how to fetch all items from a dataset using pagination:
+The following example shows how to fetch all items from a dataset using pagination:
@@ -36,4 +38,8 @@ The following example demonstrates how to fetch all items from a dataset using p
-The [`ListPage`](/reference/class/ListPage) interface offers several key benefits. Its consistent structure ensures predictable results for most `list` methods, providing a uniform way to work with paginated data. It also offers flexibility, allowing you to customize the `limit` and `offset` parameters to control data fetching according to your needs. Additionally, it provides scalability, enabling you to efficiently handle large datasets through pagination. This approach ensures efficient data retrieval while keeping memory usage under control, making it ideal for managing and processing large collections.
+This approach lets you efficiently retrieve large datasets through pagination while keeping memory usage under control.
+
+:::tip
+For most use cases, prefer `iterate_items()` which handles pagination automatically and yields items one by one.
+:::
diff --git a/docs/02_concepts/09_streaming.mdx b/docs/02_concepts/09_streaming.mdx
index 0d5d6d1f..c1435f50 100644
--- a/docs/02_concepts/09_streaming.mdx
+++ b/docs/02_concepts/09_streaming.mdx
@@ -8,6 +8,8 @@ import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 import CodeBlock from '@theme/CodeBlock';
+import ApiLink from '@site/src/components/ApiLink';
+
 import StreamingAsyncExample from '!!raw-loader!./code/09_streaming_async.py';
 import StreamingSyncExample from '!!raw-loader!./code/09_streaming_sync.py';
@@ -15,13 +17,13 @@ Certain resources, such as dataset items, key-value store records, and logs, sup
 Supported streaming methods:
-- [`DatasetClient.stream_items`](/reference/class/DatasetClient#stream_items) - Stream dataset items incrementally.
-- [`KeyValueStoreClient.stream_record`](/reference/class/KeyValueStoreClient#stream_record) - Stream key-value store records as raw data.
-- [`LogClient.stream`](/reference/class/LogClient#stream) - Stream logs in real time.
+- `DatasetClient.stream_items` - Stream dataset items incrementally.
+- `KeyValueStoreClient.stream_record` - Stream key-value store records as raw data.
+- `LogClient.stream` - Stream logs in real time.
 These methods return a raw, context-managed `impit.Response` object. The response must be consumed within a with block to ensure that the connection is closed automatically, preventing memory leaks or unclosed connections.
-The following example demonstrates how to stream the logs of an Actor run incrementally:
+The following example shows how to stream the logs of an Actor run incrementally:
@@ -36,4 +38,4 @@ The following example demonstrates how to stream the logs of an Actor run increm
-Streaming offers several key benefits. It ensures memory efficiency by loading only a small portion of the resource into memory at any given time, making it ideal for handling large data. It enables real-time processing, allowing you to start working with data immediately as it is received. With automatic resource management, using the `with` statement ensures that connections are properly closed, preventing memory leaks or unclosed connections. This approach is valuable for processing large logs, datasets, or files on the fly without the need to download them entirely.
+Streaming is ideal for processing large logs, datasets, or files incrementally without downloading them entirely into memory.
diff --git a/docs/03_guides/01_passing_input_to_actor.mdx b/docs/03_guides/01_passing_input_to_actor.mdx
index 25af270e..5ac6b91c 100644
--- a/docs/03_guides/01_passing_input_to_actor.mdx
+++ b/docs/03_guides/01_passing_input_to_actor.mdx
@@ -11,9 +11,9 @@ import CodeBlock from '@theme/CodeBlock';
 import StreamingAsyncExample from '!!raw-loader!./code/01_input_async.py';
 import StreamingSyncExample from '!!raw-loader!./code/01_input_sync.py';
-The efficient way to run an Actor and retrieve results is by passing input data directly to the `call` method. This method allows you to configure the Actor's input, execute it, and either get a reference to the running Actor or wait for its completion.
+You can pass input data directly to the `call` method to configure and run an Actor in a single step.
-The following example demonstrates how to pass input to the `apify/instagram-hashtag-scraper` Actor and wait for it to finish.
+The following example shows how to pass input to the `apify/instagram-hashtag-scraper` Actor and wait for it to finish.
diff --git a/docs/03_guides/02_manage_tasks_for_reusable_input.mdx b/docs/03_guides/02_manage_tasks_for_reusable_input.mdx
index 6a947e91..205c77c5 100644
--- a/docs/03_guides/02_manage_tasks_for_reusable_input.mdx
+++ b/docs/03_guides/02_manage_tasks_for_reusable_input.mdx
@@ -13,7 +13,7 @@ import TasksSyncExample from '!!raw-loader!./code/02_tasks_sync.py';
 When you need to run multiple inputs with the same Actor, the most convenient approach is to create multiple [tasks](https://docs.apify.com/platform/actors/running/tasks), each with different input configurations. Task inputs are stored on the Apify platform when the task is created, allowing you to reuse them easily.
-The following example demonstrates how to create tasks for the `apify/instagram-hashtag-scraper` Actor with different inputs, manage task clients, and execute them asynchronously:
+The following example shows how to create tasks for the `apify/instagram-hashtag-scraper` Actor with different inputs, manage task clients, and execute them asynchronously:
diff --git a/docs/03_guides/03_retrieve_actor_data.mdx b/docs/03_guides/03_retrieve_actor_data.mdx
index ecb266f5..7490f114 100644
--- a/docs/03_guides/03_retrieve_actor_data.mdx
+++ b/docs/03_guides/03_retrieve_actor_data.mdx
@@ -13,7 +13,7 @@ import RetrieveSyncExample from '!!raw-loader!./code/03_retrieve_sync.py';
 Actor output data is stored in [datasets](https://docs.apify.com/platform/storage/dataset), which can be retrieved from individual Actor runs. Dataset items support pagination for efficient retrieval, and multiple datasets can be merged into a single dataset for further analysis. This merged dataset can then be exported into various formats such as CSV, JSON, XLSX, or XML. Additionally, [integrations](https://docs.apify.com/platform/integrations) provide powerful tools to automate data workflows.
-The following example demonstrates how to fetch datasets from an Actor's runs, paginate through their items, and merge them into a single dataset for unified analysis:
+The following example shows how to fetch datasets from an Actor's runs, paginate through their items, and merge them into a single dataset for unified analysis:
diff --git a/docs/03_guides/04_integration_with_data_libraries.mdx b/docs/03_guides/04_integration_with_data_libraries.mdx
index 91aefcd9..ab5dc4f7 100644
--- a/docs/03_guides/04_integration_with_data_libraries.mdx
+++ b/docs/03_guides/04_integration_with_data_libraries.mdx
@@ -13,7 +13,7 @@ import PandasSyncExample from '!!raw-loader!./code/04_pandas_sync.py';
 The Apify client for Python seamlessly integrates with data analysis libraries like [Pandas](https://pandas.pydata.org/). This allows you to load dataset items directly into a Pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) for efficient manipulation and analysis. Pandas provides robust data structures and tools for handling large datasets, making it a powerful addition to your Apify workflows.
-The following example demonstrates how to retrieve items from the most recent dataset of an Actor run and load them into a Pandas DataFrame for further analysis:
+The following example shows how to retrieve items from the most recent dataset of an Actor run and load them into a Pandas DataFrame for further analysis:
diff --git a/docs/04_upgrading/upgrading_to_v2.md b/docs/04_upgrading/upgrading_to_v2.mdx
similarity index 90%
rename from docs/04_upgrading/upgrading_to_v2.md
rename to docs/04_upgrading/upgrading_to_v2.mdx
index be1ffb8a..714b6753 100644
--- a/docs/04_upgrading/upgrading_to_v2.md
+++ b/docs/04_upgrading/upgrading_to_v2.mdx
@@ -4,6 +4,8 @@ title: Upgrading to v2
 description: Breaking changes and migration guide from v1 to v2.
 ---
+import ApiLink from '@site/src/components/ApiLink';
+
 This page summarizes the breaking changes between Apify Python API Client v1.x and v2.0.
 ## Python version support
@@ -41,4 +43,4 @@ Some modules have been restructured.
 ### Errors
-- Error classes are now accessible from the public `apify_client.errors` module. See the [API documentation](https://docs.apify.com/api/client/python/reference/class/ApifyApiError) for a complete list of available error classes.
+- Error classes are now accessible from the public `apify_client.errors` module. See the `ApifyApiError` API reference for a complete list of available error classes.