2 changes: 1 addition & 1 deletion docs/01_introduction/index.mdx
Original file line number Diff line number Diff line change
@@ -39,7 +39,7 @@ pip install apify-client

## Quick example

Here's an example showing how to run an Actor and retrieve its results:
The following example shows how to run an Actor and retrieve its results:

<Tabs>
<TabItem value="AsyncExample" label="Async client" default>
8 changes: 5 additions & 3 deletions docs/01_introduction/quick-start.mdx
Original file line number Diff line number Diff line change
@@ -8,6 +8,8 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

import ApiLink from '@site/src/components/ApiLink';

import AuthAsyncExample from '!!raw-loader!./code/02_auth_async.py';
import AuthSyncExample from '!!raw-loader!./code/02_auth_sync.py';
import InputAsyncExample from '!!raw-loader!./code/03_input_async.py';
@@ -44,9 +46,9 @@ The API token is used to authorize your requests to the Apify API. You can be ch

## Step 2: Run an Actor

To start an Actor, call the [`apify_client.actor()`](/reference/class/ActorClient) method with the Actor's ID (e.g., `john-doe/my-cool-actor`). The Actor's ID is a combination of the Actor owner's username and the Actor name. You can run both your own Actors and Actors from [Apify Store](https://apify.com/store).
To start an Actor, call the <ApiLink to="class/ActorClient">`apify_client.actor()`</ApiLink> method with the Actor's ID (e.g., `john-doe/my-cool-actor`). The Actor's ID is a combination of the Actor owner's username and the Actor name. You can run both your own Actors and Actors from [Apify Store](https://apify.com/store).

To define the Actor's input, pass a dictionary to the [`call()`](/reference/class/ActorClient#call) method that matches the Actor's [input schema](https://docs.apify.com/platform/actors/development/actor-definition/input-schema). The input can include URLs to scrape, search terms, or other configuration data.
To define the Actor's input, pass a dictionary to the <ApiLink to="class/ActorClient#call">`call()`</ApiLink> method that matches the Actor's [input schema](https://docs.apify.com/platform/actors/development/actor-definition/input-schema). The input can include URLs to scrape, search terms, or other configuration data.

<Tabs>
<TabItem value="AsyncExample" label="Async client" default>
@@ -63,7 +65,7 @@ To define the Actor's input, pass a dictionary to the [`call()`](/reference/clas

## Step 3: Get results from the dataset

To get the results from the dataset, call the [`apify_client.dataset()`](/reference/class/DatasetClient) method with the dataset ID, then call [`list_items()`](/reference/class/DatasetClient#list_items) to retrieve the data. You can get the dataset ID from the Actor's run dictionary (represented by `defaultDatasetId`).
To get the results from the dataset, call the <ApiLink to="class/DatasetClient">`apify_client.dataset()`</ApiLink> method with the dataset ID, then call <ApiLink to="class/DatasetClient#list_items">`list_items()`</ApiLink> to retrieve the data. You can get the dataset ID from the Actor's run dictionary (represented by `defaultDatasetId`).

<Tabs>
<TabItem value="AsyncExample" label="Async client" default>
6 changes: 4 additions & 2 deletions docs/02_concepts/01_async_support.mdx
Original file line number Diff line number Diff line change
@@ -8,11 +8,13 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

import ApiLink from '@site/src/components/ApiLink';

import AsyncSupportExample from '!!raw-loader!./code/01_async_support.py';

The package provides an asynchronous version of the client, [`ApifyClientAsync`](/reference/class/ApifyClientAsync), which allows you to interact with the Apify API using Python's standard async/await syntax. This enables you to perform non-blocking operations, see the Python [asyncio documentation](https://docs.python.org/3/library/asyncio-task.html) for more information.
The package provides an asynchronous version of the client, <ApiLink to="class/ApifyClientAsync">`ApifyClientAsync`</ApiLink>, which allows you to interact with the Apify API using Python's standard async/await syntax. This enables you to perform non-blocking operations, see the Python [asyncio documentation](https://docs.python.org/3/library/asyncio-task.html) for more information. This is useful for applications that need to perform multiple API operations concurrently or integrate with other async frameworks.
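
To illustrate what "multiple API operations concurrently" means in practice, here is a minimal sketch using only the standard library. The coroutine `fake_api_call` is a hypothetical stand-in for an awaitable client method (it is not part of `apify-client`), and the delays are arbitrary.

```python
import asyncio


async def fake_api_call(name: str, delay: float) -> str:
    """Hypothetical stand-in for an awaitable API operation."""
    # asyncio.sleep yields control to the event loop, like a real network wait would.
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_concurrently() -> list[str]:
    """Start both operations at once and wait for both to finish."""
    # gather() returns results in the order the awaitables were passed in,
    # regardless of which one completes first.
    return await asyncio.gather(
        fake_api_call("actor-run", 0.02),
        fake_api_call("dataset-read", 0.01),
    )


if __name__ == "__main__":
    print(asyncio.run(run_concurrently()))
```

With the real async client, the same pattern would apply to its awaitable methods: the total wait is roughly the longest single operation rather than the sum of all of them.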

The following example demonstrates how to run an Actor asynchronously and stream its logs while it is running:
The following example shows how to run an Actor asynchronously and stream its logs while it is running:

<CodeBlock className="language-python">
{AsyncSupportExample}
8 changes: 5 additions & 3 deletions docs/02_concepts/02_single_collection_clients.mdx
Original file line number Diff line number Diff line change
@@ -8,15 +8,17 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

import ApiLink from '@site/src/components/ApiLink';

import CollectionAsyncExample from '!!raw-loader!./code/02_collection_async.py';
import CollectionSyncExample from '!!raw-loader!./code/02_collection_sync.py';
import SingleAsyncExample from '!!raw-loader!./code/02_single_async.py';
import SingleSyncExample from '!!raw-loader!./code/02_single_sync.py';

The Apify client interface is designed to be consistent and intuitive across all of its components. When you call specific methods on the main client, you create specialized clients to manage individual API resources. There are two main types of clients:
The Apify client provides two types of resource clients: single-resource clients for managing an individual resource, and collection clients for listing or creating resources.

- [`ActorClient`](/reference/class/ActorClient) - Manages a single resource.
- [`ActorCollectionClient`](/reference/class/ActorCollectionClient) - Manages a collection of resources.
- <ApiLink to="class/ActorClient">`ActorClient`</ApiLink> - Manages a single resource.
- <ApiLink to="class/ActorCollectionClient">`ActorCollectionClient`</ApiLink> - Manages a collection of resources.

<Tabs>
<TabItem value="AsyncExample" label="Async client" default>
6 changes: 4 additions & 2 deletions docs/02_concepts/03_nested_clients.mdx
Original file line number Diff line number Diff line change
@@ -8,10 +8,12 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

import ApiLink from '@site/src/components/ApiLink';

import NestedAsyncExample from '!!raw-loader!./code/03_nested_async.py';
import NestedSyncExample from '!!raw-loader!./code/03_nested_sync.py';

In some cases, the Apify client provides nested clients to simplify working with related collections. For example, you can easily manage the runs of a specific Actor without having to construct multiple endpoints or client instances manually.
Nested clients let you access related resources directly from a parent resource client, without manually constructing new client instances.

<Tabs>
<TabItem value="AsyncExample" label="Async client" default>
@@ -26,4 +28,4 @@ In some cases, the Apify client provides nested clients to simplify working with
</TabItem>
</Tabs>

This direct access to [Dataset](https://docs.apify.com/platform/storage/dataset) (and other storage resources) from the [`RunClient`](/reference/class/RunClient) is especially convenient when used alongside the [`ActorClient.last_run`](/reference/class/ActorClient#last_run) method.
This direct access to [Dataset](https://docs.apify.com/platform/storage/dataset) (and other storage resources) from the <ApiLink to="class/RunClient">`RunClient`</ApiLink> is especially convenient when used alongside the <ApiLink to="class/ActorClient#last_run">`ActorClient.last_run`</ApiLink> method.
4 changes: 3 additions & 1 deletion docs/02_concepts/04_error_handling.mdx
Original file line number Diff line number Diff line change
@@ -8,10 +8,12 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

import ApiLink from '@site/src/components/ApiLink';

import ErrorAsyncExample from '!!raw-loader!./code/04_error_async.py';
import ErrorSyncExample from '!!raw-loader!./code/04_error_sync.py';

When you use the Apify client, it automatically extracts all relevant data from the endpoint and returns it in the expected format. Date strings, for instance, are seamlessly converted to Python `datetime.datetime` objects. If an error occurs, the client raises an [`ApifyApiError`](/reference/class/ApifyApiError). This exception wraps the raw JSON errors returned by the API and provides additional context, making it easier to debug any issues that arise.
When you use the Apify client, it automatically extracts all relevant data from the endpoint and returns it in the expected format. Date strings, for instance, are seamlessly converted to Python `datetime.datetime` objects. If an error occurs, the client raises an <ApiLink to="class/ApifyApiError">`ApifyApiError`</ApiLink>. This exception wraps the raw JSON errors returned by the API and provides additional context, making it easier to debug any issues that arise.
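
The two behaviors described above (date strings converted to `datetime` objects, and API errors wrapped in an exception) can be sketched roughly as follows. Everything here is illustrative: `SketchApiError`, `parse_timestamp`, and `handle_response` are hypothetical names, and the payload shape (`{"error": {"type": ..., "message": ...}}`) and the `startedAt` field are assumptions, not the client's actual implementation.

```python
from datetime import datetime, timezone


class SketchApiError(Exception):
    """Hypothetical exception that wraps a raw JSON error payload."""

    def __init__(self, status_code: int, payload: dict) -> None:
        error = payload.get("error", {})
        self.status_code = status_code
        self.type = error.get("type")
        self.message = error.get("message", "Unexpected error")
        super().__init__(f"{self.message} (status {status_code}, type {self.type})")


def parse_timestamp(value: str) -> datetime:
    """Convert an ISO 8601 UTC string with milliseconds into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def handle_response(status_code: int, payload: dict) -> dict:
    """Raise on error status codes; otherwise return the data with dates parsed."""
    if status_code >= 400:
        raise SketchApiError(status_code, payload)
    data = dict(payload["data"])  # copy so the caller's payload is not mutated
    if "startedAt" in data:  # assumed field name, for illustration only
        data["startedAt"] = parse_timestamp(data["startedAt"])
    return data
```

A caller would wrap `handle_response` in `try`/`except SketchApiError` and inspect `status_code`, `type`, and `message` to decide how to react.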

<Tabs>
<TabItem value="AsyncExample" label="Async client" default>
4 changes: 3 additions & 1 deletion docs/02_concepts/05_retries.mdx
Original file line number Diff line number Diff line change
@@ -8,6 +8,8 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

import ApiLink from '@site/src/components/ApiLink';

import RetriesAsyncExample from '!!raw-loader!./code/05_retries_async.py';
import RetriesSyncExample from '!!raw-loader!./code/05_retries_sync.py';

@@ -22,7 +24,7 @@ By default, the client retries a failed request up to 4 times. The retry interva
- The first retry occurs after approximately 500 milliseconds.
- The second retry occurs after approximately 1,000 milliseconds, and so on.

You can customize this behavior using the following options in the [`ApifyClient`](/reference/class/ApifyClient) constructor:
You can customize this behavior using the following options in the <ApiLink to="class/ApifyClient">`ApifyClient`</ApiLink> constructor:

- `max_retries`: Defines the maximum number of retry attempts.
- `min_delay_between_retries`: Sets the minimum delay between retries as a `timedelta`.
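
As a rough illustration of the schedule described above (about 500 ms, then about 1,000 ms, and so on), the sketch below assumes that each delay simply doubles the previous one. The helper `backoff_delays` is hypothetical and not part of the client; the real client may add jitter or cap the delay.

```python
from datetime import timedelta


def backoff_delays(
    max_retries: int = 4,
    min_delay: timedelta = timedelta(milliseconds=500),
) -> list[timedelta]:
    """Return the approximate wait before each retry, assuming plain doubling."""
    # Attempt 0 waits min_delay, attempt 1 waits 2 * min_delay, and so on.
    return [min_delay * (2**attempt) for attempt in range(max_retries)]


if __name__ == "__main__":
    for number, delay in enumerate(backoff_delays(), start=1):
        print(f"retry {number}: wait ~{delay.total_seconds():.1f} s")
```

Under this assumption, raising `min_delay` shifts the whole schedule, while raising `max_retries` only appends longer waits at the end.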
12 changes: 9 additions & 3 deletions docs/02_concepts/07_convenience_methods.mdx
Original file line number Diff line number Diff line change
@@ -8,14 +8,16 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

import ApiLink from '@site/src/components/ApiLink';

import CallAsyncExample from '!!raw-loader!./code/07_call_async.py';
import CallSyncExample from '!!raw-loader!./code/07_call_sync.py';

The Apify client provides several convenience methods to handle actions that the API alone cannot perform efficiently, such as waiting for an Actor run to finish without running into network timeouts. These methods simplify common tasks and enhance the usability of the client.

- [`ActorClient.call`](/reference/class/ActorClient#call) - Starts an Actor and waits for it to finish, handling network timeouts internally. Waits indefinitely by default, or up to the specified `wait_duration`.
- [`ActorClient.start`](/reference/class/ActorClient#start) - Starts an Actor and immediately returns the Run object without waiting for it to finish.
- [`RunClient.wait_for_finish`](/reference/class/RunClient#wait_for_finish) - Waits for an already-started run to reach a terminal status.
- <ApiLink to="class/ActorClient#call">`ActorClient.call`</ApiLink> - Starts an Actor and waits for it to finish, handling network timeouts internally. Waits indefinitely by default, or up to the specified `wait_duration`.
- <ApiLink to="class/ActorClient#start">`ActorClient.start`</ApiLink> - Starts an Actor and immediately returns the Run object without waiting for it to finish.
- <ApiLink to="class/RunClient#wait_for_finish">`RunClient.wait_for_finish`</ApiLink> - Waits for an already-started run to reach a terminal status.

Additionally, storage-related resources offer flexible options for data retrieval:

@@ -34,3 +36,7 @@ Additionally, storage-related resources offer flexible options for data retrieva
</CodeBlock>
</TabItem>
</Tabs>

:::tip
The `call()` method polls internally and may take a long time for long-running Actors. Use `start()` and `wait_for_finish()` separately if you need more control over the waiting behavior.
:::
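
To make the "start, then wait" split more concrete, here is a minimal polling sketch. It does not use the client: `poll_until_terminal` is a hypothetical helper, `get_status` stands in for whatever call fetches the current run status, and the set of terminal statuses is an assumption for illustration.

```python
import time
from typing import Callable, Optional

# Assumed terminal statuses, for illustration only.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def poll_until_terminal(
    get_status: Callable[[], str],
    *,
    wait_secs: Optional[float] = None,
    poll_interval: float = 0.01,
) -> Optional[str]:
    """Poll get_status() until a terminal status appears or the wait limit passes.

    Returns the terminal status, or None if wait_secs elapsed first.
    With wait_secs=None the function waits indefinitely.
    """
    deadline = None if wait_secs is None else time.monotonic() + wait_secs
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

Separating the start from the wait lets the caller choose the wait limit, do other work between polls, or give up early, which a single blocking call does not allow.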
12 changes: 9 additions & 3 deletions docs/02_concepts/08_pagination.mdx
Original file line number Diff line number Diff line change
@@ -8,10 +8,12 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

import ApiLink from '@site/src/components/ApiLink';

import PaginationAsyncExample from '!!raw-loader!./code/08_pagination_async.py';
import PaginationSyncExample from '!!raw-loader!./code/08_pagination_sync.py';

Most methods named `list` or `list_something` in the Apify client return a [`ListPage`](/reference/class/ListPage) object. This object provides a consistent interface for working with paginated data and includes the following properties:
Most methods named `list` or `list_something` in the Apify client return a <ApiLink to="class/ListPage">`ListPage`</ApiLink> object. This object provides a consistent interface for working with paginated data and includes the following properties:

- `items` - The main results you're looking for.
- `total` - The total number of items available.
@@ -21,7 +23,7 @@ Most methods named `list` or `list_something` in the Apify client return a [`Lis

Some methods, such as `list_keys` or `list_head`, paginate differently. Regardless, the primary results are always stored under the items property, and the limit property can be used to control the number of results returned.

The following example demonstrates how to fetch all items from a dataset using pagination:
The following example shows how to fetch all items from a dataset using pagination:

<Tabs>
<TabItem value="AsyncExample" label="Async client" default>
@@ -36,4 +38,8 @@ The following example demonstrates how to fetch all items from a dataset using p
</TabItem>
</Tabs>

The [`ListPage`](/reference/class/ListPage) interface offers several key benefits. Its consistent structure ensures predictable results for most `list` methods, providing a uniform way to work with paginated data. It also offers flexibility, allowing you to customize the `limit` and `offset` parameters to control data fetching according to your needs. Additionally, it provides scalability, enabling you to efficiently handle large datasets through pagination. This approach ensures efficient data retrieval while keeping memory usage under control, making it ideal for managing and processing large collections.
This approach lets you efficiently retrieve large datasets through pagination while keeping memory usage under control.

:::tip
For most use cases, prefer `iterate_items()` which handles pagination automatically and yields items one by one.
:::
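
The `limit`/`offset` loop behind this kind of pagination can be sketched without the client. In the example below, `Page` and `fake_list_items` are hypothetical stand-ins for a page object and a list call backed by an in-memory list; only the `items`, `total`, `offset`, and `limit` names mirror the properties mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a page of results."""

    items: list
    total: int
    offset: int
    limit: int


def fake_list_items(data: list, *, offset: int, limit: int) -> Page:
    """Return one page sliced from an in-memory list (stand-in for an API call)."""
    return Page(items=data[offset : offset + limit], total=len(data), offset=offset, limit=limit)


def fetch_all(data: list, page_size: int = 3) -> list:
    """Collect every item by requesting consecutive pages."""
    collected = []
    offset = 0
    while True:
        page = fake_list_items(data, offset=offset, limit=page_size)
        collected.extend(page.items)
        offset += len(page.items)
        # Stop when a page comes back empty or everything reported by `total` was read.
        if not page.items or offset >= page.total:
            break
    return collected
```

Advancing `offset` by the number of items actually returned, rather than by `page_size`, keeps the loop correct even when a page comes back shorter than requested.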
12 changes: 7 additions & 5 deletions docs/02_concepts/09_streaming.mdx
Original file line number Diff line number Diff line change
@@ -8,20 +8,22 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

import ApiLink from '@site/src/components/ApiLink';

import StreamingAsyncExample from '!!raw-loader!./code/09_streaming_async.py';
import StreamingSyncExample from '!!raw-loader!./code/09_streaming_sync.py';

Certain resources, such as dataset items, key-value store records, and logs, support streaming directly from the Apify API. This allows you to process large resources incrementally without downloading them entirely into memory, making it ideal for handling large or continuously updated data.

Supported streaming methods:

- [`DatasetClient.stream_items`](/reference/class/DatasetClient#stream_items) - Stream dataset items incrementally.
- [`KeyValueStoreClient.stream_record`](/reference/class/KeyValueStoreClient#stream_record) - Stream key-value store records as raw data.
- [`LogClient.stream`](/reference/class/LogClient#stream) - Stream logs in real time.
- <ApiLink to="class/DatasetClient#stream_items">`DatasetClient.stream_items`</ApiLink> - Stream dataset items incrementally.
- <ApiLink to="class/KeyValueStoreClient#stream_record">`KeyValueStoreClient.stream_record`</ApiLink> - Stream key-value store records as raw data.
- <ApiLink to="class/LogClient#stream">`LogClient.stream`</ApiLink> - Stream logs in real time.

These methods return a raw, context-managed `impit.Response` object. The response must be consumed within a with block to ensure that the connection is closed automatically, preventing memory leaks or unclosed connections.

The following example demonstrates how to stream the logs of an Actor run incrementally:
The following example shows how to stream the logs of an Actor run incrementally:

<Tabs>
<TabItem value="AsyncExample" label="Async client" default>
@@ -36,4 +38,4 @@ The following example demonstrates how to stream the logs of an Actor run increm
</TabItem>
</Tabs>

Streaming offers several key benefits. It ensures memory efficiency by loading only a small portion of the resource into memory at any given time, making it ideal for handling large data. It enables real-time processing, allowing you to start working with data immediately as it is received. With automatic resource management, using the `with` statement ensures that connections are properly closed, preventing memory leaks or unclosed connections. This approach is valuable for processing large logs, datasets, or files on the fly without the need to download them entirely.
Streaming is ideal for processing large logs, datasets, or files incrementally without downloading them entirely into memory.
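
The pattern of consuming a response inside a `with` block can be sketched with the standard library alone. Here `fake_stream` is a hypothetical stand-in for a context-managed streaming response (backed by `io.BytesIO`, not `impit`), and `count_error_lines` is an illustrative consumer.

```python
import io
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def fake_stream(payload: bytes) -> Iterator[io.BytesIO]:
    """Hypothetical stand-in for a context-managed streaming response."""
    buffer = io.BytesIO(payload)
    try:
        yield buffer
    finally:
        # Runs even if the consumer raises, mirroring automatic connection cleanup.
        buffer.close()


def count_error_lines(payload: bytes) -> int:
    """Scan a log stream line by line without building a list of all lines."""
    errors = 0
    with fake_stream(payload) as response:
        for raw_line in response:  # file-like objects yield one line at a time
            if b"ERROR" in raw_line:
                errors += 1
    return errors
```

Only one line is held at a time inside the loop, and the `finally` clause guarantees the underlying resource is closed when the block exits.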
4 changes: 2 additions & 2 deletions docs/03_guides/01_passing_input_to_actor.mdx
Original file line number Diff line number Diff line change
@@ -11,9 +11,9 @@ import CodeBlock from '@theme/CodeBlock';
import StreamingAsyncExample from '!!raw-loader!./code/01_input_async.py';
import StreamingSyncExample from '!!raw-loader!./code/01_input_sync.py';

The efficient way to run an Actor and retrieve results is by passing input data directly to the `call` method. This method allows you to configure the Actor's input, execute it, and either get a reference to the running Actor or wait for its completion.
You can pass input data directly to the `call` method to configure and run an Actor in a single step.

The following example demonstrates how to pass input to the `apify/instagram-hashtag-scraper` Actor and wait for it to finish.
The following example shows how to pass input to the `apify/instagram-hashtag-scraper` Actor and wait for it to finish.

<Tabs>
<TabItem value="AsyncExample" label="Async client" default>
2 changes: 1 addition & 1 deletion docs/03_guides/02_manage_tasks_for_reusable_input.mdx
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@ import TasksSyncExample from '!!raw-loader!./code/02_tasks_sync.py';

When you need to run multiple inputs with the same Actor, the most convenient approach is to create multiple [tasks](https://docs.apify.com/platform/actors/running/tasks), each with different input configurations. Task inputs are stored on the Apify platform when the task is created, allowing you to reuse them easily.

The following example demonstrates how to create tasks for the `apify/instagram-hashtag-scraper` Actor with different inputs, manage task clients, and execute them asynchronously:
The following example shows how to create tasks for the `apify/instagram-hashtag-scraper` Actor with different inputs, manage task clients, and execute them asynchronously:

<Tabs>
<TabItem value="AsyncExample" label="Async client" default>
2 changes: 1 addition & 1 deletion docs/03_guides/03_retrieve_actor_data.mdx
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@ import RetrieveSyncExample from '!!raw-loader!./code/03_retrieve_sync.py';

Actor output data is stored in [datasets](https://docs.apify.com/platform/storage/dataset), which can be retrieved from individual Actor runs. Dataset items support pagination for efficient retrieval, and multiple datasets can be merged into a single dataset for further analysis. This merged dataset can then be exported into various formats such as CSV, JSON, XLSX, or XML. Additionally, [integrations](https://docs.apify.com/platform/integrations) provide powerful tools to automate data workflows.

The following example demonstrates how to fetch datasets from an Actor's runs, paginate through their items, and merge them into a single dataset for unified analysis:
The following example shows how to fetch datasets from an Actor's runs, paginate through their items, and merge them into a single dataset for unified analysis:

<Tabs>
<TabItem value="AsyncExample" label="Async client" default>