ApifyStorageClient
Index
Methods
__init__
Initialize a new instance.
Parameters
optionalkeyword-onlyrequest_queue_access: Literal[single, shared] = 'single'
Defines how the request queue client behaves. Use
single
mode for a single consumer. It has fewer API calls, meaning better performance and lower costs. If you need multiple concurrent consumers useshared
mode, but expect worse performance and higher costs due to the additional overhead.
Returns None
create_dataset_client
Parameters
optionalkeyword-onlyid: str | None = None
optionalkeyword-onlyname: str | None = None
optionalkeyword-onlyalias: str | None = None
optionalkeyword-onlyconfiguration: CrawleeConfiguration | None = None
Returns ApifyDatasetClient
create_kvs_client
Parameters
optionalkeyword-onlyid: str | None = None
optionalkeyword-onlyname: str | None = None
optionalkeyword-onlyalias: str | None = None
optionalkeyword-onlyconfiguration: CrawleeConfiguration | None = None
Returns ApifyKeyValueStoreClient
create_rq_client
Parameters
optionalkeyword-onlyid: str | None = None
optionalkeyword-onlyname: str | None = None
optionalkeyword-onlyalias: str | None = None
optionalkeyword-onlyconfiguration: CrawleeConfiguration | None = None
Returns ApifyRequestQueueClient
get_storage_client_cache_key
Parameters
configuration: CrawleeConfiguration
Returns Hashable
Apify platform implementation of the storage client.
This storage client provides access to datasets, key-value stores, and request queues that persist data to the Apify platform. Each storage type is implemented with its own specific Apify client that stores data in the cloud, making it accessible from anywhere.
The communication with the Apify platform is handled via the Apify API client for Python, which is an HTTP API wrapper. For maximum efficiency and performance of the storage clients, various caching mechanisms are used to minimize the number of API calls made to the Apify platform. Data can be inspected and manipulated through the Apify console web interface or via the Apify API.
The request queue client supports two access modes controlled by the
request_queue_access
parameter:Single mode
The
single
mode is optimized for scenarios with only one consumer. It minimizes API calls, making it faster and more cost-efficient compared to theshared
mode. This option is ideal when a single Actor is responsible for consuming the entire request queue. Using multiple consumers simultaneously may lead to inconsistencies or unexpected behavior.In this mode, multiple producers can safely add new requests, but forefront requests may not be processed immediately, as the client relies on local head estimation instead of frequent forefront fetching. Requests can also be added or marked as handled by other clients, but they must not be deleted or modified, since such changes would not be reflected in the local cache. If a request is already fully cached locally, marking it as handled by another client will be ignored by this client. This does not cause errors but can occasionally result in reprocessing a request that was already handled elsewhere. If the request was not yet cached locally, marking it as handled poses no issue.
Shared mode
The
shared
mode is designed for scenarios with multiple concurrent consumers. It ensures proper synchronization and consistency across clients, at the cost of higher API usage and slightly worse performance. This mode is safe for concurrent access from multiple processes, including Actors running in parallel on the Apify platform. It should be used when multiple consumers need to process requests from the same queue simultaneously.