fix: Do not share state between different crawlers unless requested #1669

Pijukatel · 2026-01-12T14:28:29Z

Description

Introduces a new id argument for BasicCrawler that controls which state the crawler uses.
By default, each new BasicCrawler instance gets an automatically incremented id, which prevents unintentional sharing of state between crawlers.
If two crawlers should use the same state, pass the same id to the __init__ of both crawlers.
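
A minimal sketch of the idea, not the actual implementation: ToyCrawler, _STATE_STORE, and the CRAWLER_STATE_ key prefix are hypothetical stand-ins, and the real use_state in crawlee is asynchronous and backed by a key-value store. Only the id handling is modeled here.

```python
from __future__ import annotations

import itertools
from typing import Any

# Hypothetical stand-in for the key-value store that holds crawler state.
_STATE_STORE: dict[str, dict[str, Any]] = {}


class ToyCrawler:
    """Hypothetical stand-in for BasicCrawler; only the id logic is modeled."""

    _next_id = itertools.count()

    def __init__(self, id: int | None = None) -> None:
        # Without an explicit id, each instance takes the next counter value.
        self.id = next(ToyCrawler._next_id) if id is None else id

    def use_state(self) -> dict[str, Any]:
        # The state is looked up under a key derived from the crawler id.
        return _STATE_STORE.setdefault(f"CRAWLER_STATE_{self.id}", {})


# Default behaviour: two crawlers get different ids and isolated state.
a = ToyCrawler()
b = ToyCrawler()
a.use_state()["pages"] = 1
assert "pages" not in b.use_state()

# Opt-in sharing: passing the same id to both crawlers gives them one state.
shared_1 = ToyCrawler(id=100)
shared_2 = ToyCrawler(id=100)
shared_1.use_state()["pages"] = 5
assert shared_2.use_state()["pages"] == 5
```

In this sketch an explicitly passed id could collide with an automatically assigned one; how the real implementation handles that is not covered here.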

Issues

Closes: #1627

Testing

Added tests.

codecov · 2026-01-12T14:46:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.40%. Comparing base (0a0995c) to head (01335f0).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1669      +/-   ##
==========================================
- Coverage   92.41%   92.40%   -0.02%     
==========================================
  Files         157      157              
  Lines       10478    10488      +10     
==========================================
+ Hits         9683     9691       +8     
- Misses        795      797       +2

Flag	Coverage Δ
unit	`92.40% <100.00%> (-0.02%)`	⬇️

Pijukatel · 2026-01-12T14:49:44Z

This is better discussed first. After implementing this draft, I am leaning towards alternative 1 (see description).

@janbuchar, @barjin, @B4nan, could you please share your point of view?

The updated and new tests in this PR show the usage in code.

barjin · 2026-01-12T15:08:00Z

Explicitly passing state_kvs instead of crawler_id

The state is just one key in the KVS though, it feels weird to me to make the state this prominent in our API. If it's about the entire KVS (so e.g. get_key_value_store will also return this KVS), then it makes a bit more sense to me.

Maybe it's unclear that the "crawler state" is actually stored in KVS - this we should IMO communicate better in the docs.

Having thought about this a bit more, I see the "state sharing" as a bug again :) Different crawler instances IMO shouldn't influence each other just because they are touching the same storage implementation (if this is intentional, it should be explicit).

B4nan

I feel like I am getting lost in this. I thought the id was a rather internal thing to ensure that two crawler instances created in one app context won't share the state. We expose the id so people can opt in to sharing the state explicitly, but the important bit to me is that those IDs will be unique automatically. I can't think of a use case where one would want to create multiple crawlers and share their stats. Similarly, I don't think sharing the state object is something people would want, at least not by default.

src/crawlee/crawlers/_basic/_basic_crawler.py

Pijukatel · 2026-01-13T09:22:05Z

> Explicitly passing state_kvs instead of crawler_id
>
> The state is just one key in the KVS though, it feels weird to me to make the state this prominent in our API. If it's about the entire KVS (so e.g. get_key_value_store will also return this KVS), then it makes a bit more sense to me.
>
> Maybe it's unclear that the "crawler state" is actually stored in KVS - this we should IMO communicate better in the docs.
>
> Having thought about this a bit more, I see the "state sharing" as a bug again :) Different crawler instances IMO shouldn't influence each other just because they are touching the same storage implementation (if this is intentional, it should be explicit).

What about having an optional argument use_state? The default will be a function that saves to the default kvs under an automatically incremented id and the user can pass whatever custom implementation if they want something custom, like sharing the same state between two crawlers.

This will be an easy and clear default without the need for extra arguments and maximum flexibility for custom use cases.

janbuchar · 2026-01-13T13:54:33Z

> What about having an optional argument use_state? The default will be a function that saves to the default kvs under an automatically incremented id and the user can pass whatever custom implementation if they want something custom, like sharing the same state between two crawlers.

I have a hard time imagining that, could you sketch out some code samples?

Pijukatel · 2026-01-13T16:00:04Z

> > What about having an optional argument use_state? The default will be a function that saves to the default kvs under an automatically incremented id and the user can pass whatever custom implementation if they want something custom, like sharing the same state between two crawlers.
>
> I have a hard time imagining that, could you sketch out some code samples?

Please check the latest commit. I added an example of how this could be done. (Please do not focus on that specific example; it is just to demonstrate the idea. The question is whether use_state should be a hardcoded internal that can be parametrized, or a component of the crawler that can be fully replaced by a custom implementation.)

tests/unit/crawlers/_basic/test_basic_crawler.py

B4nan · 2026-01-15T09:47:34Z

Regarding this comment about Actor.useState vs Crawler.useState, I'd actually say they should point to the same state by default, let's call it id: 0. If you created two crawlers in one context, the first would use the same state as the SDK.

Now the reasoning - we also have context.useState and I'd say it makes sense to allow people to use Actor.useState and context.useState together.

If we don't do this, it feels like a nasty BC break that might surprise many people. Imagine you init the state before you run the crawler via Actor.useState and then you use context.useState in the handler, but the initial state is not there, forcing you to define it in the handler again.
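
A toy sketch of this proposal, with the assumptions spelled out: ToyActor, ToyCrawler, _KVS, and DEFAULT_STATE_ID are hypothetical names, the crawler's use_state stands in for context.useState in a handler, and nothing here is the real SDK or crawler API. The only point is that the SDK state and the first crawler both resolve to id 0.

```python
from __future__ import annotations

import itertools
from typing import Any

# Hypothetical stand-in for the shared key-value store.
_KVS: dict[str, dict[str, Any]] = {}
DEFAULT_STATE_ID = 0


def _state_for(state_id: int) -> dict[str, Any]:
    return _KVS.setdefault(f"STATE_{state_id}", {})


class ToyActor:
    """Hypothetical SDK entry point; always uses the default state id."""

    @staticmethod
    def use_state() -> dict[str, Any]:
        return _state_for(DEFAULT_STATE_ID)


class ToyCrawler:
    """Hypothetical crawler; ids start at the default id and increment."""

    _ids = itertools.count(DEFAULT_STATE_ID)

    def __init__(self) -> None:
        self.id = next(ToyCrawler._ids)

    def use_state(self) -> dict[str, Any]:
        return _state_for(self.id)


# State initialised through the SDK before any crawler exists...
ToyActor.use_state()["start_page"] = 3

first = ToyCrawler()   # id 0: same state as the SDK
second = ToyCrawler()  # id 1: isolated state

# ...is visible to the first crawler, but not to the second.
assert first.use_state()["start_page"] == 3
assert second.use_state() == {}
```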

Expand existing test

88e0fb1

github-actions bot assigned Pijukatel Jan 12, 2026

github-actions bot added this to the 132nd sprint - Tooling team milestone Jan 12, 2026

github-actions bot added the labels t-tooling (Issues with this label are in the ownership of the tooling team.) and tested (Temporary label used only programmatically for some analytics.) Jan 12, 2026

Version 1: State depends on crawler_id, but stats does not.

31e16d2

Pijukatel force-pushed the add-crawler-id branch from 6ff0b3f to 31e16d2 Compare January 12, 2026 14:42

Pijukatel changed the title ~~Add crawler~~ fix: Do not share state between different crawlers unless requested Jan 12, 2026

B4nan reviewed Jan 12, 2026

src/crawlee/crawlers/_basic/_basic_crawler.py (outdated)

Draft of use_state as input argument

1bbb651

Only for discussion, types ignored for now.

Pijukatel mentioned this pull request Jan 13, 2026

docs: State persistence update apify/apify-docs#2176

Open

janbuchar reviewed Jan 13, 2026

tests/unit/crawlers/_basic/test_basic_crawler.py (outdated)

Revert "Draft of use_state as input argument"

415299f

This reverts commit 1bbb651.

Pijukatel force-pushed the add-crawler-id branch from 632c844 to a674038 Compare January 15, 2026 12:23

Pijukatel requested a review from janbuchar January 15, 2026 12:33

Pijukatel marked this pull request as ready for review January 15, 2026 12:33

Rename crawler_id to just id. Polish.

01335f0

Pijukatel force-pushed the add-crawler-id branch from a674038 to 01335f0 Compare January 15, 2026 12:34

Pijukatel mentioned this pull request Jan 15, 2026

Consider adding Actor.use_state apify/apify-sdk-python#735

Open


5 participants
