[Bug] [python]: Possible race condition when starting many pipelines at the same time #36847

@hjtran

Description

What happened?

If you start many pipelines at the same time, you can hit a FileExistsError race condition when two of them try to create the requirements cache directory concurrently:

  File ".../lib/python3.11/site-packages/apache_beam/runners/runner.py", line 182, in run_pipeline
    default_environment=self.default_environment(options)),
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/site-packages/apache_beam/runners/runner.py", line 163, in default_environment
    return environments.Environment.from_options(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/site-packages/apache_beam/transforms/environments.py", line 274, in from_options
    return env_class.from_options(portable_options)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/site-packages/apache_beam/transforms/environments.py", line 367, in from_options
    artifacts=python_sdk_dependencies(options),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/site-packages/apache_beam/transforms/environments.py", line 913, in python_sdk_dependencies
    return stager.Stager.create_job_resources(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/site-packages/apache_beam/runners/portability/stager.py", line 221, in create_job_resources
    os.makedirs(requirements_cache_path)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: '/tmp/dataflow-requirements-cache'

I think we can probably just set exist_ok=True for the os.makedirs call.
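A minimal sketch of the idea, not the actual stager.py code: `ensure_cache_dir` and `simulate_concurrent_starts` are hypothetical names, and threads stand in for separate pipeline processes. It only shows that `os.makedirs(path, exist_ok=True)` tolerates another caller having created the directory first.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def ensure_cache_dir(path: str) -> str:
    """Create the cache directory, tolerating a concurrent creator.

    Hypothetical helper for illustration; not part of apache_beam.
    """
    # exist_ok=True suppresses FileExistsError when the directory already
    # exists, so losing the race to another caller is harmless.
    os.makedirs(path, exist_ok=True)
    return path


def simulate_concurrent_starts(num_workers: int = 16) -> list:
    """Call ensure_cache_dir from many threads against one shared path."""
    with tempfile.TemporaryDirectory() as base:
        cache = os.path.join(base, "dataflow-requirements-cache")
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            paths = list(
                pool.map(lambda _: ensure_cache_dir(cache), range(num_workers)))
        # Return only the directory names so the result does not depend on
        # the temporary base path.
        return [os.path.basename(p) for p in paths]


if __name__ == "__main__":
    print(simulate_concurrent_starts())
```

Note that `exist_ok=True` still raises FileExistsError if the path exists but is a regular file rather than a directory, so a genuinely bad cache path would still surface as an error.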

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
