Open
Description
What happened?
If you start many pipelines at the same time, you can hit a FileExistsError race condition when two of them try to create the requirements cache directory concurrently:
File ".../lib/python3.11/site-packages/apache_beam/runners/runner.py", line 182, in run_pipeline
default_environment=self.default_environment(options)),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../lib/python3.11/site-packages/apache_beam/runners/runner.py", line 163, in default_environment
return environments.Environment.from_options(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../lib/python3.11/site-packages/apache_beam/transforms/environments.py", line 274, in from_options
return env_class.from_options(portable_options) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../lib/python3.11/site-packages/apache_beam/transforms/environments.py", line 367, in from_options
artifacts=python_sdk_dependencies(options),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../lib/python3.11/site-packages/apache_beam/transforms/environments.py", line 913, in python_sdk_dependencies
return stager.Stager.create_job_resources(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../lib/python3.11/site-packages/apache_beam/runners/portability/stager.py", line 221, in create_job_resources
os.makedirs(requirements_cache_path)
File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: '/tmp/dataflow-requirements-cache
I think we can probably just set exist_ok=True for the os.makedirs call.
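A minimal sketch of the proposed fix, outside of Beam: the helper names `ensure_cache_dir` and `create_concurrently` are hypothetical and are not part of `stager.py`; only `os.makedirs(..., exist_ok=True)` is the actual standard-library call being suggested. It uses threads to imitate several pipelines racing to create the same directory.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def ensure_cache_dir(path: str) -> str:
    """Create `path` if needed, tolerating a concurrent creator (hypothetical helper)."""
    # exist_ok=True makes os.makedirs return quietly when the directory
    # already exists, including when another caller created it first.
    os.makedirs(path, exist_ok=True)
    return path


def create_concurrently(path: str, workers: int = 16) -> list:
    """Call ensure_cache_dir from many threads at once to imitate the race."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map re-raises any exception from a worker, so a
        # FileExistsError would surface here if one occurred.
        return list(pool.map(ensure_cache_dir, [path] * workers))


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    target = os.path.join(base, "requirements-cache")
    create_concurrently(target)
    print(os.path.isdir(target))
```

Note that `exist_ok=True` only covers the case where the path already exists as a directory; if a regular file occupies the path, `os.makedirs` still raises `FileExistsError`.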
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner