-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Description
What happened?
I am trying to deploy a Beam job (Python Beam) that runs on a PortableRunner (Flink Runner) in my Kubernetes cluster.
I have not experienced issues prior with Beam using the Flink Runner. However, today I tried to set up Beam to be a consumer from Apache Kafka using ReadFromKafka from apache_beam.io.kafka.
My Flink Cluster is managed by the Apache Flink Kubernetes Operator.
My Beam jobs are managed by a Beam Flink Job Manager, which posts Beam jobs to the Flink master. The Job Manager uses the image apache/beam_flink1.16_job_server:2.54.0.
My Flink Task Managers each contain a sidecar for a Beam worker pool, which is spun up using the image apache/beam_python3.11_sdk:2.54.0 and the arg --worker_pool.
When I start my beam job, I get the following error on the job manager logs:
Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory
These are my Beam pipeline options:
--job_name=beam_example_pipeline
--runner=PortableRunner
--job_endpoint=beam-flink-job-server:8099
--artifact_endpoint=beam-flink-job-server:8098
--environment_type=EXTERNAL
--environment_config=localhost:50000
--parallelism=1
--streaming
Some resources I've found suggest that the Kafka transform has its own environment type which is set to (and overrides any environment you set?) --environment_type=DOCKER, which is what causes the issues. However, I could be wrong, so please say so if I am.
All of this taking place on a Kubernetes cluster, where, to my knowledge, Docker in Docker is not recommended. I do not want to use a PROCESS environment_type, I require EXTERNAL. How can I resolve this issue? Is this a bug with Beam?
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner