Skip to content

[Bug] Beam errors out when using PortableRunner (Flink Runner) – Cannot run program "docker" #30663

@harrymyburgh

Description

@harrymyburgh

What happened?

I am trying to deploy a Beam job (Python Beam) that runs on a PortableRunner (Flink Runner) in my Kubernetes cluster.
I have not experienced issues prior with Beam using the Flink Runner. However, today I tried to set up Beam to be a consumer from Apache Kafka using ReadFromKafka from apache_beam.io.kafka.

My Flink Cluster is managed by the Apache Flink Kubernetes Operator.

My Beam jobs are managed by a Beam Flink Job Manager, which posts Beam jobs to the Flink master. The Job Manager uses the image apache/beam_flink1.16_job_server:2.54.0.

My Flink Task Managers each contain a sidecar for a Beam worker pool, which is spun up using the image apache/beam_python3.11_sdk:2.54.0 and the arg --worker_pool.

When I start my beam job, I get the following error on the job manager logs:

Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory

These are my Beam pipeline options:

--job_name=beam_example_pipeline
--runner=PortableRunner
--job_endpoint=beam-flink-job-server:8099
--artifact_endpoint=beam-flink-job-server:8098
--environment_type=EXTERNAL
--environment_config=localhost:50000
--parallelism=1
--streaming

Some resources I've found suggest that the Kafka transform has its own environment type which is set to (and overrides any environment you set?) --environment_type=DOCKER, which is what causes the issues. However, I could be wrong, so please say so if I am.

All of this taking place on a Kubernetes cluster, where, to my knowledge, Docker in Docker is not recommended. I do not want to use a PROCESS environment_type, I require EXTERNAL. How can I resolve this issue? Is this a bug with Beam?

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions