Description
What would you like to happen?
I would like Beam's Spark Runner to officially support Apache Spark 4.0. Spark 4.0 brings performance enhancements, security updates, and updated APIs (e.g., improvements in shuffle, catalog features, the Kubernetes operator, and Structured Streaming). As major platforms (such as Dataproc, EMR, and internal platforms) start adopting Spark 4.0, supporting it would let users migrate without friction and use the new Spark functionality.

The current Beam documentation lists support only up to Spark 3.x and does not mention 4.0. Attempting to run the Spark Runner on Spark 4.0 results in compatibility issues (such as dependency conflicts and API changes).

Safe production use needs official CI matrix coverage, documentation, and dependency updates for 4.0. This would unblock organizations intending to move to Spark 4.0 and keep Beam competitive as Spark evolves.
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner