Commit f7688c4
fix: use single writer for all partition streams (#3870)
# Description
Collecting the streams concurrently and sending them over a sync channel lets us keep a single writer instance that everything is written through. This prevents the edge case of writing many small files when datafusion decides to create lots of partitioned streams that are relatively small. Since we now maintain one writer, we just keep writing from all partition streams into it.
I guess this stacks well with your PR @abhiaagarwal? Wdyt?
- closes #3871
With the same code as in the above issue, we now create only one file instead of 3 small files:
```json
{"protocol":{"minReaderVersion":1,"minWriterVersion":2}}
{"metaData":{"id":"0d1846c0-71e6-4c6d-aba1-5b9f8d752937","name":null,"description":null,"format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"foo\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"createdTime":1760889081860,"configuration":{}}}
{"add":{"path":"part-00000-352305f0-eff1-44d3-993e-6bd31cdd118c-c000.snappy.parquet","partitionValues":{},"size":510,"modificationTime":1760889081877,"dataChange":true,"stats":"{\"numRecords\":9,\"minValues\":{\"foo\":1},\"maxValues\":{\"foo\":3},\"nullCount\":{\"foo\":0}}","tags":null,"baseRowId":null,"defaultRowCommitVersion":null,"clusteringProvider":null}}
{"commitInfo":{"timestamp":1760889081878,"operation":"WRITE","operationParameters":{"mode":"ErrorIfExists"},"engineInfo":"delta-rs:py-1.2.0","clientVersion":"delta-rs.py-1.2.0","operationMetrics":{"execution_time_ms":19,"num_added_files":1,"num_added_rows":9,"num_partitions":0,"num_removed_files":0}}}
```
---------
Signed-off-by: Ion Koutsouris <[email protected]>
1 parent d0ae849
File tree: 2 files changed (+241, -168 lines)
- crates/core/src/operations/write
- python