Fix Memory Leaks and Improve Cleanup in S3 Multipart Upload#15210
Fix Memory Leaks and Improve Cleanup in S3 Multipart Upload#15210kumarpritam863 wants to merge 18 commits intoapache:mainfrom
Conversation
|
@singhpk234 can you please review. |
|
@RussellSpitzer can you please take a look. |
| // clear staging files and multipart map | ||
| stagingFiles.clear(); | ||
| multiPartMap.clear(); |
There was a problem hiding this comment.
its fine to do eager cleanup
though wouldn't the stream post abort closed and hence GCed ? is this staying in memory for long
There was a problem hiding this comment.
Thanks @singhpk234 for the review.
Regarding memory management:
While the staging files list will eventually allow objects to be garbage-collected once they go out of scope, I’m concerned that retaining strong references to many FileAndDigest objects (especially in upload-heavy / long-running workloads) can still cause practical issues:
- Increased heap pressure during periods of high concurrent or sequential uploads
- Longer object lifetime → more frequent / longer GC pauses
- Higher risk of OutOfMemoryError during peak load (I’ve sometimes observed OOMs in similar scenarios when large numbers of parts accumulate without cleanup while running Iceberg-Kafka-Connect)
Even though the theoretical lifetime is finite, the practical memory pressure and GC overhead seem non-negligible in our use case.
Also although it does not effect the AWS multipart upload as AWS requires the part number to be unique but starting the part number from 1 and keeping it in low bounds make managing CompleteMultipartUpload requests easier. Currently the part number comes from the Index() of the part-file from staging files list which can start from a higher number if the previous files are not cleared.
Please let me know your thoughts on these.
|
Hi @singhpk234 I have also added the tests for proper cleaning of staging files and multipart map. Can you please review once. |
Cleanup staging files list and multipart upload after close() and ubortUpload() after closing all the staging files.