[Log] Improve internals of LogScanner by lazily deserialising from record stream / arrow buffer #2041
Feature
Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
Reduce the memory footprint of LogScannerImpl.poll(Duration) by addressing the TODO here: https://github.com/apache/fluss/blob/main/fluss-client/src/main/java/org/apache/fluss/client/table/scanner/log/CompletedFetch.java#L96-L97
Solution
Proposed changes are as follows:
Change the signatures and implementations that currently return List<ScanRecord> to return CloseableIterator<ScanRecord>. This includes (not exhaustive):
- CompletedFetch's List<ScanRecord> fetchRecords(int) to CloseableIterator<ScanRecord> fetchRecords(). The current int argument indicates the maximum number of records to fetch and deserialise; I propose removing it and letting the eventual user of the iterator decide how many to fetch. An alternative here is to make CompletedFetch implement Iterable<ScanRecord>.
- LogFetchCollector's Map<TableBucket, List<ScanRecord>> collectFetch(LogFetchBuffer) to Map<TableBucket, CloseableIterator<ScanRecord>> collectFetch(LogFetchBuffer)
- LogFetcher's Map<TableBucket, List<ScanRecord>> collectFetch(LogFetchBuffer) to Map<TableBucket, CloseableIterator<ScanRecord>> collectFetch(LogFetchBuffer)
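To make the intent of the signature change concrete, here is a minimal sketch of lazy deserialisation behind an iterator. It is not the Fluss implementation: the CloseableIterator interface below is a hypothetical stand-in for the project's own type, and LazyFetchSketch, its decoder function, and decodedCount are names invented for illustration. The point is only that each record is decoded when next() is called, so a caller that stops early never pays for the rest.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Hypothetical stand-in for the project's CloseableIterator; the real interface may differ. */
interface CloseableIterator<T> extends Iterator<T>, AutoCloseable {
    @Override
    void close(); // narrowed so callers need not handle a checked exception
}

/**
 * Hypothetical lazy fetch: decodes one raw record per next() call
 * instead of materialising a List of decoded records up front.
 */
final class LazyFetchSketch<R, T> implements CloseableIterator<T> {
    private final Iterator<R> raw;          // stands in for the record stream / buffer
    private final Function<R, T> decoder;   // stands in for deserialisation
    private boolean closed;
    int decodedCount;                       // illustration only: how many records were decoded

    LazyFetchSketch(List<R> rawRecords, Function<R, T> decoder) {
        this.raw = rawRecords.iterator();
        this.decoder = decoder;
    }

    @Override
    public boolean hasNext() {
        return !closed && raw.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        decodedCount++;
        return decoder.apply(raw.next());
    }

    @Override
    public void close() {
        // A real implementation would release the underlying buffer here.
        closed = true;
    }
}
```

With this shape, the removed int argument becomes the caller's concern: a consumer that wants at most n records simply stops calling next() after n elements and then closes the iterator.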
The closing of resources is done in two places:
- Within ScanRecords.ConcatenatedIterable where CloseableIterators are closed when they have no next element.
- Within LogScannerImpl.close() where LogScannerImpl will track unclosed CloseableIterators and close them.
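The two closing paths above can be sketched roughly as follows. This is an assumption-laden illustration, not the proposed patch: CloseableIterator is again a hypothetical stand-in for the project's type, and ListBackedIterator, ConcatenatingIterator, and ScannerSketch are invented names that only mimic the roles of ScanRecords.ConcatenatedIterable and LogScannerImpl. The sketch assumes close() is idempotent, so an iterator closed on exhaustion can safely be closed again by the scanner.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the project's CloseableIterator; the real interface may differ. */
interface CloseableIterator<T> extends Iterator<T>, AutoCloseable {
    @Override
    void close();
}

/** Hypothetical iterator over an in-memory list, used only to observe close() calls. */
final class ListBackedIterator<T> implements CloseableIterator<T> {
    private final Iterator<T> delegate;
    boolean closed;

    ListBackedIterator(List<T> items) {
        this.delegate = items.iterator();
    }

    @Override
    public boolean hasNext() {
        return !closed && delegate.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return delegate.next();
    }

    @Override
    public void close() {
        closed = true; // idempotent: closing twice is harmless
    }
}

/** Path 1: close each underlying iterator as soon as it has no next element. */
final class ConcatenatingIterator<T> implements Iterator<T> {
    private final Deque<CloseableIterator<T>> pending;

    ConcatenatingIterator(List<? extends CloseableIterator<T>> parts) {
        this.pending = new ArrayDeque<>(parts);
    }

    @Override
    public boolean hasNext() {
        while (!pending.isEmpty()) {
            CloseableIterator<T> head = pending.peekFirst();
            if (head.hasNext()) {
                return true;
            }
            head.close();          // exhausted: release it immediately
            pending.removeFirst();
        }
        return false;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.peekFirst().next();
    }
}

/** Path 2: the scanner tracks every iterator it hands out and closes leftovers in close(). */
final class ScannerSketch implements AutoCloseable {
    private final List<CloseableIterator<?>> tracked = new ArrayList<>();

    <T> CloseableIterator<T> track(CloseableIterator<T> it) {
        tracked.add(it);
        return it;
    }

    @Override
    public void close() {
        for (CloseableIterator<?> it : tracked) {
            it.close();
        }
        tracked.clear();
    }
}
```

The second path acts as a safety net: an iterator that the caller never drains is not closed by the first path, so the scanner's close() releases whatever is still outstanding.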
Anything else?
Feedback and suggestions on alternative approaches are welcome!
Willingness to contribute
- I'm willing to submit a PR!