Conversation
Code Review
This pull request implements incremental object retrieval by patch version, optimizing network transfer for patch-heavy objects like replay buffers. Key changes include updating the Arrow Flight protocol to support versioned patches, modifying the Rust server to stream partial updates, and enhancing the Python SDK to cache patches and materialized views. Review feedback highlights several performance optimizations in the Rust server, such as avoiding unnecessary data cloning and using a RwLock for concurrent reads. Additionally, a critical issue was identified in the Python SDK where using id() for caching materialized views could lead to memory leaks and cache misses when using bound methods.
```rust
fn object_patches_to_flight_data_vec(patches: &[Object]) -> Result<Vec<FlightData>, FlameError> {
    let rows = patches
        .iter()
        .map(|delta| (ObjectResponseKind::Patch, delta))
        .collect();
    object_rows_to_flight_data_vec(rows)
}
```
This helper function can be optimized to avoid unnecessary cloning of Object instances (and their underlying Vec<u8> data) by working with references. This is particularly important for large objects and high-concurrency scenarios.
Suggested change:

```diff
-fn object_patches_to_flight_data_vec(patches: &[Object]) -> Result<Vec<FlightData>, FlameError> {
+fn object_patches_to_flight_data_vec(patches: Vec<&Object>) -> Result<Vec<FlightData>, FlameError> {
     let rows = patches
-        .iter()
+        .into_iter()
         .map(|delta| (ObjectResponseKind::Patch, delta))
         .collect();
     object_rows_to_flight_data_vec(rows)
 }
```
```rust
} else if client_version != 0 && object.version <= client_version {
    let needed_patches: Vec<Object> = object
        .deltas
        .iter()
        .filter(|delta| delta.version > client_version)
        .cloned()
        .collect();
```
Cloning all patches before checking contiguity and encoding is inefficient. Using references avoids deep-copying the patch data.
Suggested change:

```rust
} else if client_version != 0 && object.version <= client_version {
    let needed_patches: Vec<&Object> = object
        .deltas
        .iter()
        .filter(|delta| delta.version > client_version)
        .collect();
```

```rust
        client_version,
        server_version
    );
    let flight_data_vec = object_patches_to_flight_data_vec(&needed_patches)?;
```
```python
def _materialize_object(obj: Object, deserializer: Optional[Deserializer] = None) -> Any:
    materialized_key = None if deserializer is None else id(deserializer)
```
Using id(deserializer) as a cache key is problematic for bound methods (e.g., self._deserializer), as Python creates a new bound method object with a unique ID on every access. This results in a 0% cache hit rate and a memory leak as the materialized dictionary grows with every call. Using the deserializer object itself as the key is safe and efficient, as functions and methods are hashable and bound methods implement equality correctly.
Suggested change:

```diff
-    materialized_key = None if deserializer is None else id(deserializer)
+    materialized_key = deserializer
```
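The bound-method behavior is easy to demonstrate. The `Client` class below is a hypothetical stand-in for the SDK object holding `self._deserializer`:

```python
class Client:
    def _deserializer(self, raw: bytes) -> str:
        return raw.decode("utf-8")

c = Client()
m1 = c._deserializer
m2 = c._deserializer

# Each attribute access creates a fresh bound-method object...
assert m1 is not m2                # distinct objects, hence distinct id()s
# ...but bound methods hash and compare by (function, instance):
assert m1 == m2
assert hash(m1) == hash(m2)

# Keyed by the callable itself, a second lookup hits the cache:
cache = {m1: "materialized view"}
assert cache[m2] == "materialized view"

# Keyed by id(), the same lookup misses, and every call would
# insert a new entry, growing the dict without bound:
cache_by_id = {id(m1): "materialized view"}
assert id(m2) not in cache_by_id
print(len(cache))  # → 1
```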
```rust
pub fn current_version(&self) -> u64 {
    self.deltas
        .iter()
        .map(|delta| delta.version)
        .max()
        .unwrap_or(self.version)
        .max(self.version)
}
```
The .max(self.version) call is redundant because unwrap_or(self.version) already ensures the result is at least self.version if the iterator is empty. If the iterator is not empty, the maximum delta version should naturally be greater than or equal to the base version.
Suggested change:

```rust
pub fn current_version(&self) -> u64 {
    self.deltas
        .iter()
        .map(|delta| delta.version)
        .max()
        .unwrap_or(self.version)
}
```

```rust
    .cache
    .get_write_lock(&key_str)
    .map_err(|e| Status::internal(format!("Lock error: {}", e)))?;
let _guard = write_lock.lock().await;
```
Using an exclusive Mutex lock for do_get operations serializes all read requests for the same key, which can become a significant performance bottleneck. Since cache.get() is thread-safe and returns a cloned Object, the returned data is consistent. If synchronization against concurrent writes is required to avoid reading partial storage states, consider using a RwLock to allow multiple concurrent readers.
```python
version: int
data: Any
patches: List[Patch] = field(default_factory=list)
materialized: Dict[Optional[int], Any] = field(default_factory=dict)
```
The type hint for the materialized dictionary should reflect that it will store callables as keys to support stable caching of materialized views.
Suggested change:

```diff
-    materialized: Dict[Optional[int], Any] = field(default_factory=dict)
+    materialized: Dict[Optional[Callable], Any] = field(default_factory=dict)
```
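A trimmed-down, hypothetical version of the record shows how the corrected hint lines up with using the deserializer callable itself as the key (`CachedObject` and `to_text` are illustrative names, not the SDK's):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

@dataclass
class CachedObject:
    version: int
    data: Any
    # None keys the raw view; a callable keys the view it produced.
    materialized: Dict[Optional[Callable], Any] = field(default_factory=dict)

def to_text(raw: bytes) -> str:
    return raw.decode("utf-8")

obj = CachedObject(version=1, data=b"hello")
obj.materialized[None] = obj.data            # raw bytes, no deserializer
obj.materialized[to_text] = to_text(obj.data)

assert obj.materialized[to_text] == "hello"
print(len(obj.materialized))  # → 2
```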