Skip to content

feat(storage): optimize tree func#2372

Open
baojun-zhang wants to merge 7 commits into
volcengine:mainfrom
baojun-zhang:optimize-tree
Open

feat(storage): optimize tree func#2372
baojun-zhang wants to merge 7 commits into
volcengine:mainfrom
baojun-zhang:optimize-tree

Conversation

@baojun-zhang
Copy link
Copy Markdown
Collaborator

@baojun-zhang baojun-zhang commented Jun 1, 2026

Description

Push the tree recursive traversal capability down from the Python layer into ragfs (Rust). A default implementation equivalent to Python's _tree_original is provided in core/filesystem.rs, and s3fs gets a high-performance override based on flat listing. The Python VikingFS layer now only handles view shaping, ACL, and agent enrichment, eliminating the duplicated DFS between _tree_original/_tree_agent. Zero semantic regression, with a major performance gain for wide S3 directories.

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Rust Core: add TreeEntry (core/types.rs); add tree_directory() / tree_directory_internal() default recursive implementation to the FileSystem trait (core/filesystem.rs), preserving node_limit/level_limit/show_hidden/rel_path/DFS-order semantics identical to the original Python logic.
  • MountableFS: add tree_directory() routing and rewrite plugin-internal paths back to global AGFS absolute paths (core/mountable.rs).
  • StatsWrappedFS: add the tree_directory() proxy and FsOperation::TreeDir (core/stats_wrapper.rs, core/stats.rs), fixing the wrapper not delegating, which previously made the S3FS optimization ineffective.
  • S3FS: override tree_directory() and add build_tree_entries_from_flat_listing(), replacing per-directory recursive read_dir with a single/few paginated flat listings; client.rs adds list_tree_objects() that retains directory markers (plugins/s3fs/mod.rs, plugins/s3fs/client.rs).
  • Bindings: expose tree_directory() and TreeEntry → PyDict serialization (ragfs-python/src/lib.rs).
  • Python: async_client.py adds an async tree_directory(); viking_fs.py refactors _tree_original/_tree_agent into adapters/enrichment built on a shared _iter_visible_tree_entries(), adds _is_tree_entry_visible()/_ancestor_is_filtered(), and keeps the "apply node_limit after ACL filtering" semantics.

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Added/passing test coverage: Rust Core tree_directory default implementation (RUST-TREE-001018), S3FS override (RUST-S3-001012), MountableFS routing (RUST-MNT-001~004); Python _is_tree_entry_visible/_iter_visible_tree_entries/_tree_original/_tree_agent (PY-FLT/ITER/ORIG/AGENT, 31 cases, tests/storage/test_viking_fs_tree.py). 65 unit tests pass in total. Performance regression: at the same result scale (1000 nodes), tree dropped from ~31615ms/call to ~10726ms/call (~2.95x), p99 from 56835ms to 13035ms (~4.36x), and stability improved from 99/100 to 100/100.

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

@baojun-zhang
Copy link
Copy Markdown
Collaborator Author

baojun-zhang commented Jun 1, 2026

性能测试(tree)

前置

导入 1000 个资源,用于准备 tree 测试数据:

python3 ./OpenViking/load_test_add_resource_1000.py

测试命令

python3 perf/s3/ls/load_test_tree_100.py \
  --account-id tenant_d9f3c5b17f \
  --user-id user_b27e46a1cb \
  --api-key 'dGVuY...ZA' \
  --uri viking://resources \
  --iterations 100

结果汇总(100 次)

场景 iterations ok fail total_sec avg_ms p50_ms p95_ms p99_ms last_entry_count
优化前 100 99 1 3161.55 31615.41 37581.54 47593.37 56835.03 1000
优化后 100 100 0 1072.66 10726.53 10455.09 12912.64 13034.52 1000

结论

在相同结果规模(last_entry_count=1000)下,tree 从 ~31615ms/次 降到 ~10726ms/次,提升约 2.95 倍;同时稳定性提升:优化前有 1 次 500 失败(i=69),优化后 100 次全部成功(fail=0)。尾延迟改善尤为明显,p99 从 56835ms 降至 13035ms(约 4.36 倍)。

原始输出

优化前

progress: 20/100, ok=20, fail=0
progress: 40/100, ok=40, fail=0
progress: 60/100, ok=60, fail=0
ERR i=69 status=500 data=None text=
progress: 80/100, ok=79, fail=1
progress: 100/100, ok=99, fail=1

Summary:
  iterations: 100
  ok: 99
  fail: 1
  total_sec: 3161.5479
  avg_ms: 31615.41
  p50_ms: 37581.54
  p95_ms: 47593.37
  p99_ms: 56835.03
  last_entry_count: 1000

优化后

progress: 20/100, ok=20, fail=0
progress: 40/100, ok=40, fail=0
progress: 60/100, ok=60, fail=0
progress: 80/100, ok=80, fail=0
progress: 100/100, ok=100, fail=0

Summary:
  iterations: 100
  ok: 100
  fail: 0
  total_sec: 1072.6601
  avg_ms: 10726.53
  p50_ms: 10455.09
  p95_ms: 12912.64
  p99_ms: 13034.52
  last_entry_count: 1000

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🏅 Score: 90
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

PR Code Suggestions ✨

No code suggestions found for the PR.

Copy link
Copy Markdown
Collaborator

@qin-ctx qin-ctx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感谢把 tree 遍历能力下沉到 ragfs。方向是合理的:tree traversal 本质上是 filesystem capability,S3FS 也确实应该有 backend-specific 的 flat listing 优化。

这次我先 request changes,主要有两个合并前需要处理的问题:

Blocking:

  • S3 tree 仍然会在 Python 应用可见节点 node_limit 之前 materialize 整个 prefix;对于大 S3 目录,即使默认 limit=1000,也可能全量拉取、全量建树、全量排序,仍然有超时或内存风险。
  • VikingFS.tree() 现在硬依赖 wrapped AGFS client 暴露 tree_directory();这会破坏只实现旧 ls() surface 的现有 client/wrapper。

Non-blocking 结构建议:

  • merge 前建议 squash 掉 fixup/format/remove-test 这类过程性 commit。
  • S3 tree builder 和 Python tree 测试都比较大;建议把 S3 tree builder 拆成独立模块,并抽出 Python 测试里的重复 fixture/setup,这样后续维护和 review 会轻一些。

Comment thread crates/ragfs/src/plugins/s3fs/client.rs
Comment thread openviking/storage/viking_fs.py Outdated
Comment thread crates/ragfs/src/plugins/s3fs/mod.rs Outdated
Comment thread tests/storage/test_viking_fs_tree.py Outdated
@baojun-zhang baojun-zhang requested a review from qin-ctx June 3, 2026 07:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

2 participants