Support Lazily Ref Clone #10

Draft
samir-openai wants to merge 1 commit into cloudflare:main from samir-openai:codex/async-gated-prepare

@samir-openai samir-openai commented Apr 18, 2026

This is a draft proposal for a lazy clone

lazily add-repo

The current implementation still incurs blocking load time during the async ref-only clone.

artifact-fs add-repo --async ...

This lets the daemon actively clone a repo's refs without blocking container/sandbox startup. This is useful when an agent may never use the repo at all. Agents can then interact with the git repo lazily, even if cloning the refs takes a long time.

Summary

The key change is to block every user-facing filesystem operation behind a wrapper around ArtifactFuse until the repo is ready.
Existing file operation code stays mostly unchanged.

This still requires a placeholder FUSE mount, because a process can only block on repo filesystem operations if ArtifactFS has mounted something at that path. A repo that is merely “hidden until ready” without a mount would cause operations to fail or see a normal empty directory, not block.

Behavior and state model

Important Behavioral Choice

Use this behavior:

  • artifact-fs add-repo --async ... returns after registration.
  • The daemon mounts the repo path immediately with a readiness-gated FUSE filesystem.
  • Content operations under the mount block while clone/index is in progress.
  • After successful preparation, the gate opens and operations continue normally.
  • If preparation fails, blocked operations return an error and later operations fail fast until retry.
  • A small set of kernel lifecycle operations may pass through for mount stability, but user-facing repo operations block.

Examples that block while preparation is active:

ls /tmp/repo
git -C /tmp/repo status
cat /tmp/repo/README.md
less /tmp/repo/README.md

Minimal State Model

Add three async preparation states:

  • preparing
  • ready
  • failed

Registry additions:

  • remote_url TEXT NOT NULL DEFAULT ''
    • Persist only credential-free remotes for async daemon-side clone.
  • prepared_gitdir INTEGER NOT NULL DEFAULT 0
    • Optional generic prepared-gitdir support.
  • fetch_ref TEXT NOT NULL DEFAULT ''
    • Optional ref override; defaults to branch.
  • prepare_state TEXT NOT NULL DEFAULT ''
    • Empty means legacy/blocking repo.
  • prepare_error TEXT NOT NULL DEFAULT ''
    • Redacted failure for status/debugging.
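
The states and registry columns above could be mirrored in Go roughly as follows. This is only a sketch: the type, field, and method names (PrepareState, RepoRow, IsAsync, EffectiveRef) are hypothetical and not taken from the ArtifactFS code; only the column names and their meanings come from the proposal.

```go
package main

import "fmt"

// PrepareState mirrors the prepare_state column. The type name is hypothetical.
type PrepareState string

const (
	PrepareLegacy    PrepareState = "" // empty means legacy/blocking repo
	PreparePreparing PrepareState = "preparing"
	PrepareReady     PrepareState = "ready"
	PrepareFailed    PrepareState = "failed"
)

// RepoRow is a hypothetical in-memory view of the proposed registry columns.
type RepoRow struct {
	RemoteURL      string       // remote_url: credential-free remote only
	PreparedGitDir bool         // prepared_gitdir: stored as INTEGER 0/1
	FetchRef       string       // fetch_ref: optional ref override
	Branch         string       // existing branch setting (from --branch)
	PrepareState   PrepareState // prepare_state
	PrepareError   string       // prepare_error: redacted failure text
}

// IsAsync reports whether the row uses the async preparation path.
func (r RepoRow) IsAsync() bool { return r.PrepareState != PrepareLegacy }

// EffectiveRef returns fetch_ref, falling back to the branch when it is unset.
func (r RepoRow) EffectiveRef() string {
	if r.FetchRef != "" {
		return r.FetchRef
	}
	return r.Branch
}

func main() {
	row := RepoRow{Branch: "main", PrepareState: PreparePreparing}
	fmt.Println(row.IsAsync(), row.EffectiveRef())
}
```

Keeping the empty string as the legacy value means existing rows need no backfill, which matches the DEFAULT '' on the column.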

CLI and API

CLI/API

Keep existing blocking behavior unchanged:

artifact-fs add-repo \
  --name repo \
  --remote https://github.com/org/repo.git \
  --branch main \
  --mount-root /tmp

Add async mode:

artifact-fs add-repo \
  --async \
  --name repo \
  --remote https://github.com/org/repo.git \
  --branch main \
  --mount-root /tmp

Keep generic prepared-gitdir compatibility, but as a thin variant:

artifact-fs add-repo \
  --async \
  --prepared-gitdir \
  --git-dir /state/repos/repo/git \
  --name repo \
  --remote https://github.com/org/repo.git \
  --branch main \
  --mount-root /tmp

Add retry:

artifact-fs prepare --name repo

Auth rules:

  • --async rejects HTTPS remotes with inline credentials.
  • Async clone/fetch uses ambient non-interactive credentials.
  • Set GIT_TERMINAL_PROMPT=0.
  • Use SSH batch mode when no GIT_SSH_COMMAND is already configured.
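
A minimal sketch of how these auth rules might be enforced, assuming the check happens at registration time and the environment is built just before git is invoked. The function names (validateAsyncRemote, nonInteractiveEnv) and the error value are hypothetical; treating plain http:// remotes the same as https:// is also an assumption. GIT_TERMINAL_PROMPT, GIT_SSH_COMMAND, and ssh's BatchMode option are standard git/OpenSSH settings.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
	"strings"
)

// ErrInlineCredentials is a hypothetical error for remotes that embed userinfo.
var ErrInlineCredentials = errors.New("async mode rejects remotes with inline credentials")

// validateAsyncRemote rejects remotes such as https://user:token@host/repo.git.
func validateAsyncRemote(remote string) error {
	lower := strings.ToLower(remote)
	if !strings.HasPrefix(lower, "https://") && !strings.HasPrefix(lower, "http://") {
		return nil // ssh:// and scp-style remotes carry no HTTP credentials
	}
	u, err := url.Parse(remote)
	if err != nil {
		return fmt.Errorf("parse remote: %w", err)
	}
	if u.User != nil {
		return ErrInlineCredentials
	}
	return nil
}

// nonInteractiveEnv returns base with prompts disabled. An existing
// GIT_SSH_COMMAND is preserved; otherwise SSH batch mode is added.
func nonInteractiveEnv(base []string) []string {
	env := make([]string, 0, len(base)+2)
	hasSSH := false
	for _, kv := range base {
		if strings.HasPrefix(kv, "GIT_TERMINAL_PROMPT=") {
			continue // always overridden below
		}
		if strings.HasPrefix(kv, "GIT_SSH_COMMAND=") {
			hasSSH = true
		}
		env = append(env, kv)
	}
	env = append(env, "GIT_TERMINAL_PROMPT=0")
	if !hasSSH {
		env = append(env, "GIT_SSH_COMMAND=ssh -o BatchMode=yes")
	}
	return env
}

func main() {
	fmt.Println(validateAsyncRemote("https://github.com/org/repo.git"))
	fmt.Println(validateAsyncRemote("https://user:token@github.com/org/repo.git"))
	fmt.Println(len(nonInteractiveEnv(os.Environ())) >= 2)
}
```

The returned slice is intended to be assigned to the Env field of an exec.Cmd so that a background clone fails fast instead of waiting on a prompt nobody can answer.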

Implementation design

FUSE Design

Add internal/fusefs/ready_gate.go:

  • Wait(ctx) error
  • MarkReady()
  • MarkFailed(error)
  • Reset()

Add internal/fusefs/gated_fs.go:

  • A gatedFileSystem implements fuseutil.FileSystem.
  • It wraps the existing ArtifactFuse.
  • Each user-facing method calls gate.Wait(ctx) before delegating.
  • Kernel lifecycle methods pass through ungated, or do only minimal work, to avoid deadlocks:
    • ForgetInode
    • BatchForget
    • ReleaseDirHandle
    • ReleaseFileHandle
    • Destroy
    • likely StatFS

This file has method boilerplate, but the existing operation implementations remain clean and unchanged.

Update MountRepo to support an optional gate:

func MountRepo(repo model.RepoConfig, resolver *Resolver, engine *Engine, gate *ReadyGate) (MountedFS, error)

Behavior:

  • gate == nil: existing behavior.
  • gate != nil: wrap NewArtifactFuse(...) in gatedFileSystem.

Daemon Design

Keep the existing blocking path for normal repos.

For async repos:

  1. add-repo --async persists safe config with prepare_state=preparing and returns.
  2. Daemon syncRepos sees the async repo.
  3. Daemon creates snapshot/overlay/resolver as usual, but with generation 0.
  4. Daemon mounts FUSE immediately with a closed readiness gate.
  5. Daemon starts a background prepare goroutine.
  6. Background prepare:
    • clones blobless, or validates/fetches prepared gitdir
    • resolves HEAD
    • builds/publishes snapshot
    • reconciles overlay
    • sets resolver generation
    • starts watcher/refresh loops
    • marks state ready
    • opens the gate
  7. On failure:
    • persist redacted prepare_error
    • mark state failed
    • fail the gate so waiters unblock with error
  8. artifact-fs prepare --name repo:
    • resets state to preparing
    • resets the gate if mounted
    • starts a new background prepare job

Use one in-memory preparing map[RepoID]bool to dedupe daemon workers. A cross-process lock is optional and can be skipped in the simplified version unless multiple daemons are supported.
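
A small sketch of the in-memory dedupe, assuming RepoID is a string-like type and that a mutex guards the map. The prepareTracker type and its tryStart method are hypothetical names; only the map[RepoID]bool shape comes from the text above.

```go
package main

import (
	"fmt"
	"sync"
)

// RepoID is assumed to be string-like here; the real type may differ.
type RepoID string

// prepareTracker guards the in-memory preparing map.
type prepareTracker struct {
	mu        sync.Mutex
	preparing map[RepoID]bool
}

func newPrepareTracker() *prepareTracker {
	return &prepareTracker{preparing: map[RepoID]bool{}}
}

// tryStart runs job in a goroutine unless one is already running for id.
// It reports whether a new job was started.
func (p *prepareTracker) tryStart(id RepoID, job func()) bool {
	p.mu.Lock()
	if p.preparing[id] {
		p.mu.Unlock()
		return false
	}
	p.preparing[id] = true
	p.mu.Unlock()

	go func() {
		// Clear the entry when the job ends so a later retry can start.
		defer func() {
			p.mu.Lock()
			delete(p.preparing, id)
			p.mu.Unlock()
		}()
		job()
	}()
	return true
}

func main() {
	tracker := newPrepareTracker()
	release := make(chan struct{})
	finished := make(chan struct{})

	first := tracker.tryStart("repo", func() { <-release; close(finished) })
	second := tracker.tryStart("repo", func() {}) // deduped while the first runs
	close(release)
	<-finished
	fmt.Println(first, second)
}
```

This only dedupes within one daemon process, which is why the cross-process lock matters if multiple daemons are ever supported.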

Gitstore Changes

Add helpers:

  • CloneBloblessNonInteractive(ctx, cfg)
  • FetchRefNonInteractive(ctx, cfg, ref)
  • ValidatePreparedGitDir(ctx, cfg)
  • PrepareFetchedBranch(ctx, cfg, ref)

Keep the existing credential-helper path for blocking clone with inline credentials.

For async clone:

  • Require credential-free RemoteURL.
  • Persist the safe remote URL.
  • Clone with non-interactive environment.
  • Keep read-tree HEAD behavior, but run it before publishing readiness.

For prepared-gitdir:

  • Do not rewrite existing remotes, headers, hooks, or config.
  • Fetch the requested ref.
  • Update branch/index.
  • Publish snapshot.
  • Open gate.

Status

Keep status output simple:

repo=repo state=preparing head= ref= prepare_error=none ...
repo=repo state=mounted head=<oid> ref=main prepare_error=none ...
repo=repo state=failed head= ref= prepare_error=<redacted> ...

Map states as:

  • prepare_state=preparing -> state=preparing
  • prepare_state=failed -> state=failed
  • mounted and prepared -> state=mounted
  • legacy unmounted -> state=unmounted
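
The mapping above is small enough to express as one function. This is a sketch with a hypothetical name (displayState); the proposal does not say what an async repo that is ready but not mounted should report, so returning "unmounted" for that case is an assumption.

```go
package main

import "fmt"

// displayState maps the registry prepare_state and the mount status to the
// value shown as state= in status output.
func displayState(prepareState string, mounted bool) string {
	switch {
	case prepareState == "preparing":
		return "preparing"
	case prepareState == "failed":
		return "failed"
	case mounted:
		return "mounted" // legacy ("") or ready, and currently mounted
	default:
		return "unmounted" // assumption: also used for ready-but-unmounted
	}
}

func main() {
	fmt.Println(displayState("preparing", true))
	fmt.Println(displayState("ready", true))
	fmt.Println(displayState("", false))
}
```

Checking preparing and failed first matters because an async repo is mounted (behind the gate) in both of those states.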

Tests and assumptions

Tests

Add or adjust focused tests:

  • add-repo --async returns without cloning.
  • Async registration rejects inline HTTPS credentials.
  • Registry persists only minimal async fields.
  • ReadyGate waits, opens, fails, and resets.
  • gatedFileSystem blocks/delegates user-facing operations through a fake filesystem.
  • Failed gate returns error without calling the wrapped filesystem.
  • Prepared-gitdir gitstore helper fetches/indexes a local test repo.
  • Existing blocking add-repo tests continue to pass.
  • Full verification:
    go build ./cmd/artifact-fs
    go vet ./...
    go test ./...

Assumptions

  • The repo path should mount quickly and block user-facing operations until ready.
  • A small number of FUSE lifecycle operations may pass through for mount stability.
  • Existing blocking behavior remains the default.
  • Async private repo support uses ambient credentials.
  • Product-specific bootstrap logic stays outside ArtifactFS.

Proof output

Proof

Phase  | What Ran                                                         | Timing
Before | Plain initialized worktree before ArtifactFS registration/daemon | 4ms
During | FUSE mount active while repo status was state=preparing          | 3797ms
After  | Mounted repo after async prep completed                          | 5ms

[2026-04-18T06:13:44Z] waiting for FUSE mountpoint /tmp/artifact-fs-proof/linux-prepared-gitdir-lottie-ios-20260418T061340Z/mnt/lottie-ios-prepared-async
[2026-04-18T06:13:44Z] mountpoint is active; starting ls while status is: repo=lottie-ios-prepared-async state=preparing head= ref= ahead=0 behind=0 diverged=false last_fetch=2026-04-18T06:13:43Z result=ok prepare_error=none hydrated_blobs=0 hydrated_bytes=0 overlay_dirty=false
[2026-04-18T06:13:44Z] ls start ms=1776492824426
total 37
-rw-r--r-- 1 root root  128 Apr 18 06:13 .git
drwxr-xr-x 1 root root 4096 Mar  9 23:07 .github
-rw-r--r-- 1 root root    0 Mar  9 23:07 .gitignore
-rw-r--r-- 1 root root    0 Mar  9 23:07 .npmignore
-rw-r--r-- 1 root root    0 Mar  9 23:07 .spi.yml
drwxr-xr-x 1 root root 4096 Mar  9 23:07 Example
-rw-r--r-- 1 root root    0 Mar  9 23:07 Gemfile
-rw-r--r-- 1 root root    0 Mar  9 23:07 Gemfile.lock
-rw-r--r-- 1 root root    0 Mar  9 23:07 LICENSE
drwxr-xr-x 1 root root 4096 Mar  9 23:07 Lottie.xcodeproj
drwxr-xr-x 1 root root 4096 Mar  9 23:07 Lottie.xcworkspace
-rw-r--r-- 1 root root    0 Mar  9 23:07 Package.resolved
-rw-r--r-- 1 root root    0 Mar  9 23:07 Package.swift
-rw-r--r-- 1 root root    0 Mar  9 23:07 README.md
-rw-r--r-- 1 root root    0 Mar  9 23:07 Rakefile
drwxr-xr-x 1 root root 4096 Mar  9 23:07 Sources
drwxr-xr-x 1 root root 4096 Mar  9 23:07 Tests
-rw-r--r-- 1 root root    0 Mar  9 23:07 Version.xcconfig
drwxr-xr-x 1 root root 4096 Mar  9 23:07 _AeFiles
drwxr-xr-x 1 root root 4096 Mar  9 23:07 _Gifs
-rw-r--r-- 1 root root    0 Mar  9 23:07 index.js
-rw-r--r-- 1 root root    0 Mar  9 23:07 lottie-ios.podspec
-rw-r--r-- 1 root root    0 Mar  9 23:07 package.json
drwxr-xr-x 1 root root 4096 Mar  9 23:07 script
[2026-04-18T06:13:48Z] ls exit=0 end ms=1776492828223 duration_ms=3797

@samir-openai changed the title from "Support" to "Support Lazily Ref Clone" on Apr 18, 2026

@elithrar
Collaborator

I like this: for the most part even big repos (Next.js full clone) take ~11-15s to clone all the refs+blobless, but… 1-2s is better, especially if you’re I/O constrained.

let me know when you’re ready for a review.
