
NAS-140523 / 27.0.0-BETA.1 / NFSD: return ESTALE for snapdir entries when export lookup fails#243

Merged
ixhamza merged 1 commit into truenas/linux-6.18 from nfsd-snapdir-empty-readdir
Apr 3, 2026

Conversation


@ixhamza ixhamza commented Apr 2, 2026

When nfsd_cross_mnt() crosses into a mounted ZFS snapshot but rqst_exp_get_by_name() fails to resolve the sub-export (-ENOENT), the error is silently converted to success and the automount stub dentry is returned to the caller. The stub has simple_dir_operations and its file handle is encoded with gen=1 (snapshot is mounted).

This 44-byte gen=1 handle becomes a permanent trap: zfsctl_snapdir_vget() sees gen=1 matching d_mountpoint=true, returns the stub inode, and READDIR returns NFS4_OK with zero entries. The client caches this empty result indefinitely since there is no error signal to trigger re-resolution. The empty directory persists until change_info4 updates (e.g., manual snapshot creation on the server).

For zfs_snapdir exports, return -ESTALE instead of silently falling back to the automount stub. This causes the client to re-resolve via LOOKUP.

Testing

# Server: setup pool, snapshots, and export with zfs_snapdir
systemctl mask systemd-networkd-wait-online.service
zpool destroy tank 2>/dev/null; echo "" > /etc/exports; exportfs -r
zpool create -f tank -O mountpoint=/mnt/tank sda
zfs create tank/target
dd if=/dev/urandom of=/mnt/tank/target/bigfile bs=1M count=5
for i in 1 2 3 4 5; do echo "snap${i}" > /mnt/tank/target/file${i}.txt; zfs snapshot tank/target@snap${i}; done
echo 0 > /sys/module/zfs/parameters/zfs_expire_snapshot
echo '"/mnt/tank/target" *(rw,sync,no_subtree_check,no_root_squash,zfs_snapdir)' > /etc/exports
exportfs -ra
for i in 1 2 3 4 5; do ls /mnt/tank/target/.zfs/snapshot/snap${i}/ > /dev/null; done

# Client: mount and verify snapshots work
sudo umount /mnt/tank 2>/dev/null; sudo mkdir -p /mnt/tank
sudo mount -t nfs4 -o vers=4.2 192.168.18.9:/mnt/tank/target /mnt/tank
ls /mnt/tank/.zfs/snapshot/snap1/

# Server: inject a negative entry in the nfsd.export cache for snap1.
# This simulates what happens at the customer when zfs destroy triggers
# exportfs_flush (wiping the kernel export cache) and mountd temporarily
# fails to resolve the snapshot sub-export during cache repopulation —
# e.g., due to heavy concurrent activity (long-running zfs recv,
# dsl_process_async_destroys on a large pool).
echo "* /mnt/tank/target/.zfs/snapshot/snap1 $(($(date +%s) + 3600))" > /proc/net/rpc/nfsd.export/channel

# Client: drop cached dentries/inodes (echo 2 frees slab objects, not the
# page cache) and access snap1 again.
# Without fix: snap1 appears as empty directory (NFS4_OK, zero entries).
#   nfsd_cross_mnt silently falls back to the automount stub with gen=1.
#   The client caches this permanently — no error, no retry.
# With fix: snap1 returns ESTALE.
#   nfsd_cross_mnt returns -ESTALE instead of falling back.
#   The client knows the handle is stale and will re-resolve once
#   the negative cache entry expires or is flushed.
echo 2 | sudo tee /proc/sys/vm/drop_caches
ls /mnt/tank/.zfs/snapshot/snap1/


Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
@ixhamza ixhamza requested a review from anodos325 April 2, 2026 19:09
@ixhamza ixhamza added the jira label Apr 2, 2026
@bugclerk bugclerk changed the title NFSD: return ESTALE for snapdir entries when export lookup fails NAS-140523 / 27.0.0-BETA.1 / NFSD: return ESTALE for snapdir entries when export lookup fails Apr 2, 2026

bugclerk commented Apr 3, 2026

This PR has been merged and conversations have been locked.
If you would like to discuss more about this issue please use our forums or raise a Jira ticket.

@truenas truenas locked as resolved and limited conversation to collaborators Apr 3, 2026

ixhamza commented Apr 3, 2026

time 20:00


bugclerk commented Apr 3, 2026

Time tracking added.
