Classic filesystem: OPFS backend (MEMFS + worker) and shared IDBFS/OPFS tests #26495
caiiiycuk wants to merge 2 commits into emscripten-core:main
Regarding point 3, I think we can make WasmFS OPFS work with ASYNCIFY. That should not be too hard, and I think it would be better than adding a new backend and would lead to faster results overall, given WasmFS's general benefits. Edit: or is that not possible for some reason?
I do believe that WasmFS needs ASYNCIFY support, but that doesn’t eliminate the need for an OPFS backend. It solves a different set of problems—specifically, it enables using a classic synchronous filesystem without Asyncify, while syncing data to OPFS. As far as I understand, WasmFS doesn’t provide such capabilities, and I’d like to point out that they are actually in high demand. WasmFS imposes fairly strict constraints: either JSPI, or using the filesystem off the main thread. But even on the main thread, some operations cannot be performed without risking deadlocks. I would completely drop this PR if WasmFS supported a MEMFS-like mode for synchronous operations with backend synchronization. I have a rough idea of how this should work—in my current job we’ve implemented a similar approach:
I'm not entirely sure whether something like this already exists in WasmFS, as it is very poorly documented; perhaps it's similar to having a cache layer on top of the OPFS backend. But the key point is that the main thread should not be constrained beyond Asyncify, and about 90% of file access should happen without
To make sure I follow, do you mean WasmFS+OPFS here? (WasmFS does have an in-memory backend in which files are stored in C++, which works on all threads including main, without deadlocks. That is the default backend.)
Are you saying to add a way to serialize WasmFS's in-memory files to something like OPFS, and load them later, etc.?
Yes, but this PR is meant to create a drop-in replacement for IDBFS, so you can write -lopfs.js instead of -lidbfs.js and it will work like a charm. I want this because in my use cases OPFS consistently performs better than IndexedDB.
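To illustrate what a drop-in replacement could look like from application code, here is a minimal sketch of the usual mount-then-sync sequence with the backend passed in as a parameter. `FS.mkdir`, `FS.mount`, and `FS.syncfs` are existing Emscripten FS calls, and `autoPersist` is an existing IDBFS mount option. That an `OPFS` object can be passed in the same way is an assumption based on this PR's description, and the helper name `mountPersistent` is made up for illustration.

```javascript
// Sketch only: `fs` is Emscripten's FS object and `backend` is a filesystem
// type such as IDBFS. Passing an `OPFS` object the same way is an assumption
// based on this PR's description, not a confirmed API.
function mountPersistent(fs, backend, mountPoint, onReady) {
  fs.mkdir(mountPoint);
  // autoPersist asks the backend to schedule persistence after changes.
  fs.mount(backend, { autoPersist: true }, mountPoint);
  // populate=true: load previously persisted data into the in-memory tree.
  fs.syncfs(true, (err) => onReady(err || null));
}
```

In a real build this would be called as `mountPersistent(FS, IDBFS, '/data', cb)`; under the assumption above, only the second argument would change for the OPFS backend.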
Yep, two cases:
Here is another case. It's more about huge games and memory usage. I want to use OPFS because, as far as I know, it's the only system that allows streaming data directly from a fetch request to disk without consuming extra memory. I already have a working solution; I was just looking for existing open alternatives.
The main point is that on mobile devices, especially Safari, memory efficiency is critical, otherwise the tab may get reloaded. This happens quite often with Unity games, and approaches that keep the entire filesystem in memory simply don't work. Game assets are huge, but not all of them are needed at the same time. An asynchronous filesystem helps here, but there's still a problem: if you download, say, 200 MB over the network, you typically need to buffer it in memory, which is very undesirable when every megabyte counts. OPFS fits perfectly here: we can stream data directly to disk without extra memory overhead. The file API reads data in small chunks, so in practice the game processes hundreds of megabytes while using almost no additional memory. I believe something similar is implemented in WasmFS + OPFS.
However, there is another issue: handling many small files. When game assets are mixed, with large files plus a lot of small ones (for example, the game Perimeter has over 9000 files, some large but mostly small), a different strategy is needed. Based on usage statistics, our filesystem preloads frequently accessed small files into memory and serves them without Asyncify, while large files are loaded on demand. So it's effectively a hybrid: In-Memory + WasmFS + OPFS.
That said, I'm not sure this kind of system needs to be part of Emscripten itself. Most likely, it makes more sense to implement it as a separate backend for WasmFS.
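The hybrid strategy described above (preload small, frequently used files; load large ones on demand) could be planned roughly as follows. This is a hypothetical sketch, not the commenter's actual implementation: the function name `planPreload`, the input shape, and the `maxFileSize` and `memoryBudget` thresholds are all assumptions chosen for illustration.

```javascript
// Hypothetical helper: all names and thresholds are illustrative.
// files: [{ path, size, hits }], where `hits` comes from usage statistics.
function planPreload(files, { maxFileSize, memoryBudget }) {
  const preload = [];
  const onDemand = [];
  let used = 0;
  // Consider the most frequently accessed files first.
  const byHits = [...files].sort((a, b) => b.hits - a.hits);
  for (const f of byHits) {
    const fits = f.size <= maxFileSize && used + f.size <= memoryBudget;
    if (fits && f.hits > 0) {
      preload.push(f.path);   // small and popular: keep in memory
      used += f.size;
    } else {
      onDemand.push(f.path);  // large, unused, or over budget: load when needed
    }
  }
  return { preload, onDemand, used };
}
```

The memory budget makes the trade-off explicit: on a memory-constrained mobile browser the budget can be lowered without changing any other logic.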
@caiiiycuk I definitely agree OPFS is the right solution for file storage here, yes. @tlively When using the WasmFS OPFS backend, is every read and write done against OPFS? I seem to recall the plan was to layer a 'caching' backend in front of it, so that the sync to OPFS could be done when needed and not constantly. Is that right? If so, what is the status of that? And @caiiiycuk did I just describe correctly the missing piece for you to use WasmFS? If not, what concretely is missing?
Oh, also @caiiiycuk note that #26496 was just opened, which will support OPFS in WasmFS with standard Asyncify.
Yes, correct: caching on top of WasmFS + OPFS is needed. Then WasmFS can be used without Asyncify under certain conditions. As I mentioned earlier, you can cache everything and perform synchronous reads/writes from the main thread based on that cache.
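The caching idea in this comment can be pictured as a write-back cache: reads and writes are served synchronously from memory, and changed paths are collected so they can be persisted to OPFS later. The class below is a hypothetical sketch of that pattern only; `WriteBackCache` and its methods are invented names, and nothing here reflects how WasmFS or this PR actually implements caching.

```javascript
// Hypothetical sketch: synchronous reads/writes hit an in-memory Map;
// dirty paths are collected so a worker can persist them later.
class WriteBackCache {
  constructor() {
    this.files = new Map(); // path -> Uint8Array
    this.dirty = new Set(); // paths changed since the last flush
  }
  write(path, bytes) {
    this.files.set(path, Uint8Array.from(bytes)); // copy so callers can reuse their buffer
    this.dirty.add(path);
  }
  read(path) {
    return this.files.get(path); // undefined on a cache miss
  }
  // Hand the pending changes to the caller (e.g. to post to a worker)
  // and mark them clean.
  takeDirty() {
    const batch = [...this.dirty].map((path) => ({ path, bytes: this.files.get(path) }));
    this.dirty.clear();
    return batch;
  }
}
```

Because `takeDirty` returns each path once no matter how often it was written, repeated writes to the same file between flushes cost a single persistence operation.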
- libopfs.js: WASMFS path unchanged; without WASMFS, implement IDBFS-like OPFS mount on top of MEMFS, persisting via a worker and sync access handles; store mtime/mode metadata in .emscripten-opfs-timestamps; Safari createSyncAccessHandle compatibility; autoPersist and syncfs queue/reconcile; OPFS.quit on exit.
- Rename shared browser tests to test_idbfs_opfs_*.c; use -DOPFS vs IDBFS and AUTO_PERSIST for both backends.
- test_browser: wire IDBFS tests to new paths; add opfs_* test variants; include $removeRunDependency for idbfs fsync pre.js.
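The reconcile step mentioned in this commit message has to apply changes in a safe order; the PR description states that directories are created before files and files are removed before directories. A minimal sketch of such an ordering is shown here. The operation shape, the function name `orderReconcileOps`, the choice to run all creations before all removals, and the depth-based ordering of nested directories are assumptions for illustration, not taken from libopfs.js.

```javascript
// Hypothetical op shape: { action: 'create' | 'remove', kind: 'dir' | 'file', path }.
function orderReconcileOps(ops) {
  const depth = (p) => p.split('/').filter(Boolean).length;
  // Phase order: create dirs, create files, remove files, remove dirs.
  const rank = (op) => {
    if (op.action === 'create') return op.kind === 'dir' ? 0 : 1;
    return op.kind === 'file' ? 2 : 3;
  };
  return [...ops].sort((a, b) => {
    const r = rank(a) - rank(b);
    if (r !== 0) return r;
    if (a.kind === 'dir') {
      // Assumption: parents first when creating, children first when removing.
      const d = depth(a.path) - depth(b.path);
      return a.action === 'create' ? d : -d;
    }
    return 0;
  });
}
```

With this ordering a file is never written into a directory that does not exist yet, and a directory is only removed after its contents are gone.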
Summary
Adds an IDBFS-style OPFS filesystem for the classic Emscripten FS (when WASMFS is off): data lives in MEMFS and is synchronized to the Origin Private File System from a Web Worker using sync access handles. WASMFS builds keep using the existing WASMFS OPFS backend only.
Also generalizes the existing IDBFS browser tests so the same sources run against IDBFS or OPFS (-DOPFS, -lopfs.js), and adds OPFS-specific test_browser entries.
Motivation
OPFS with sync handles is a strong persistence option in modern browsers, but classic FS previously had no equivalent to the IDBFS mount + syncfs + autoPersist workflow. This brings a similar API and behavior for apps that still use the legacy filesystem stack.
Why is another FS implementation needed, similar to IndexedDB:
.emscripten-opfs-stats.
Implementation notes
- navigator.storage.getDirectory() (optional subpath under the OPFS root); timestamps and modes are stored in a hidden .emscripten-opfs-stats file (JSON), since raw OPFS entries don't match POSIX metadata the same way as IDB.
- createSyncAccessHandle: handles Safari's older 0-argument form vs newer { mode: 'in-place' }.
- autoPersist: mirrors IDBFS batching (queuePersist / opfsPersistState).
- syncfs: per-mount serialization queue; reconcile ordering creates dirs before files, removes files before dirs; flush persists metadata after reconcile.
- OPFS.quit() terminates the worker and revokes the blob URL.
Tests
- test_fs_idbfs_* updated to test_idbfs_opfs_*.c + -DAUTO_PERSIST.
- test_fs_opfs_sync, test_fs_opfs_autopersist, test_fs_opfs_fsync.
Requirements / limitations