@Sonicadvance1
Member

Rebased version of #4765

Paranoid TSO is known broken, but it's already broken on main. Nothing of value was lost. I'll fix that later or whatever.

@Sonicadvance1 force-pushed the fex-atomic branch 4 times, most recently from 4120274 to 12075a8 (October 27, 2025 19:19)
@Sonicadvance1 force-pushed the fex-atomic branch 3 times, most recently from cb95afb to 2af17ce (November 10, 2025 21:43)
bylaws and others added 3 commits November 10, 2025 20:03
…ed atomic

Default to enabled because this is the config we expect by default.
In the future we will get some benchmarking in various games like
Assassin's Creed and Call of Duty.
evlaV-bot pushed a commit to evlaV/linux-integration that referenced this pull request Nov 12, 2025
This patch proposes adding kernel-side emulation for unaligned atomic
instructions on ARM64. This is intended for x86 emulators (like FEX)
that struggle to effectively handle such operations in userspace alone.
Such handling is required as x86 permits such unaligned accesses (albeit
sometimes with a performance penalty as in the case of split-locks[1])
but ARM64 does not and will raise a bus error.

User applications that wish to enable support for this can use the new
prctl() flag `PR_ARM64_UNALIGN_ATOMIC_EMULATE`. Some optimizations and
instructions were left for future revisions of this patchset.
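
As a rough illustration of how a userspace emulator might opt in, here is
a minimal sketch. Only the flag name comes from this cover letter. The
numeric value, the argument layout (arg2 = 1 to enable) and the helper
name enable_unaligned_atomic_emulation() are assumptions made for the
example; the real definitions come from the patched kernel's uapi headers.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * ASSUMPTION: placeholder value for illustration only. The real number
 * is whatever the patched kernel's uapi headers define.
 */
#ifndef PR_ARM64_UNALIGN_ATOMIC_EMULATE
#define PR_ARM64_UNALIGN_ATOMIC_EMULATE 0x41544f4d
#endif

/*
 * Hypothetical helper: ask the kernel to emulate unaligned atomics for
 * this process.
 * Returns 1 if the request succeeded, 0 if the kernel does not know the
 * option (EINVAL, so the caller should keep its userspace fallback),
 * and -1 on any other error.
 */
static int enable_unaligned_atomic_emulation(void)
{
    /* ASSUMPTION: arg2 = 1 means "enable"; the other arguments are unused. */
    if (prctl(PR_ARM64_UNALIGN_ATOMIC_EMULATE, 1UL, 0UL, 0UL, 0UL) == 0)
        return 1;
    return (errno == EINVAL) ? 0 : -1;
}
```

A caller would typically try this once at startup and fall back to its
existing signal-handler path when the result is not 1.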

Emulators like FEX attempt to emulate this in userspace, but with
caveats in two areas:

 * Performance:

It should first be noted that due to x86's TSO (total store order)
memory model, ARM64 atomic instructions must be used for all memory
accesses. This results in unaligned loads/stores being much more common
than one would expect and the overhead of emulating them significantly
impacting performance.  For this common case of unaligned loads/stores,
code backpatching is used in FEX to avoid repeated overhead from
handling the same faulting access. This replaces faulting unaligned
atomic ARM64 instructions with regular load/stores and memory barriers.
This comes at the cost of significant performance problems if a function
like memcpy ends up being patched because it is occasionally called with
unaligned memory. This is severe enough to make games like Mirror's Edge
and Assassin's Creed Origins unplayable without application-specific
configuration.
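
The two code shapes described above can be sketched at the C level as
follows. This is only an analogy using C11 atomics, not the ARM64 code
FEX actually emits; the function names are hypothetical and the choice of
acquire/release fences for the backpatched form is an assumption.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Default path: acquire/release atomics. These require a naturally
 * aligned address, which is where the faults come from. */
static inline uint64_t tso_load_atomic(const _Atomic uint64_t *p)
{
    return atomic_load_explicit(p, memory_order_acquire);
}

static inline void tso_store_atomic(_Atomic uint64_t *p, uint64_t v)
{
    atomic_store_explicit(p, v, memory_order_release);
}

/* Backpatched path: a plain access plus a barrier. It tolerates any
 * alignment, but the access itself is no longer a single atomic
 * operation and the barrier is paid on every call, aligned or not. */
static inline uint64_t tso_load_backpatched(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);                    /* plain, possibly unaligned load */
    atomic_thread_fence(memory_order_acquire);  /* order later accesses after it */
    return v;
}

static inline void tso_store_backpatched(void *p, uint64_t v)
{
    atomic_thread_fence(memory_order_release);  /* order earlier accesses before it */
    memcpy(p, &v, sizeof v);                    /* plain, possibly unaligned store */
}
```

Once a hot routine has been rewritten to the second form, every call pays
for the barrier even when its operands are aligned, which is the slowdown
described above.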

Microbenchmarks[2] measure a 4x decrease in overhead with kernel-side
handling compared to userspace, and this figure is currently even larger
when FEX is run under Wine. Such a dramatic decrease would make it
reasonable for FEX to default to the no-backpatching path and provide
consistent performance.

 * Correctness:

x86 atomic accesses can cross 16-byte (LSE2) granules, but there is no
ARM64 instruction that would allow for direct atomic emulation of this.
As such, a lock must be taken for correctness. While this is easy to
emulate in userspace within a process, correct atomic cross-granule
operations on shared memory mapped into multiple processes would require
a lock shared between all FEX instances which cannot be implemented
safely in userspace as is (robust futexes do not work here as they are
under the control of the emulated x86 program). Note that this is a
finer-grained restriction than split locks on x86, which only concern
accesses crossing a 64-byte cache line.
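
The difference between the two boundaries can be made concrete with a
small address check. This is a sketch only: the helper names are
hypothetical, and it assumes the access size is no larger than the
boundary being tested.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    LSE2_GRANULE  = 16,  /* ARM64 LSE2 atomicity granule, in bytes */
    X86_CACHELINE = 64   /* x86 cache line size, in bytes */
};

/* True if [addr, addr + size) spans a multiple of `boundary`.
 * `boundary` must be a power of two and size <= boundary. */
static bool crosses_boundary(uintptr_t addr, size_t size, uintptr_t boundary)
{
    return (addr & (boundary - 1)) + size > boundary;
}

/* Access cannot be performed by a single ARM64 atomic; a lock is needed. */
static bool crosses_lse2_granule(uintptr_t addr, size_t size)
{
    return crosses_boundary(addr, size, LSE2_GRANULE);
}

/* Access would count as a split lock on x86. */
static bool is_x86_split_lock(uintptr_t addr, size_t size)
{
    return crosses_boundary(addr, size, X86_CACHELINE);
}
```

For example, an 8-byte access at offset 28 crosses a 16-byte granule but
stays inside one 64-byte cache line, so it needs the locked path on ARM64
even though it would not be a split lock on x86.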

 * Precedent:

Both XNU and NT kernels support unaligned atomic emulation for their
respective x86 emulators. Windows additionally supports 'volatile
metadata', which is emitted by newer versions of MSVC to inform
emulators which specific load/store accesses require atomic handling
[3]. FEX supports this together with an extension mechanism [4] which
can be manually populated to avoid e.g. the aforementioned Assassin's
Creed slowdown.

This implementation is an RFC so that we can learn how to get this code
upstream and what the maintainers think of merging such a feature. The
code is a simplified version of the original work done by Billy Laws: it
accepts only a subset of 64-bit atomic instructions, enough to run a
benchmark tool[2]. The proposed interface as used by FEX is in [5].

Thanks!
	André

[1] https://lwn.net/Articles/911219/
[2] https://gitlab.freedesktop.org/freedesktop/snippets/-/snippets/7875
[3] https://learn.microsoft.com/en-us/cpp/build/reference/volatile?view=msvc-170
[4] FEX-Emu/FEX#4773
[5] FEX-Emu/FEX#4985

To: Catalin Marinas <[email protected]>
To: Will Deacon <[email protected]>
Cc: Billy Laws <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Ryan Houdek <[email protected]>

--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 1,
    "change-id": "20251106-tonyk-unaligned_atomic-c6ca37947bfc",
    "prefixes": [
      "RFC"
    ]
  }
}