Make AcquireAsync throw on failure to fix the silent-null lock footgun#508
Make AcquireAsync throw on failure to fix the silent-null lock footgun#508
Conversation
BREAKING CHANGE: Renames the interface method `ILockProvider.AcquireAsync` to `TryAcquireAsync` (still returns `ILock?`) and introduces new `AcquireAsync` extension methods that throw `LockAcquisitionTimeoutException` when the lock cannot be obtained. Motivation: callers that forgot the `null` check could silently run work without holding the lock — the very scenario the lock was meant to prevent. The previous single shape made the dangerous-path easy and the safe-path require ceremony. The new design separates the two intents: - `TryAcquireAsync` — returns `ILock?`. Use when lock unavailability is a normal control-flow outcome (best-effort dedupe, opportunistic work). - `AcquireAsync` — returns `ILock`, throws `LockAcquisitionTimeoutException` on failure. Use when the work cannot run safely without the lock. Internal callers (jobs, queue tests, scheduled jobs) move to `TryAcquireAsync` to preserve their existing best-effort semantics. Only implementation in this repo: `CacheLockProvider`, `ScopedLockProvider`, `ThrottlingLockProvider`. Callers in other Foundatio repos must rename their `AcquireAsync` overrides/calls to `TryAcquireAsync`. Adds three new test cases in `LockTestBase` exercising the throw shape for single resource, cancelled token, and multi-resource paths. Updates `docs/guide/locks.md` to document the dual API and pick-the-right-method guidance.
There was a problem hiding this comment.
Pull request overview
Splits lock acquisition into an explicit Try-pattern (TryAcquireAsync returning ILock?) and a throwing pattern (AcquireAsync extension methods returning ILock and throwing LockAcquisitionTimeoutException), and migrates callers/tests/docs to the new API to avoid accidental “no-lock” execution.
Changes:
- Renamed
ILockProvider.AcquireAsync→TryAcquireAsyncacross providers and call sites. - Added
AcquireAsyncextension methods (single + multi-resource) that throwLockAcquisitionTimeoutExceptionon failure. - Updated tests and documentation to reflect the dual acquisition shapes and recommended usage patterns.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/Foundatio.Tests/Utility/ResiliencePolicyTests.cs | Updates lock acquisition call to TryAcquireAsync. |
| tests/Foundatio.Tests/Locks/InMemoryLockTests.cs | Adds [Fact] overrides to run new throw-shape base tests for the in-memory provider. |
| src/Foundatio/Lock/ThrottlingLockProvider.cs | Renames provider implementation method to TryAcquireAsync. |
| src/Foundatio/Lock/ScopedLockProvider.cs | Renames provider implementation method and forwards to TryAcquireAsync. |
| src/Foundatio/Lock/LockAcquisitionTimeoutException.cs | Introduces new exception used by throwing acquisition APIs. |
| src/Foundatio/Lock/ILockProvider.cs | Renames interface method and adds TryAcquireAsync/AcquireAsync extension overloads + updates TryUsingAsync internals. |
| src/Foundatio/Lock/CacheLockProvider.cs | Renames provider implementation method to TryAcquireAsync. |
| src/Foundatio.TestHarness/Queue/QueueTestBase.cs | Migrates distributed-lock tests to the throwing AcquireAsync where success is required. |
| src/Foundatio.TestHarness/Locks/LockTestBase.cs | Migrates base tests to TryAcquireAsync and adds new base tests for the throwing AcquireAsync shape. |
| src/Foundatio.TestHarness/Jobs/WithLockingJob.cs | Updates best-effort job lock acquisition to TryAcquireAsync. |
| src/Foundatio.TestHarness/Jobs/ThrottledJob.cs | Updates best-effort job lock acquisition to TryAcquireAsync. |
| src/Foundatio.TestHarness/Jobs/SampleQueueJob.cs | Updates best-effort job lock acquisition to TryAcquireAsync. |
| src/Foundatio.Extensions.Hosting/Jobs/ScheduledJobInstance.cs | Updates best-effort scheduled job locks to TryAcquireAsync. |
| docs/guide/locks.md | Documents the dual API (TryAcquireAsync vs throwing AcquireAsync) and updates examples accordingly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…rivate core
Both shapes are now interface methods on ILockProvider:
- AcquireAsync returns Task<ILock> and throws LockAcquisitionTimeoutException on
failure. Source-compatible at the call site (return type narrows from ILock?
to ILock); the footgun goes away because there is no null branch to forget.
- TryAcquireAsync returns Task<ILock?> for callers who want best-effort.
Each implementation (CacheLockProvider, ThrottlingLockProvider) routes both
public methods through a private TryAcquireCoreAsync, so the best-effort path
never pays an exception-as-control-flow cost. ScopedLockProvider forwards both
methods to the unscoped provider directly.
Also addresses PR review feedback:
- Document the implicit 30s default for null acquireTimeout on the convenience
overloads.
- Materialize IEnumerable<string> resources once in the throwing multi-resource
AcquireAsync overloads so single-use enumerables aren't walked twice and the
exception message matches what was actually attempted.
- Neutralize LockAcquisitionTimeoutException's default message ("Failed to
acquire lock for resource 'X'.") so it covers plain CancellationToken
cancellation and multi-resource as well as the timeout case.
- Update ThrottlingLockProvider's stale "AcquireLockAsync" trace message.
niemyjski
left a comment
There was a problem hiding this comment.
Review
The design is sound and well-motivated — the null-return footgun is a real production risk, and splitting into AcquireAsync (throws) / TryAcquireAsync (nullable) is the right fix. The implementation is clean: sharing TryAcquireCoreAsync avoids exception-as-control-flow costs, all internal callers are migrated, docs are thorough, and review feedback has been addressed well. A few concerns below.
1. Task<T> Is Invariant — Downstream Repos Will Get Compile Errors, Not Just Behavioral Changes
The PR description claims source compatibility because Task<ILock?> → Task<ILock> is a "covariant tightening." This is incorrect — Task<T> is a class and is invariant in T. Code that explicitly returns Task<ILock?> from a method typed as returning Task<ILock?> will get a compile error when calling AcquireAsync (which now returns Task<ILock>).
Concrete example in Foundatio.Repositories:
// ReindexWorkItemHandler.cs — return type is Task<ILock?>
public override Task<ILock?> GetWorkItemLockAsync(...)
{
return _lockProvider.AcquireAsync(...); // Task<ILock> is NOT assignable to Task<ILock?>
}This will be a compile error, not a silent behavioral change.
Additionally, callers that handle the null return gracefully will now throw instead:
ElasticConfiguration.cs— callsAcquireAsync(...)thenif (configLock is null)to skip gracefully → post-PR this throws instead of skippingMigrationManager.cs— callsAcquireAsync(..., TimeSpan.Zero)thenif (migrationsLock == null) return MigrationResult.UnableToAcquireLock→ dead code, now throwsCustomFieldDefinitionRepository.cs— was already throwing on null, so this one is fine (just changes exception type)
Recommendation: A coordinated migration tracking issue for downstream repos would be helpful. The PR description should also correct the "covariant tightening" claim — Task<T> invariance means this is a harder break than described.
2. Exception Hierarchy: Should LockAcquisitionTimeoutException Derive from a Base LockException?
Echoing @niemyjski's comment — other Foundatio domains have domain-specific base exceptions (CacheException, StorageException, MessageBusException). Deriving from bare Exception is inconsistent. A LockException base (even if introduced in this PR) would let consumers write catch (LockException) to handle all lock-related failures uniformly.
3. Section Divider Comments
The PR adds block comments like:
// ------------------------------------------------------------------
// AcquireAsync — single resource. Throws LockAcquisitionTimeoutException
// ------------------------------------------------------------------Per AGENTS.md: "No code comments unless necessary — code should be self-explanatory." The XML doc summaries on each method already convey the same information. These section headers are arguably noise that could drift out of sync. Consider removing them and relying on the XML docs alone.
|
This silent failure burned as well, so I'm happy to see this change. Looks good to me. |
LockAcquisitionTimeoutException now derives from a new LockException base, mirroring CacheException/StorageException so consumers can catch all lock failures uniformly. The block-comment section dividers in ILockProvider.cs are removed in favor of the existing XML doc summaries.
Both AcquireAsync and TryAcquireAsync treat an already-cancelled cancellationToken as an immediate-deadline attempt (single acquire, throw or null on failure). This is the same behavior callers already get implicitly via acquireTimeout: TimeSpan.Zero in the convenience overloads.
Why
ILockProvider.AcquireAsyncpreviously returnedTask<ILock?>—nullwhen the lock couldn't be obtained. Callers who forgot the null check silently ran their protected work without holding the lock, the exact scenario the lock was meant to prevent. We hit this in a downstream caller (an account-number generation lock) where a forgotten null check would have allowed duplicate numbers under contention.The fix is to make the dangerous shape impossible to write by accident: failing to acquire should not produce a value of type
ILock.What
ILockProvidernow exposes two acquisition shapes as interface methods, sharing a private core in each implementation so neither path pays an exception-as-control-flow cost:AcquireAsyncTask<ILock>— throwsLockAcquisitionTimeoutExceptionon failureTryAcquireAsyncTask<ILock?>—nullon failureCacheLockProviderandThrottlingLockProviderroute both public methods through a privateTryAcquireCoreAsync.ScopedLockProviderforwards each shape to the corresponding method on the unscoped provider.LockAcquisitionTimeoutExceptionderives from a newLockExceptionbase (mirroringCacheException/StorageException) so consumers cancatch (LockException)to handle all lock-related failures uniformly.Source / behavior compatibility
This is a breaking change for callers — both at the source level and at the behavior level. Three categories to be aware of:
1. Code that calls
AcquireAsyncand assigns tovar/ILock/ILock?— recompiles fine. The only signature shift is the return type going fromTask<ILock?>toTask<ILock>. Existing null checks become dead code (compiler warns "always non-null"); switch them totry/catch (LockAcquisitionTimeoutException)if you want to handle the failure, or move toTryAcquireAsyncif null-on-failure is what you want.2. Code that returns
AcquireAsync's result from a method typedTask<ILock?>— gets a compile error.Task<T>is a class and is invariant inT, soTask<ILock>is not assignable toTask<ILock?>. Concrete known case:Foundatio.Repositories'sReindexWorkItemHandler.GetWorkItemLockAsync. Fix is either to retype the method asTask<ILock>(and let the throw propagate) or switch the call site toTryAcquireAsync.3. Code that handled the null return as graceful skip — silently changes from "skip the work" to "throw
LockAcquisitionTimeoutException". Known examples inFoundatio.Repositories:ElasticConfigurationandMigrationManagerboth haveif (lock is null) return …paths. These need to switch toTryAcquireAsyncto preserve the existing skip behavior.The internal best-effort callers in this repo (
ScheduledJobInstance,WithLockingJob,ThrottledJob,SampleQueueJob,QueueTestBase) have already been migrated toTryAcquireAsyncas part of this PR.External Foundatio repos that ship a provider (
Foundatio.Redis,Foundatio.AWS, …) consumeCacheLockProviderrather than implementingILockProviderdirectly, so the new interface methods don't require changes there.Cancelled-token contract
Both
AcquireAsyncandTryAcquireAsynctreat an already-cancelledcancellationTokenas an immediate-deadline attempt: a single acquire is tried, and on failureAcquireAsyncthrowsLockAcquisitionTimeoutExceptionwhileTryAcquireAsyncreturnsnull. This preserves the existingacquireTimeout: TimeSpan.Zerotry-once-no-wait pattern (used byTryUsingAsync(..., TimeSpan.Zero)inLockTestBase.DoLockedWorkAsync).Other changes
LockExceptionbase type (inFoundatio.Lock), andLockAcquisitionTimeoutExceptionderived from it.Failed to acquire lock for resource 'X'.) so it covers cancellation and multi-resource cases as well as plain timeouts.AcquireAsyncoverloads materializeresourcesonce so single-use enumerables aren't walked twice and the exception message reflects the actual attempt.acquireTimeoutapplies a 30-second default (pre-existing behavior, now documented).docs/guide/locks.mdto document the dual API and recommendAcquireAsyncas the safer default.LockTestBasetests covering the throwing shape (single resource, cancelled token, multi-resource).Test plan
dotnet build Foundatio.slnx— cleandotnet test Foundatio.slnx— 1861 passed, 13 skipped (pre-existing skips), 0 failedFoundatio.Repositoriescallers (ReindexWorkItemHandler,ElasticConfiguration,MigrationManager,CustomFieldDefinitionRepository)