Skip to content

Make AcquireAsync throw on failure to fix the silent-null lock footgun#508

Open
ejsmith wants to merge 5 commits intomainfrom
feat/acquire-throw-on-failure
Open

Make AcquireAsync throw on failure to fix the silent-null lock footgun#508
ejsmith wants to merge 5 commits intomainfrom
feat/acquire-throw-on-failure

Conversation

@ejsmith
Copy link
Copy Markdown
Contributor

@ejsmith ejsmith commented Apr 29, 2026

Why

ILockProvider.AcquireAsync previously returned Task<ILock?>null when the lock couldn't be obtained. Callers who forgot the null check silently ran their protected work without holding the lock, the exact scenario the lock was meant to prevent. We hit this in a downstream caller (an account-number generation lock) where a forgotten null check would have allowed duplicate numbers under contention.

The fix is to make the dangerous shape impossible to write by accident: failing to acquire should not produce a value of type ILock.

What

ILockProvider now exposes two acquisition shapes as interface methods, sharing a private core in each implementation so neither path pays an exception-as-control-flow cost:

API Returns Use when
AcquireAsync Task<ILock> — throws LockAcquisitionTimeoutException on failure The work cannot run safely without the lock (the safer default).
TryAcquireAsync Task<ILock?>null on failure Lock unavailability is normal control flow (best-effort dedupe, opportunistic work).

CacheLockProvider and ThrottlingLockProvider route both public methods through a private TryAcquireCoreAsync. ScopedLockProvider forwards each shape to the corresponding method on the unscoped provider.

LockAcquisitionTimeoutException derives from a new LockException base (mirroring CacheException / StorageException) so consumers can catch (LockException) to handle all lock-related failures uniformly.

Source / behavior compatibility

This is a breaking change for callers — both at the source level and at the behavior level. Three categories to be aware of:

1. Code that calls AcquireAsync and assigns to var / ILock / ILock? — recompiles fine. The only signature shift is the return type going from Task<ILock?> to Task<ILock>. Existing null checks become dead code (compiler warns "always non-null"); switch them to try/catch (LockAcquisitionTimeoutException) if you want to handle the failure, or move to TryAcquireAsync if null-on-failure is what you want.

2. Code that returns AcquireAsync's result from a method typed Task<ILock?> — gets a compile error. Task<T> is a class and is invariant in T, so Task<ILock> is not assignable to Task<ILock?>. Concrete known case: Foundatio.Repositories's ReindexWorkItemHandler.GetWorkItemLockAsync. Fix is either to retype the method as Task<ILock> (and let the throw propagate) or switch the call site to TryAcquireAsync.

3. Code that handled the null return as graceful skip — silently changes from "skip the work" to "throw LockAcquisitionTimeoutException". Known examples in Foundatio.Repositories: ElasticConfiguration and MigrationManager both have if (lock is null) return … paths. These need to switch to TryAcquireAsync to preserve the existing skip behavior.

The internal best-effort callers in this repo (ScheduledJobInstance, WithLockingJob, ThrottledJob, SampleQueueJob, QueueTestBase) have already been migrated to TryAcquireAsync as part of this PR.

External Foundatio repos that ship a provider (Foundatio.Redis, Foundatio.AWS, …) consume CacheLockProvider rather than implementing ILockProvider directly, so the new interface methods don't require changes there.

Cancelled-token contract

Both AcquireAsync and TryAcquireAsync treat an already-cancelled cancellationToken as an immediate-deadline attempt: a single acquire is tried, and on failure AcquireAsync throws LockAcquisitionTimeoutException while TryAcquireAsync returns null. This preserves the existing acquireTimeout: TimeSpan.Zero try-once-no-wait pattern (used by TryUsingAsync(..., TimeSpan.Zero) in LockTestBase.DoLockedWorkAsync).

Other changes

  • New LockException base type (in Foundatio.Lock), and LockAcquisitionTimeoutException derived from it.
  • The default exception message is neutral (Failed to acquire lock for resource 'X'.) so it covers cancellation and multi-resource cases as well as plain timeouts.
  • Multi-resource throwing AcquireAsync overloads materialize resources once so single-use enumerables aren't walked twice and the exception message reflects the actual attempt.
  • Convenience overloads' XML docs explicitly note that a null acquireTimeout applies a 30-second default (pre-existing behavior, now documented).
  • Updated docs/guide/locks.md to document the dual API and recommend AcquireAsync as the safer default.
  • New LockTestBase tests covering the throwing shape (single resource, cancelled token, multi-resource).

Test plan

  • dotnet build Foundatio.slnx — clean
  • dotnet test Foundatio.slnx — 1861 passed, 13 skipped (pre-existing skips), 0 failed
  • Verify CI passes
  • Follow-up: migrate downstream Foundatio.Repositories callers (ReindexWorkItemHandler, ElasticConfiguration, MigrationManager, CustomFieldDefinitionRepository)

BREAKING CHANGE: Renames the interface method `ILockProvider.AcquireAsync`
to `TryAcquireAsync` (still returns `ILock?`) and introduces new
`AcquireAsync` extension methods that throw `LockAcquisitionTimeoutException`
when the lock cannot be obtained.

Motivation: callers that forgot the `null` check could silently run work
without holding the lock — the very scenario the lock was meant to prevent.
The previous single shape made the dangerous-path easy and the safe-path
require ceremony. The new design separates the two intents:

- `TryAcquireAsync` — returns `ILock?`. Use when lock unavailability is a
  normal control-flow outcome (best-effort dedupe, opportunistic work).
- `AcquireAsync` — returns `ILock`, throws `LockAcquisitionTimeoutException`
  on failure. Use when the work cannot run safely without the lock.

Internal callers (jobs, queue tests, scheduled jobs) move to
`TryAcquireAsync` to preserve their existing best-effort semantics. Only
implementation in this repo: `CacheLockProvider`, `ScopedLockProvider`,
`ThrottlingLockProvider`. Callers in other Foundatio repos must rename
their `AcquireAsync` overrides/calls to `TryAcquireAsync`.

Adds three new test cases in `LockTestBase` exercising the throw shape
for single resource, cancelled token, and multi-resource paths. Updates
`docs/guide/locks.md` to document the dual API and pick-the-right-method
guidance.
Comment thread docs/guide/locks.md
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Splits lock acquisition into an explicit Try-pattern (TryAcquireAsync returning ILock?) and a throwing pattern (AcquireAsync extension methods returning ILock and throwing LockAcquisitionTimeoutException), and migrates callers/tests/docs to the new API to avoid accidental “no-lock” execution.

Changes:

  • Renamed ILockProvider.AcquireAsyncTryAcquireAsync across providers and call sites.
  • Added AcquireAsync extension methods (single + multi-resource) that throw LockAcquisitionTimeoutException on failure.
  • Updated tests and documentation to reflect the dual acquisition shapes and recommended usage patterns.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/Foundatio.Tests/Utility/ResiliencePolicyTests.cs Updates lock acquisition call to TryAcquireAsync.
tests/Foundatio.Tests/Locks/InMemoryLockTests.cs Adds [Fact] overrides to run new throw-shape base tests for the in-memory provider.
src/Foundatio/Lock/ThrottlingLockProvider.cs Renames provider implementation method to TryAcquireAsync.
src/Foundatio/Lock/ScopedLockProvider.cs Renames provider implementation method and forwards to TryAcquireAsync.
src/Foundatio/Lock/LockAcquisitionTimeoutException.cs Introduces new exception used by throwing acquisition APIs.
src/Foundatio/Lock/ILockProvider.cs Renames interface method and adds TryAcquireAsync/AcquireAsync extension overloads + updates TryUsingAsync internals.
src/Foundatio/Lock/CacheLockProvider.cs Renames provider implementation method to TryAcquireAsync.
src/Foundatio.TestHarness/Queue/QueueTestBase.cs Migrates distributed-lock tests to the throwing AcquireAsync where success is required.
src/Foundatio.TestHarness/Locks/LockTestBase.cs Migrates base tests to TryAcquireAsync and adds new base tests for the throwing AcquireAsync shape.
src/Foundatio.TestHarness/Jobs/WithLockingJob.cs Updates best-effort job lock acquisition to TryAcquireAsync.
src/Foundatio.TestHarness/Jobs/ThrottledJob.cs Updates best-effort job lock acquisition to TryAcquireAsync.
src/Foundatio.TestHarness/Jobs/SampleQueueJob.cs Updates best-effort job lock acquisition to TryAcquireAsync.
src/Foundatio.Extensions.Hosting/Jobs/ScheduledJobInstance.cs Updates best-effort scheduled job locks to TryAcquireAsync.
docs/guide/locks.md Documents the dual API (TryAcquireAsync vs throwing AcquireAsync) and updates examples accordingly.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/Foundatio/Lock/ILockProvider.cs Outdated
Comment thread src/Foundatio/Lock/ILockProvider.cs Outdated
Comment thread src/Foundatio/Lock/ILockProvider.cs
Comment thread src/Foundatio/Lock/ThrottlingLockProvider.cs Outdated
Comment thread src/Foundatio/Lock/LockAcquisitionTimeoutException.cs
…rivate core

Both shapes are now interface methods on ILockProvider:
- AcquireAsync returns Task<ILock> and throws LockAcquisitionTimeoutException on
  failure. Source-compatible at the call site (return type narrows from ILock?
  to ILock); the footgun goes away because there is no null branch to forget.
- TryAcquireAsync returns Task<ILock?> for callers who want best-effort.

Each implementation (CacheLockProvider, ThrottlingLockProvider) routes both
public methods through a private TryAcquireCoreAsync, so the best-effort path
never pays an exception-as-control-flow cost. ScopedLockProvider forwards both
methods to the unscoped provider directly.

Also addresses PR review feedback:
- Document the implicit 30s default for null acquireTimeout on the convenience
  overloads.
- Materialize IEnumerable<string> resources once in the throwing multi-resource
  AcquireAsync overloads so single-use enumerables aren't walked twice and the
  exception message matches what was actually attempted.
- Neutralize LockAcquisitionTimeoutException's default message ("Failed to
  acquire lock for resource 'X'.") so it covers plain CancellationToken
  cancellation and multi-resource as well as the timeout case.
- Update ThrottlingLockProvider's stale "AcquireLockAsync" trace message.
@ejsmith ejsmith changed the title Split ILockProvider acquisition into Try/throw shapes Make AcquireAsync throw on failure to fix the silent-null lock footgun Apr 29, 2026
Copy link
Copy Markdown
Member

@niemyjski niemyjski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

The design is sound and well-motivated — the null-return footgun is a real production risk, and splitting into AcquireAsync (throws) / TryAcquireAsync (nullable) is the right fix. The implementation is clean: sharing TryAcquireCoreAsync avoids exception-as-control-flow costs, all internal callers are migrated, docs are thorough, and review feedback has been addressed well. A few concerns below.


1. Task<T> Is Invariant — Downstream Repos Will Get Compile Errors, Not Just Behavioral Changes

The PR description claims source compatibility because Task<ILock?>Task<ILock> is a "covariant tightening." This is incorrectTask<T> is a class and is invariant in T. Code that explicitly returns Task<ILock?> from a method typed as returning Task<ILock?> will get a compile error when calling AcquireAsync (which now returns Task<ILock>).

Concrete example in Foundatio.Repositories:

// ReindexWorkItemHandler.cs — return type is Task<ILock?>
public override Task<ILock?> GetWorkItemLockAsync(...)
{
    return _lockProvider.AcquireAsync(...); // Task<ILock> is NOT assignable to Task<ILock?>
}

This will be a compile error, not a silent behavioral change.

Additionally, callers that handle the null return gracefully will now throw instead:

  • ElasticConfiguration.cs — calls AcquireAsync(...) then if (configLock is null) to skip gracefully → post-PR this throws instead of skipping
  • MigrationManager.cs — calls AcquireAsync(..., TimeSpan.Zero) then if (migrationsLock == null) return MigrationResult.UnableToAcquireLock → dead code, now throws
  • CustomFieldDefinitionRepository.cs — was already throwing on null, so this one is fine (just changes exception type)

Recommendation: A coordinated migration tracking issue for downstream repos would be helpful. The PR description should also correct the "covariant tightening" claim — Task<T> invariance means this is a harder break than described.


2. Exception Hierarchy: Should LockAcquisitionTimeoutException Derive from a Base LockException?

Echoing @niemyjski's comment — other Foundatio domains have domain-specific base exceptions (CacheException, StorageException, MessageBusException). Deriving from bare Exception is inconsistent. A LockException base (even if introduced in this PR) would let consumers write catch (LockException) to handle all lock-related failures uniformly.


3. Section Divider Comments

The PR adds block comments like:

// ------------------------------------------------------------------
// AcquireAsync — single resource. Throws LockAcquisitionTimeoutException
// ------------------------------------------------------------------

Per AGENTS.md: "No code comments unless necessary — code should be self-explanatory." The XML doc summaries on each method already convey the same information. These section headers are arguably noise that could drift out of sync. Consider removing them and relying on the XML docs alone.

Comment thread src/Foundatio/Lock/ILockProvider.cs Outdated
@randylsu
Copy link
Copy Markdown

This silent failure burned as well, so I'm happy to see this change. Looks good to me.

ejsmith added 2 commits May 4, 2026 17:26
LockAcquisitionTimeoutException now derives from a new LockException
base, mirroring CacheException/StorageException so consumers can catch
all lock failures uniformly. The block-comment section dividers in
ILockProvider.cs are removed in favor of the existing XML doc summaries.
Both AcquireAsync and TryAcquireAsync treat an already-cancelled
cancellationToken as an immediate-deadline attempt (single acquire,
throw or null on failure). This is the same behavior callers already
get implicitly via acquireTimeout: TimeSpan.Zero in the convenience
overloads.
Comment thread src/Foundatio/Lock/ThrottlingLockProvider.cs Outdated
Comment thread src/Foundatio/Lock/CacheLockProvider.cs Outdated
Comment thread src/Foundatio/Lock/ILockProvider.cs
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants