Skip to content

switchrpc: add DeleteAttempts RPC#10602

Open
calvinrzachman wants to merge 5 commits intolightningnetwork:elle-base-branch-payment-servicefrom
calvinrzachman:switchrpc-delete-attempts
Open

switchrpc: add DeleteAttempts RPC#10602
calvinrzachman wants to merge 5 commits intolightningnetwork:elle-base-branch-payment-servicefrom
calvinrzachman:switchrpc-delete-attempts

Conversation

@calvinrzachman
Copy link
Collaborator

@calvinrzachman calvinrzachman commented Feb 23, 2026

Change Description

Add a DeleteAttempts RPC to the switchrpc sub-server, enabling remote clients to perform safe, online garbage collection of terminal HTLC attempt records from the AttemptStore.

This uses a more explicit approach to deletion: the client names the specific finished attempts to delete. Unlike the existing CleanStore (which deletes everything except a keep-set), DeleteAttempts is immune to stale-snapshot race
conditions and does not require a disruptive quiescent state — making it suitable for use when a remote router manages payments.

If a particular request to delete a batch of attempts fails entirely or partially, the client can just try again later. They do not need to worry about conducting the deletion in parallel with processing htlc attempts for new payment requests.

  • Each attempt ID gets an independent status result (OK, PENDING, FAILED NOT_FOUND) in the response.
  • The server refuses to delete in-flight attempts, even if the client requests it.
  • The internal ChannelRouter's cleanup logic is not modified. This is a new primitive exposed only via switchrpc. An RPC client looking to use this endpoint will have to track the appropriate state necessary to determine which HTLC attempts have been completed (settled/failed) but which are yet to be deleted.

Addresses: #10519
Replaces (favored over): #10484

Future Addition

  • Tombstone-based deletion: Deleted records can be replaced with a lightweight sentinel record that preserves idempotency protection (preventing duplicate HTLC dispatch from stale requests). Tombstones can be swept automatically on server restart. This is an extra layer of protection against client mis-use.

@gemini-code-assist
Copy link

Summary of Changes

Hello @calvinrzachman, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial new RPC, DeleteAttempts, to the switchrpc interface, empowering external routing clients to perform explicit and safe garbage collection of completed HTLC attempt records. This enhancement addresses the challenges of managing payment state in distributed systems by providing a more robust and race-condition-resistant method for cleaning up terminal payment attempts, contrasting with the less explicit CleanStore approach. It also builds upon recent changes to the SendOnion RPC, which now provides richer, structured error details, ensuring a more reliable and predictable interaction model for clients managing payment lifecycles.

Highlights

  • New RPC: DeleteAttempts: A new DeleteAttempts RPC has been added to the switchrpc sub-server. This RPC allows remote clients to explicitly delete terminal (settled or failed) HTLC attempt records from the AttemptStore.
  • Safe Garbage Collection: This new deletion mechanism provides a safer, 'positive space' alternative to the existing CleanStore method. It is immune to stale-snapshot race conditions and does not require a disruptive quiescent state, making it suitable for distributed systems.
  • Granular Deletion Status: The DeleteAttempts RPC returns per-ID results for each requested attempt, indicating its deletion status (e.g., OK, PENDING, NOT_FOUND, ALREADY_DELETED). In-flight attempts are explicitly refused deletion.
  • SendOnion RPC Overhaul (Context): The SendOnion RPC (overhauled in a prior PR) now uses gRPC status codes for outcomes. Failures include structured SendOnionFailureDetails to classify them as DefiniteFailure (safe to fail and retry with new route) or IndefiniteFailure (client MUST retry same request).

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • docs/release-notes/release-notes-0.21.0.md
    • Documented the overhaul of the switchrpc.SendOnion RPC, detailing its new client-friendly API and breaking changes.
    • Added an entry for the new switchrpc.DeleteAttempts RPC, explaining its purpose for cleaning up terminal attempt records and its per-ID result reporting.
  • htlcswitch/interfaces.go
    • Added the DeleteAttempts method to the AttemptStore interface, allowing for explicit removal of terminal attempt results.
  • htlcswitch/payment_result.go
    • Defined a new DeletionStatus enum to represent the outcome of an attempt deletion.
    • Implemented the DeleteAttempts method within networkResultStore, including logic for handling pending attempts, duplicate IDs, and database interactions.
  • htlcswitch/payment_result_test.go
    • Added a test case to verify subscriber notification after FailPendingAttempt.
    • Introduced comprehensive tests for the DeleteAttempts method, covering various scenarios like empty ID lists, successful deletions, refusal to delete pending attempts, duplicate ID rejection, unknown IDs, mixed batches, idempotent re-deletion, and re-initialization after deletion.
  • itest/lnd_sendonion_test.go
    • Removed unused context and rpc imports.
    • Updated alice.RPC.SendOnion calls to reflect the new API, which now returns an error directly instead of a SendOnionResponse struct.
    • Added a test case within testSendOnionTwice to demonstrate the usage and functionality of the new DeleteAttempts RPC after a settled HTLC.
  • lnrpc/switchrpc/mock.go
    • Added sync import.
    • Extended mockAttemptStore with deleteErr and deleteResp fields to mock DeleteAttempts behavior.
    • Introduced statefulAttemptStore to simulate idempotency and deletion behavior more realistically for testing purposes.
  • lnrpc/switchrpc/switch.pb.go
    • Replaced ErrorCode enum with IndefiniteFailureCode and AttemptDeletionStatus enums.
    • Modified SendOnionResponse to be an empty message, aligning with the new gRPC status-based error handling.
    • Added new message types: SendOnionFailureDetails, DefiniteFailure, IndefiniteFailure, DeleteAttemptsRequest, DeleteAttemptsResponse, and AttemptDeletionResult to support the new RPC and detailed error reporting.
    • Updated proto message and enum type registrations to reflect the new definitions.
  • lnrpc/switchrpc/switch.pb.gw.go
    • Added gRPC-Gateway request and local request functions for the new DeleteAttempts RPC.
    • Registered DeleteAttempts handler with the gRPC-Gateway server and client.
  • lnrpc/switchrpc/switch.proto
    • Updated the description for the SendOnion RPC to clarify the new gRPC status-based outcome handling and the inclusion of SendOnionFailureDetails.
    • Defined the new DeleteAttempts RPC, including its request and response messages (DeleteAttemptsRequest, DeleteAttemptsResponse).
    • Added new message types: SendOnionFailureDetails, DefiniteFailure, IndefiniteFailure, and their associated enums (IndefiniteFailureCode, AttemptDeletionStatus) for richer error reporting and deletion outcomes.
  • lnrpc/switchrpc/switch.swagger.json
    • Added Swagger definitions for the new /v2/switch/deleteattempts endpoint, including its summary, description, and request/response schemas.
    • Updated the Swagger description for /v2/switch/onion to reflect the new error handling and SendOnionFailureDetails.
    • Added new schema definitions for switchrpcAttemptDeletionResult, switchrpcAttemptDeletionStatus, switchrpcDeleteAttemptsRequest, switchrpcDeleteAttemptsResponse, and removed switchrpcErrorCode.
  • lnrpc/switchrpc/switch.yaml
    • Added a new HTTP rule for the DeleteAttempts RPC, mapping it to a POST request at /v2/switch/deleteattempts.
  • lnrpc/switchrpc/switch_grpc.pb.go
    • Updated the SendOnion RPC signature in the SwitchClient and SwitchServer interfaces to reflect the new empty SendOnionResponse and gRPC status-based error handling.
    • Added the DeleteAttempts RPC to the SwitchClient and SwitchServer interfaces.
    • Implemented the DeleteAttempts method in switchClient and UnimplementedSwitchServer.
    • Registered the DeleteAttempts handler in Switch_ServiceDesc.
  • lnrpc/switchrpc/switch_server.go
    • Removed unused encoding/hex import.
    • Added DeleteAttempts RPC to the macaroonPermissions map with offchain:write permissions.
    • Updated the SendOnion RPC implementation to use the new marshallDispatchFailure function for error handling, providing structured SendOnionFailureDetails.
    • Implemented the DeleteAttempts RPC, including input validation, calling the AttemptStore, and converting internal deletion statuses to proto responses.
    • Introduced deletionStatusToProto helper for converting internal deletion statuses.
    • Added marshallDispatchFailure function to translate internal HTLC switch errors into gRPC status errors with rich details.
    • Added GetSendOnionFailureDetails helper to extract structured failure details from gRPC errors.
  • lnrpc/switchrpc/switch_server_test.go
    • Removed unused encoding/hex import.
    • Updated TestSendOnion to reflect the new SendOnion RPC API, including checking for gRPC status codes and using checkFailureDetails for structured error validation.
    • Added requireSendOnionFailureDetails helper for asserting SendOnion failure details in tests.
    • Introduced TestDeleteAttempts to verify the DeleteAttempts RPC handler's behavior, including mixed results, empty requests, duplicate IDs, and store errors.
    • Added TestDeletionStatusToProto to test the conversion of internal deletion statuses to proto enums.
    • Added TestSendOnionAfterDelete to demonstrate that deleting a terminal attempt record removes idempotency protection.
    • Updated TestMarshallDispatchFailure to test the new error marshalling logic for SendOnion.
  • lntest/rpc/switch.go
    • Modified the SendOnion helper to return an error directly, aligning with the updated RPC signature.
    • Added a new DeleteAttempts helper function for making RPC calls to the DeleteAttempts endpoint in integration tests.
Activity
  • The pull request introduces a new RPC, DeleteAttempts, and refactors error handling for SendOnion. No specific human activity like comments or reviews are provided in the context.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable DeleteAttempts RPC to switchrpc for garbage collecting terminal HTLC attempts, which is a great feature for external routers. The accompanying overhaul of the SendOnion RPC to provide more robust, structured error handling is also a significant improvement. The code is well-structured and includes thorough testing for the new functionality.

My main concern is an inconsistency in the DeleteAttempts API regarding the DELETION_ALREADY_DELETED status. The API documentation and enums define this status, but the underlying implementation currently returns DELETION_NOT_FOUND for already-deleted attempts. This could lead to confusion for clients. I've provided detailed comments with suggestions on how to address this.

Comment on lines 63 to 66
// DeletionAlreadyDeleted indicates the attempt was already deleted
// in a prior call. This is normal idempotent retry behavior and
// requires no corrective action from the client.
DeletionAlreadyDeleted

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The DeletionAlreadyDeleted status is defined here and in the corresponding protobuf enum, but it's never used in the DeleteAttempts implementation. Currently, deleting an already-deleted or non-existent attempt both result in DeletionNotFound. This can be confusing for clients, as DeletionNotFound is documented to "indicate a client bug", while an idempotent re-delete is a normal, expected operation.

Given that tombstone-based deletion is planned for the future (which would enable distinguishing these cases), I suggest one of two options for this PR to improve API consistency:

  1. Remove the DeletionAlreadyDeleted status from both the htlcswitch.DeletionStatus and switchrpc.AttemptDeletionStatus enums for now. This would make the API reflect the current implementation.
  2. If distinguishing these cases is important for the initial API, consider a way to implement this status even without full tombstones.

If neither is desirable, at a minimum, the comment for DeletionNotFound should be updated to clarify that it can also mean "already deleted", and the test comment in payment_result_test.go should be corrected to reflect that DeletionNotFound is expected on re-delete.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a valid point, right? The proto then also needs an update

@saubyk saubyk added this to v0.21 Feb 23, 2026
@saubyk saubyk added this to the v0.21.0 milestone Feb 23, 2026
@saubyk saubyk moved this to In progress in v0.21 Feb 23, 2026
@calvinrzachman calvinrzachman force-pushed the switchrpc-delete-attempts branch from d460d84 to 27c2b33 Compare February 24, 2026 14:10
Copy link
Collaborator

@bitromortac bitromortac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concept looks good to me 👍

Comment on lines +516 to +518
for _, id := range attemptIDs {
store.attemptIDMtx.Lock(id)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could there be deadlocking if this is called by another caller? It seems like sorting the attempt ids could help

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Updated so that we sort the IDs prior to grabbing the locks.

require.True(t, ok, "expected temporary node failure")

// Test subscriber notification after FailPendingAttempt.
t.Run("FailPendingAttempt notifies subscriber", func(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated, right? If so it should be in its own commit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, should have had it in a prior PR. Moved to a separate commit.

"for attempt %d during delete: %v",
id, err)

results[id] = DeletionNotFound
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be useful to add a DeletionFailed status as well, otherwise we could accumulate wrong messages?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added ✅

Comment on lines 63 to 66
// DeletionAlreadyDeleted indicates the attempt was already deleted
// in a prior call. This is normal idempotent retry behavior and
// requires no corrective action from the client.
DeletionAlreadyDeleted
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a valid point, right? The proto then also needs an update

Add a "positive space" deletion primitive for the network
result store. Unlike CleanStore (which deletes everything
except a keep-set and is vulnerable to stale-snapshot races),
DeleteAttempts accepts an explicit list of attempt IDs to
remove.

The implementation acquires per-ID mutexes, checks each
record's state within a single kvdb transaction, and only
deletes terminal (settled or failed) results. Pending
attempts are protected by a server-side guard.

Per-ID results are returned so callers get actionable
feedback: OK, Pending, Failed, or NotFound.
Expose the store-layer DeleteAttempts as a gRPC endpoint
on the Switch sub-server, allowing remote callers to
garbage-collect terminal attempt records by explicitly
naming the finished IDs.

A unit test with a stateful mock demonstrates that hard
delete removes the idempotency protection, allowing
re-dispatch with the same ID.
Extend testSendOnionTwice to call DeleteAttempts after
the payment settles, verifying that removing a terminal
attempt returns DELETION_OK, and a subsequent delete of
the same ID returns DELETION_NOT_FOUND.
Verify that SubscribeResult is notified after FailPendingAttempt
transitions a pending attempt to a terminal failure state.
@calvinrzachman calvinrzachman force-pushed the switchrpc-delete-attempts branch from 27c2b33 to 53fa0ca Compare February 26, 2026 18:07
@lightninglabs-deploy lightninglabs-deploy added the severity-critical Requires expert review - security/consensus critical label Feb 26, 2026
@lightninglabs-deploy
Copy link
Collaborator

🔴 PR Severity: CRITICAL

Automated classification | 8 files | 436 lines changed

🔴 Critical (2 files)
  • htlcswitch/interfaces.go - HTLC switch interface definitions; part of htlcswitch/* (HTLC forwarding, payment routing state machine)
  • htlcswitch/payment_result.go - Payment result handling within htlcswitch; core HTLC forwarding logic
🟠 High (2 files)
  • lnrpc/switchrpc/switch_server.go - RPC server implementation for switch subserver
  • lnrpc/switchrpc/mock.go - Mock for switch RPC (lnrpc/* package)
🟡 Medium (3 files)
  • lnrpc/switchrpc/switch.proto - API/proto definition change
  • lnrpc/switchrpc/switch.swagger.json - API swagger spec
  • lnrpc/switchrpc/switch.yaml - API config
🟢 Low (1 file)
  • docs/release-notes/release-notes-0.21.0.md - Release notes documentation

Analysis

This PR is classified as CRITICAL because it modifies files within htlcswitch/, which is one of the highest-risk packages in the codebase. Specifically:

  • htlcswitch/payment_result.go appears to add new payment result tracking logic (140 lines added), directly within the HTLC forwarding and payment routing state machine.
  • htlcswitch/interfaces.go extends interfaces used throughout the HTLC switch layer.

Changes to htlcswitch/ can affect payment routing correctness, fund safety, and channel state integrity. Expert review is warranted.

Severity bump check: 8 non-test/non-generated files (threshold: 20), 436 lines changed (threshold: 500), single critical package — no bump applied.


To override, add a severity-override-{critical,high,medium,low} label.

@calvinrzachman
Copy link
Collaborator Author

I'll fix the lint job complaining about unnecessary use of nolint and release notes on next push.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

payments-v2 severity-critical Requires expert review - security/consensus critical

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

4 participants