switchrpc: add DeleteAttempts RPC#10602
switchrpc: add DeleteAttempts RPC#10602calvinrzachman wants to merge 5 commits intolightningnetwork:elle-base-branch-payment-servicefrom
Conversation
Summary of ChangesHello @calvinrzachman, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a crucial new RPC, Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable DeleteAttempts RPC to switchrpc for garbage collecting terminal HTLC attempts, which is a great feature for external routers. The accompanying overhaul of the SendOnion RPC to provide more robust, structured error handling is also a significant improvement. The code is well-structured and includes thorough testing for the new functionality.
My main concern is an inconsistency in the DeleteAttempts API regarding the DELETION_ALREADY_DELETED status. The API documentation and enums define this status, but the underlying implementation currently returns DELETION_NOT_FOUND for already-deleted attempts. This could lead to confusion for clients. I've provided detailed comments with suggestions on how to address this.
htlcswitch/payment_result.go
Outdated
| // DeletionAlreadyDeleted indicates the attempt was already deleted | ||
| // in a prior call. This is normal idempotent retry behavior and | ||
| // requires no corrective action from the client. | ||
| DeletionAlreadyDeleted |
There was a problem hiding this comment.
The DeletionAlreadyDeleted status is defined here and in the corresponding protobuf enum, but it's never used in the DeleteAttempts implementation. Currently, deleting an already-deleted or non-existent attempt both result in DeletionNotFound. This can be confusing for clients, as DeletionNotFound is documented to "indicate a client bug", while an idempotent re-delete is a normal, expected operation.
Given that tombstone-based deletion is planned for the future (which would enable distinguishing these cases), I suggest one of two options for this PR to improve API consistency:
- Remove the
DeletionAlreadyDeletedstatus from both thehtlcswitch.DeletionStatusandswitchrpc.AttemptDeletionStatusenums for now. This would make the API reflect the current implementation. - If distinguishing these cases is important for the initial API, consider a way to implement this status even without full tombstones.
If neither is desirable, at a minimum, the comment for DeletionNotFound should be updated to clarify that it can also mean "already deleted", and the test comment in payment_result_test.go should be corrected to reflect that DeletionNotFound is expected on re-delete.
There was a problem hiding this comment.
This is a valid point, right? The proto then also needs an update
d460d84 to
27c2b33
Compare
bitromortac
left a comment
There was a problem hiding this comment.
The concept looks good to me 👍
| for _, id := range attemptIDs { | ||
| store.attemptIDMtx.Lock(id) | ||
| } |
There was a problem hiding this comment.
Could there be deadlocking if this is called by another caller? It seems like sorting the attempt ids could help
There was a problem hiding this comment.
Nice catch. Updated so that we sort the IDs prior to grabbing the locks.
| require.True(t, ok, "expected temporary node failure") | ||
|
|
||
| // Test subscriber notification after FailPendingAttempt. | ||
| t.Run("FailPendingAttempt notifies subscriber", func(t *testing.T) { |
There was a problem hiding this comment.
This is unrelated, right? If so it should be in its own commit
There was a problem hiding this comment.
Yeah, should have had it in a prior PR. Moved to a separate commit.
htlcswitch/payment_result.go
Outdated
| "for attempt %d during delete: %v", | ||
| id, err) | ||
|
|
||
| results[id] = DeletionNotFound |
There was a problem hiding this comment.
It may be useful to add a DeletionFailed status as well, otherwise we could accumulate wrong messages?
htlcswitch/payment_result.go
Outdated
| // DeletionAlreadyDeleted indicates the attempt was already deleted | ||
| // in a prior call. This is normal idempotent retry behavior and | ||
| // requires no corrective action from the client. | ||
| DeletionAlreadyDeleted |
There was a problem hiding this comment.
This is a valid point, right? The proto then also needs an update
Add a "positive space" deletion primitive for the network result store. Unlike CleanStore (which deletes everything except a keep-set and is vulnerable to stale-snapshot races), DeleteAttempts accepts an explicit list of attempt IDs to remove. The implementation acquires per-ID mutexes, checks each record's state within a single kvdb transaction, and only deletes terminal (settled or failed) results. Pending attempts are protected by a server-side guard. Per-ID results are returned so callers get actionable feedback: OK, Pending, Failed, or NotFound.
Expose the store-layer DeleteAttempts as a gRPC endpoint on the Switch sub-server, allowing remote callers to garbage-collect terminal attempt records by explicitly naming the finished IDs. A unit test with a stateful mock demonstrates that hard delete removes the idempotency protection, allowing re-dispatch with the same ID.
Extend testSendOnionTwice to call DeleteAttempts after the payment settles, verifying that removing a terminal attempt returns DELETION_OK, and a subsequent delete of the same ID returns DELETION_NOT_FOUND.
Verify that SubscribeResult is notified after FailPendingAttempt transitions a pending attempt to a terminal failure state.
27c2b33 to
53fa0ca
Compare
🔴 PR Severity: CRITICAL
🔴 Critical (2 files)
🟠 High (2 files)
🟡 Medium (3 files)
🟢 Low (1 file)
AnalysisThis PR is classified as CRITICAL because it modifies files within
Changes to Severity bump check: 8 non-test/non-generated files (threshold: 20), 436 lines changed (threshold: 500), single critical package — no bump applied. To override, add a |
|
I'll fix the lint job complaining about unnecessary use of |
Change Description
Add a
DeleteAttemptsRPC to theswitchrpcsub-server, enabling remote clients to perform safe, online garbage collection of terminal HTLC attempt records from theAttemptStore.This uses a more explicit approach to deletion: the client names the specific finished attempts to delete. Unlike the existing
CleanStore(which deletes everything except a keep-set),DeleteAttemptsis immune to stale-snapshot raceconditions and does not require a disruptive quiescent state — making it suitable for use when a remote router manages payments.
If a particular request to delete a batch of attempts fails entirely or partially, the client can just try again later. They do not need to worry about conducting the deletion in parallel with processing htlc attempts for new payment requests.
OK,PENDING,FAILEDNOT_FOUND) in the response.ChannelRouter's cleanup logic is not modified. This is a new primitive exposed only viaswitchrpc. An RPC client looking to use this endpoint will have to track the appropriate state necessary to determine which HTLC attempts have been completed (settled/failed) but which are yet to be deleted.Addresses: #10519
Replaces (favored over): #10484
Future Addition