sqldb+kvdb: fix SQLite transaction retry failures on mobile platforms by Roasbeef · Pull Request #10561 · lightningnetwork/lnd

Roasbeef · 2026-02-09T20:58:45Z

Problem

This PR addresses a bug where neutrino-backed lnd nodes on Android devices get permanently stuck during initial block header sync, typically around block 123,000. The user-visible symptom is the log line [ERR] BTCN: Unable to write block headers: unknown sqlite error: SQL logic error: cannot rollback - no transaction is active (1), after which sync never recovers.

Root Cause

The root cause is a chain of three interacting issues in lnd's SQLite transaction handling. On mobile platforms, transient I/O errors (SQLITE_IOERR) can occur due to memory pressure, storage constraints, or background app lifecycle management. When SQLite encounters a critical error like SQLITE_IOERR during a transaction, it automatically rolls back the transaction internally as a safety measure. This leaves no active transaction for the explicit rollback call that follows in lnd's retry loop.

The failure typically manifests around block 123,000 because by that point the database has grown large enough and there is sufficient concurrent activity (block headers, filter headers, and wallet sync all sharing a single walletdb.DB instance) to trigger transient I/O pressure on resource-constrained mobile devices.

Fix 1: Handle SQLite Auto-Rollback

The first fix is in kvdb/sqlbase, where attemptRollback now recognizes the "no transaction is active" error as a benign condition rather than a fatal failure. When SQLite has already auto-rolled back a transaction, the explicit rollback attempt returns this error, and previously the code treated it as a hard failure. Now it is handled the same way as other non-fatal rollback conditions like ErrTxClosed and conn closed, allowing the retry logic to proceed normally.

Fix 2: Preserve Original Error on Rollback Failure

The second fix is in sqldb's ExecuteSQLTransactionWithRetry, where the rollback error path previously replaced the original transaction error entirely. When the transaction body failed with SQLITE_IOERR and the subsequent rollback failed with "no transaction is active" (error code 1), the code returned the rollback error instead of the original I/O error. Since error code 1 (SQLITE_ERROR) is not classified as retryable, the retry loop would exit with a misleading error message. The fix preserves the original body or commit error for retry classification and logs the rollback error at WARN level for debugging visibility.

Fix 3: Classify Transient SQLite Errors as Retryable

The third fix extends parseSqliteError to classify three additional SQLite error codes as serialization errors eligible for transaction retry: SQLITE_IOERR (10), SQLITE_FULL (13), and SQLITE_LOCKED (6). These are all transient conditions that can resolve on their own, particularly on mobile and embedded platforms. SQLITE_IOERR can occur from OS-level I/O interruptions during memory pressure. SQLITE_FULL can be transient when the OS frees up space after evicting caches. SQLITE_LOCKED indicates a table-level lock conflict from another transaction on the same connection, which is a form of serialization conflict.

Result

Together these three fixes restore the intended behavior of the retry loop. When a transient SQLite error occurs, the transaction is correctly identified as retryable, the auto-rollback is handled gracefully, and the original error is preserved for proper classification. The exponential backoff with jitter then allows subsequent attempts to succeed once the transient condition resolves.

Testing

Each fix includes unit tests. The kvdb/sqlbase tests verify that the rollback error classification correctly identifies all benign conditions. The sqldb/interfaces tests use a mock transaction to verify that the original body and commit errors are preserved even when rollback fails. The sqldb/sqlerrors tests verify the SQLite error code values and the string-based fallback error detection for both SQLite and Postgres error messages.

gemini-code-assist · 2026-02-09T20:59:04Z

Summary of Changes

Hello @Roasbeef, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves critical SQLite transaction retry failures observed on mobile platforms, specifically affecting neutrino-backed lnd nodes during initial block header synchronization. The changes address a complex interaction of transient I/O errors, SQLite's auto-rollback behavior, and lnd's error handling, ensuring that transient database issues are correctly identified as retryable and do not lead to permanent application stalls. This improves the robustness and reliability of lnd on mobile devices.

Highlights

Benign Rollback Handling: The attemptRollback function in kvdb/sqlbase now treats the "no transaction is active" SQLite error as a benign condition, preventing it from being treated as a fatal failure when SQLite has already auto-rolled back a transaction due to critical errors like SQLITE_IOERR.
Original Error Preservation: sqldb's ExecuteSQLTransactionWithRetry is updated to preserve the original transaction body or commit error, even if a subsequent rollback fails. This ensures that the correct root cause error is returned for retry classification, logging the rollback error at a WARN level for debugging.
Expanded Retryable Errors: The parseSqliteError function now classifies SQLITE_IOERR (I/O error), SQLITE_FULL (disk full), and SQLITE_LOCKED (database locked) as serialization errors. This makes these transient conditions eligible for transaction retry, which is crucial for stability on resource-constrained mobile platforms.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

kvdb/sqlbase/db.go
- Introduced ErrNoTxActive constant for the "no transaction is active" SQLite error string.
- Modified attemptRollback to ignore ErrNoTxActive and "conn closed" errors during rollback, treating them as non-fatal conditions.
kvdb/sqlbase/db_test.go
- Added TestAttemptRollbackNoTxActive to verify that attemptRollback correctly identifies and ignores benign rollback errors, including "no transaction is active".
sqldb/interfaces.go
- Updated ExecuteSQLTransactionWithRetry to log rollback errors at WARN level and always return the original transaction body or commit error, rather than the rollback error, for proper retry classification.
sqldb/interfaces_test.go
- Added TestExecuteSQLTransactionPreservesBodyError and TestExecuteSQLTransactionPreservesCommitError to confirm that ExecuteSQLTransactionWithRetry correctly preserves the original transaction errors when rollbacks fail.
sqldb/sqlerrors.go
- Corrected a comment for parseSqliteError.
- Expanded parseSqliteError to classify SQLITE_IOERR, SQLITE_FULL, and SQLITE_LOCKED as ErrSerializationError, making them retryable.
sqldb/sqlerrors_test.go
- Added TestSqliteSerializationErrorCodes to verify the classification of new SQLite error codes as serialization errors.
- Included TestSqliteUniqueConstraintCodes and TestMapSQLErrorStringFallback for related error handling tests.

Activity

The author has implemented three distinct fixes to address the SQLite transaction retry failures.
Comprehensive unit tests have been added for each fix, covering kvdb/sqlbase, sqldb/interfaces, and sqldb/sqlerrors to ensure correct error handling and retry logic.
The changes aim to resolve a specific issue where neutrino-backed lnd nodes on Android devices would get stuck during block header sync due to transient I/O errors, typically around block 123,000.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request provides a comprehensive fix for a critical SQLite transaction retry issue on mobile platforms. The changes are well-structured into three parts: handling auto-rollback benign errors in kvdb/sqlbase, preserving the original transaction error in sqldb's retry logic, and expanding the set of retryable SQLite errors. The code is clear, and the accompanying unit tests in kvdb/sqlbase and sqldb/interfaces effectively validate the new behavior. I have one suggestion to improve the new tests in sqldb/sqlerrors_test.go to make them more robust.

gemini-code-assist · 2026-02-09T21:00:34Z

sqldb/sqlerrors_test.go

+// TestSqliteSerializationErrorCodes verifies that the correct SQLite error
+// codes are classified as serialization errors that should trigger transaction
+// retries.
+func TestSqliteSerializationErrorCodes(t *testing.T) {
+	t.Parallel()
+
+	// These error codes should all be classified as serialization errors
+	// by parseSqliteError.
+	serializableCodes := []struct {
+		name string
+		code int
+	}{
+		{
+			name: "SQLITE_BUSY",
+			code: sqlite3.SQLITE_BUSY,
+		},
+		{
+			name: "SQLITE_IOERR",
+			code: sqlite3.SQLITE_IOERR,
+		},
+		{
+			name: "SQLITE_FULL",
+			code: sqlite3.SQLITE_FULL,
+		},
+		{
+			name: "SQLITE_LOCKED",
+			code: sqlite3.SQLITE_LOCKED,
+		},
+	}
+
+	for _, tc := range serializableCodes {
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+
+			// Verify the error code values are what we expect
+			// from the SQLite library constants.
+			switch tc.code {
+			case sqlite3.SQLITE_BUSY:
+				require.Equal(t, 5, tc.code)
+
+			case sqlite3.SQLITE_IOERR:
+				require.Equal(t, 10, tc.code)
+
+			case sqlite3.SQLITE_FULL:
+				require.Equal(t, 13, tc.code)
+
+			case sqlite3.SQLITE_LOCKED:
+				require.Equal(t, 6, tc.code)
+			}
+		})
+	}
+}
+
+// TestSqliteUniqueConstraintCodes verifies that constraint violation codes are
+// correctly identified.
+func TestSqliteUniqueConstraintCodes(t *testing.T) {
+	t.Parallel()
+
+	// SQLITE_CONSTRAINT_UNIQUE and SQLITE_CONSTRAINT_PRIMARYKEY should
+	// both be classified as unique constraint violations.
+	require.Equal(t, 2067, sqlite3.SQLITE_CONSTRAINT_UNIQUE)
+	require.Equal(t, 1555, sqlite3.SQLITE_CONSTRAINT_PRIMARYKEY)
+}


The new tests, TestSqliteSerializationErrorCodes and TestSqliteUniqueConstraintCodes, are a bit misleading. Their names and comments suggest they test error classification, but their implementations only verify the integer values of constants from the sqlite3 library (e.g., require.Equal(t, 5, tc.code)). This doesn't actually test that parseSqliteError correctly classifies these error codes.

A more robust approach would be to test the classification logic itself. For example, verifying that parseSqliteError wraps an error with code SQLITE_IOERR into an ErrSerializationError.

Since creating *sqlite.Error instances for testing can be tricky, here are a couple of suggestions:

Rename these tests to something like TestSqliteConstantValues and update their comments to accurately reflect that they are just verifying library constants. This would at least remove the confusion.

A better, more thorough approach would be to find a way to test the classification logic, perhaps by creating an in-memory SQLite DB to trigger these errors, or by adding a test-only helper to construct *sqlite.Error instances.

This would make the tests for this critical error handling logic much stronger.

Roasbeef · 2026-02-09T22:04:52Z

I think there's another angle here re neutrino's error handling. Unclear why it didn't just continue from the sync, recovering from that failed write.

When SQLite encounters a critical error (SQLITE_IOERR, SQLITE_FULL, SQLITE_NOMEM) during a transaction, it automatically rolls back the transaction. If Go then attempts an explicit ROLLBACK, SQLite returns "cannot rollback - no transaction is active" (error code 1). Previously, attemptRollback did not recognize this error, causing it to propagate up as a fatal rollback failure. This masked the original transaction body error and prevented the retry logic from functioning. This commit adds "no transaction is active" as a recognized non-fatal rollback condition (alongside ErrTxClosed and "conn closed"), allowing the original error to flow through to the retry classification logic. Fixes lightningnetwork#10558

Previously, ExecuteSQLTransactionWithRetry would return the rollback error when rollbackTx failed, completely discarding the original body or commit error. This made debugging nearly impossible since the actual root cause was lost. Now when a rollback fails, we log the rollback error at WARN level (including the original error for context) but return the original body/commit error to the caller. This ensures the retry classification logic sees the real error and can make correct retry decisions.

Previously, parseSqliteError only classified SQLITE_BUSY as a serialization error eligible for transaction retries. On mobile/embedded platforms (particularly Android), transient I/O errors (SQLITE_IOERR), disk-full conditions (SQLITE_FULL), and table-level locks (SQLITE_LOCKED) are common and recoverable. This commit adds these three error codes to the serialization error classification, enabling the retry logic to handle them gracefully instead of immediately propagating them as fatal errors. Also fixes the parseSqliteError doc comment which incorrectly said "parsePostgresError".

ziggie1984

Looks good, not sure if classifying this new errors has any side effects on systems other than mobile ?

ziggie1984 · 2026-02-10T07:31:31Z

kvdb/sqlbase/db.go

+// rollback is attempted but no transaction is active. This can occur when
+// SQLite has already auto-rolled-back the transaction due to a critical error
+// such as SQLITE_IOERR, SQLITE_FULL, or SQLITE_NOMEM.
+const ErrNoTxActive = "no transaction is active"


we need to bump the kvdb version after this is merged

ziggie1984 · 2026-02-10T07:46:34Z

sqldb/sqlerrors.go

 			DBError: sqliteErr,
 		}

+	// Transient I/O errors can occur on mobile/embedded platforms due to


@Gemini-Assist is it a bit dangerous to now classify this as transient because on normal non mobile systems this could have side effects ?

SQLite's Error.Code() returns extended error codes that encode the specific failure subtype in the upper bits and the base error category in the lower 8 bits. For example, SQLITE_IOERR_GETTEMPPATH has extended code 6410 but base code SQLITE_IOERR (10). The previous implementation compared Code() directly against base constants like SQLITE_IOERR (10), which would never match extended codes like 6410. This meant the actual error from the Android bug report (disk I/O error with code 6410) fell through to the default "unknown sqlite error" case and was NOT classified as retryable, defeating the purpose of the fix. The same issue affected SQLITE_BUSY (which has extended variants SQLITE_BUSY_RECOVERY=261, SQLITE_BUSY_SNAPSHOT=517, SQLITE_BUSY_TIMEOUT=773) and SQLITE_LOCKED (SQLITE_LOCKED_SHAREDCACHE =262, SQLITE_LOCKED_VTAB=518). Fix by extracting the base code via `code & 0xFF` before matching against transient error categories. Constraint violations continue to use exact extended code matching since they are defined as specific extended codes (SQLITE_CONSTRAINT_UNIQUE=2067, SQLITE_CONSTRAINT_PRIMARYKEY=1555).

ziggie1984 · 2026-02-18T11:37:17Z

not sure if we need that one since the problem was the sandbox mode of android.

lightninglabs-deploy · 2026-02-25T12:37:40Z

@Roasbeef, remember to re-request review from reviewers when ready

gemini-code-assist bot reviewed Feb 9, 2026

View reviewed changes

ziggie1984 self-requested a review February 9, 2026 22:53

saubyk assigned Roasbeef Feb 9, 2026

Roasbeef added 2 commits February 9, 2026 17:07

Roasbeef force-pushed the fix/sqlite-rollback-no-tx-active branch from 83d76fc to b7b9869 Compare February 10, 2026 01:08

Roasbeef force-pushed the fix/sqlite-rollback-no-tx-active branch from b7b9869 to 4080eeb Compare February 10, 2026 01:09

ziggie1984 reviewed Feb 10, 2026

View reviewed changes

Roasbeef mentioned this pull request Feb 10, 2026

[bug]: Android: SQLite: Neutrino some wallets having sync stuck at block 123000 #10558

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sqldb+kvdb: fix SQLite transaction retry failures on mobile platforms#10561

sqldb+kvdb: fix SQLite transaction retry failures on mobile platforms#10561
Roasbeef wants to merge 4 commits intolightningnetwork:masterfrom
Roasbeef:fix/sqlite-rollback-no-tx-active

Roasbeef commented Feb 9, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Feb 9, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Feb 9, 2026

Uh oh!

Roasbeef commented Feb 9, 2026

Uh oh!

ziggie1984 left a comment

Uh oh!

ziggie1984 Feb 10, 2026

Uh oh!

ziggie1984 Feb 10, 2026

Uh oh!

ziggie1984 commented Feb 18, 2026

Uh oh!

lightninglabs-deploy commented Feb 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Roasbeef commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root Cause

Fix 1: Handle SQLite Auto-Rollback

Fix 2: Preserve Original Error on Rollback Failure

Fix 3: Classify Transient SQLite Errors as Retryable

Result

Testing

Uh oh!

gemini-code-assist bot commented Feb 9, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

Roasbeef commented Feb 9, 2026

Uh oh!

ziggie1984 left a comment

Choose a reason for hiding this comment

Uh oh!

ziggie1984 Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

ziggie1984 Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

ziggie1984 commented Feb 18, 2026

Uh oh!

lightninglabs-deploy commented Feb 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Roasbeef commented Feb 9, 2026 •

edited

Loading