fix(consensus/XDPoS): fix unknown ancestor error and VerifyHeaders result race, XFN-12 #2139

Open
gzliudan wants to merge 1 commit into XinFinOrg:dev-upgrade from gzliudan:fix-verify-order

@gzliudan gzliudan commented Mar 6, 2026

Proposed changes

fix audit issue XFN-12: Broken Result Ordering in VerifyHeaders

Root cause:

  • In mixed batches, v2 ancestor lookup could miss parents that existed only in the input batch.
  • The adaptor previously let EngineV1 and EngineV2 write to a shared results channel concurrently, creating ordering/mapping ambiguity.

Changes:

  • Add verifyChainReader to overlay in-batch headers/blocks for GetHeader/GetHeaderByNumber/GetHeaderByHash/GetBlock, with fallback to canonical chain data.
  • Use verifyChainReader in mixed v1/v2 VerifyHeaders paths.
  • Split mixed verification into per-engine result channels and fan-in deterministically (all v1 first, then all v2).
  • Make newVerifyChainReader always return a non-nil, nil-safe reader.
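
The overlay idea in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: `Header`, `headerReader`, `overlayReader`, and `newOverlayReader` are hypothetical stand-ins, hashes are plain strings, and only two of the lookup methods are shown. Lookups consult the in-batch headers first and fall back to the wrapped chain reader, which may be nil.

```go
package main

import "fmt"

// Header is a simplified stand-in for a block header (assumption: the real
// type carries many more fields and uses a fixed-size hash type).
type Header struct {
	Number     uint64
	Hash       string
	ParentHash string
}

// headerReader is a hypothetical, reduced form of a chain-reader interface.
type headerReader interface {
	GetHeaderByHash(hash string) *Header
	GetHeaderByNumber(number uint64) *Header
}

// overlayReader answers lookups from the in-batch headers first and falls
// back to the wrapped chain reader, which may be nil.
type overlayReader struct {
	chain    headerReader
	byHash   map[string]*Header
	byNumber map[uint64]*Header
}

// newOverlayReader always returns a usable reader, even for a nil chain or
// a batch that contains nil entries.
func newOverlayReader(chain headerReader, batch []*Header) *overlayReader {
	r := &overlayReader{
		chain:    chain,
		byHash:   make(map[string]*Header, len(batch)),
		byNumber: make(map[uint64]*Header, len(batch)),
	}
	for _, h := range batch {
		if h == nil {
			continue
		}
		r.byHash[h.Hash] = h
		r.byNumber[h.Number] = h
	}
	return r
}

// GetHeaderByHash prefers the in-batch header over chain data.
func (r *overlayReader) GetHeaderByHash(hash string) *Header {
	if h, ok := r.byHash[hash]; ok {
		return h
	}
	if r.chain == nil {
		return nil
	}
	return r.chain.GetHeaderByHash(hash)
}

// GetHeaderByNumber lets an in-batch header shadow the chain's header at
// the same height.
func (r *overlayReader) GetHeaderByNumber(number uint64) *Header {
	if h, ok := r.byNumber[number]; ok {
		return h
	}
	if r.chain == nil {
		return nil
	}
	return r.chain.GetHeaderByNumber(number)
}

func main() {
	parent := &Header{Number: 10, Hash: "a", ParentHash: "z"}
	child := &Header{Number: 11, Hash: "b", ParentHash: "a"}

	// No backing chain at all: the parent is still found in the batch.
	r := newOverlayReader(nil, []*Header{parent, child})
	fmt.Println("parent resolved from batch:", r.GetHeaderByHash(child.ParentHash) == parent)
	fmt.Println("unknown hash is nil:", r.GetHeaderByHash("missing") == nil)
}
```

With a reader of this shape, a header whose parent has not yet been persisted can still resolve that parent, provided the parent is part of the same batch.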

Tests:

  • Add verify_chain_reader unit tests for:
      • number lookup shadowing,
      • parent resolution from in-batch headers,
      • nil-safe constructor behavior,
      • mixed nil-chain no-panic,
      • deterministic mixed result order (v1 then v2).

Error message:

WARN [03-14|23:15:18.025] [VerifyHeaders] Fail to verify header    fullVerify=true blockNum=57,372,740 blockHash=ed239f..aaa7a7 error="parentHeader is nil"
INFO [03-14|23:15:19.117] Imported new chain segment               blocks=25  txs=158   mgas=180.259  elapsed=9.049s      mgasps=19.919   number=57,372,608 hash=51904b..8f9795 age=2y3mo4w   dirty=0.00B
INFO [03-14|23:15:27.469] Imported new chain segment               blocks=38  txs=164   mgas=201.322  elapsed=8.352s      mgasps=24.104   number=57,372,646 hash=bbae68..589809 age=2y3mo4w   dirty=0.00B
INFO [03-14|23:15:35.680] Imported new chain segment               blocks=32  txs=132   mgas=189.217  elapsed=8.210s      mgasps=23.046   number=57,372,678 hash=3e890a..adb56b age=2y3mo4w   dirty=0.00B
INFO [03-14|23:15:43.849] Imported new chain segment               blocks=37  txs=154   mgas=202.454  elapsed=8.169s      mgasps=24.782   number=57,372,715 hash=2a0849..ece177 age=2y3mo4w   dirty=0.00B
ERROR[03-14|23:15:47.688] [FindParentBlockToAssign] Can not find parent block from highestQC proposedBlockInfo x.highestQuorumCert.ProposedBlockInfo.Hash=7c02dc..d6327c x.highestQuorumCert.ProposedBlockInfo.Number=57,372,749
ERROR[03-14|23:15:47.688] [processQC] Block not found using the QC quorumCert.ProposedBlockInfo.Hash=7c02dc..d6327c incomingQuorumCert.ProposedBlockInfo.Number=57,372,749
ERROR[03-14|23:15:47.688] [ProposedBlockHandler] Fail to processQC "QC proposed blockInfo round number"=545409 "QC proposed blockInfo hash"=7c02dc..d6327c
INFO [03-14|23:15:47.688] [downloader] handle proposed block has error err="block not found, number: 57372749, hash: 0x7c02dca86a22ec4ec035181e48f6caa3d2c19b427f2944b378d3bea1a8d6327c" "block hash"=04dc92..b743c0 number=57,372,750
WARN [03-14|23:15:47.695] [VerifyHeaders] Fail to verify header    fullVerify=true blockNum=57,372,751 blockHash=abf618..2e866d error="unknown ancestor"
ERROR[03-14|23:15:47.706] 
########## BAD BLOCK #########
Number: 57372751
Hash: 0xabf6185b4fa86289dfa942beda325103f4d61cd1853b7bc561c93486e22e866d
Round: 545411
Error: unknown ancestor
Chain configuration:
  - ChainID:                     51      
  - Homestead:                   1       
  - DAO Fork:                    <nil>
  - DAO Support:                 false   
  - Tangerine Whistle (EIP 150): 2       
  - Spurious Dragon (EIP 155):   3       
  - Byzantium:                   4       
  - Constantinople:              <nil>
  - Petersburg:                  <nil>
  - Istanbul:                    <nil>
  - TIP2019Block:                1       
  - TIPSigning:                  3000000 
  - TIPRandomize:                3464000 
  - TIPIncreaseMasternodes:      5000000 
  - DenylistHFNumber:            23779191
  - TIPNoHalvingMNReward:        23779191
  - TIPXDCX:                     23779191
  - TIPXDCXLending:              23779191
  - TIPXDCXCancellationFee:      23779191
  - TIPTRC21Fee:                 23779191
  - Berlin:                      61290000
  - London:                      61290000
  - Merge:                       61290000
  - Shanghai:                    61290000
  - BlockNumberGas50x:           56828700
  - TIPXDCXMinerDisable:         61290000
  - TIPXDCXReceiverDisable:      66825000
  - Eip1559:                     71550000
  - Cancun:                      71551800
  - Prague:                      9223372036854775807
  - Osaka:                       9223372036854775807
  - DynamicGasLimitBlock:        9223372036854775807
  - TIPUpgradeReward:            9223372036854775807
  - TipUpgradePenalty:           9223372036854775807
  - TIPEpochHalving:             9223372036854775807
  - Engine:                      XDPoS
    - Period: 2
    - Epoch: 900
    - Reward: 5000
    - RewardCheckpoint: 900
    - Gap: 450
    - FoundationWalletAddr: xdc746249C61f5832C5eEd53172776b460491bDcd5C
    - SkipV1Validation: false
    - V2:
      - SwitchEpoch: 63143
      - SwitchBlock: 56828700
      - CurrentConfig:
        - MaxMasternodes: 15
        - SwitchRound: 0
        - MinePeriod: 2
        - TimeoutSyncThreshold: 3
        - TimeoutPeriod: 60
        - CertThreshold: 0.45
        - MasternodeReward: 0
        - ProtectorReward: 0
        - ObserverReward: 0
        - MinimumMinerBlockPerEpoch: 0
        - LimitPenaltyEpoch: 0
        - MinimumSigningTx: 0
        - ExpTimeoutBase: 1
        - ExpTimeoutMaxExponent: 0
Receipts: 
##############################

WARN [03-14|23:15:47.712] Synchronisation failed, dropping peer    peer=4464397694e0c8b1 err="retrieved hash chain is invalid: unknown ancestor"

Types of changes

What types of changes does your code introduce to XDC network?
Mark the boxes that apply

  • build: Changes that affect the build system or external dependencies
  • ci: Changes to CI configuration files and scripts
  • chore: Changes that don't change source code or tests
  • docs: Documentation only changes
  • feat: A new feature
  • fix: A bug fix
  • perf: A code change that improves performance
  • refactor: A code change that neither fixes a bug nor adds a feature
  • revert: Revert something
  • style: Changes that do not affect the meaning of the code
  • test: Adding missing tests or correcting existing tests

Impacted Components

Which parts of the codebase does this PR touch?
Mark the boxes that apply

  • Consensus
  • Account
  • Network
  • Geth
  • Smart Contract
  • External components
  • Not sure (Please specify below)

Checklist

Mark the boxes once you have confirmed the actions below (or provide reasons for not doing so):

  • This PR has sufficient test coverage (unit/integration test) OR I have provided reason in the PR description for not having test coverage
  • Tested on a private network from the genesis block and monitored the chain operating correctly for multiple epochs.
  • Provide an end-to-end test plan in the PR description on how to manually test it on the devnet/testnet.
  • Tested the backwards compatibility.
  • Tested with XDC nodes running this version co-exist with those running the previous version.
  • Relevant documentation has been updated as part of this PR
  • N/A

Copilot AI review requested due to automatic review settings March 6, 2026 11:08
coderabbitai bot commented Mar 6, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Copilot AI left a comment

Pull request overview

This PR fixes a result-ordering bug in XDPoS.VerifyHeaders where splitting headers into v1/v2 buckets and running two concurrent goroutines caused nondeterministic result-to-header mapping around the v1→v2 switch boundary. This led to misleading "BAD BLOCK" log messages (e.g., a v1-style error attributed to a v2-height header) and sync failures (issue #2138).

Changes:

  • consensus/XDPoS/XDPoS.go: Replaced the two-bucket/two-goroutine VerifyHeaders implementation with a single goroutine that iterates headers in input order, dispatching each to the appropriate engine version.
  • consensus/tests/engine_v2_tests/adaptor_test.go: Added TestAdaptorVerifyHeadersKeepsInputOrderAcrossConsensusSwitch to assert that results arrive in the same order as the input slice across the consensus switch.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
--- | ---
consensus/XDPoS/XDPoS.go | Rewrites VerifyHeaders to a single sequential goroutine preserving input order
consensus/tests/engine_v2_tests/adaptor_test.go | New regression test for result ordering across the v1/v2 switch
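
The single-goroutine approach described in this review overview could look roughly like the sketch below. It is illustrative only: `Header`, `verifyFunc`, and `verifyInOrder` are hypothetical names, and the rule that blocks up to and including `switchBlock` belong to v1 is an assumption made for the example. Because one goroutine walks the headers in input order, the i-th result always belongs to the i-th header.

```go
package main

import (
	"errors"
	"fmt"
)

// Header is a simplified stand-in for a block header.
type Header struct{ Number uint64 }

// verifyFunc is a hypothetical per-engine verification callback.
type verifyFunc func(h *Header) error

// Sentinel errors used only to make the engine choice visible in the demo.
var (
	errV1 = errors.New("rejected by v1 rules")
	errV2 = errors.New("rejected by v2 rules")
)

// verifyInOrder checks headers one by one in a single goroutine, picking the
// engine by block number. Assumption: blocks up to and including switchBlock
// are v1, later blocks are v2. The results channel is closed when done.
func verifyInOrder(headers []*Header, switchBlock uint64, v1, v2 verifyFunc) (chan<- struct{}, <-chan error) {
	abort := make(chan struct{})
	results := make(chan error, len(headers))
	go func() {
		defer close(results)
		for _, h := range headers {
			verify := v2
			if h.Number <= switchBlock {
				verify = v1
			}
			select {
			case <-abort:
				return
			case results <- verify(h):
			}
		}
	}()
	return abort, results
}

func main() {
	headers := []*Header{{Number: 1}, {Number: 2}, {Number: 3}, {Number: 4}}
	v1 := func(*Header) error { return errV1 }
	v2 := func(*Header) error { return errV2 }

	_, results := verifyInOrder(headers, 2, v1, v2)
	i := 0
	for err := range results {
		fmt.Printf("header %d: %v\n", headers[i].Number, err)
		i++
	}
}
```

The trade-off is that the two engines no longer verify concurrently; in exchange, no result can be attributed to the wrong header.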

@gzliudan force-pushed the fix-verify-order branch 4 times, most recently from 16837f0 to 41a76ec (March 7, 2026 13:43)
@gzliudan changed the title from "fix(consensus): preserve verifyheaders order across v1-v2 switch, #2138, XFN-12" to "[WIP] fix(consensus): preserve verifyheaders order across v1-v2 switch, #2138, XFN-12" (Mar 9, 2026)
@gzliudan changed the title from "[WIP] fix(consensus): preserve verifyheaders order across v1-v2 switch, #2138, XFN-12" to "[WIP] fix(consensus/XDPoS): stabilize VerifyHeaders across v1-v2 switch, fix #2138 XFN-12" (Mar 9, 2026)
@gzliudan changed the title from "[WIP] fix(consensus/XDPoS): stabilize VerifyHeaders across v1-v2 switch, fix #2138 XFN-12" to "fix(consensus/XDPoS): stabilize VerifyHeaders across v1-v2 switch, fix #2138 XFN-12" (Mar 9, 2026)
@gzliudan force-pushed the fix-verify-order branch 2 times, most recently from a14f15c to eea3308 (March 10, 2026 07:01)
@gzliudan added the WIP (work in process) label (Mar 11, 2026)
@gzliudan force-pushed the fix-verify-order branch 3 times, most recently from b543a79 to 776237d (March 13, 2026 00:16)
@gzliudan force-pushed the fix-verify-order branch 2 times, most recently from 5d50071 to aacc2b3 (March 13, 2026 05:09)
@gzliudan removed the WIP (work in process) label (Mar 13, 2026)
@gzliudan changed the title from "fix(consensus/XDPoS): stabilize VerifyHeaders across v1-v2 switch, fix #2138 XFN-12" to "fix(core,consensus/XDPoS): split header verification by consensus version, close XFN-12" (Mar 13, 2026)
@gzliudan requested a review from Copilot (March 13, 2026 06:36)

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.


@gzliudan changed the title from "fix(core,consensus/XDPoS): split header verification by consensus version, close XFN-12" to "fix(consensus/XDPoS): resolve mixed v1/v2 VerifyHeaders ancestor lookup, fix XFN-12" (Mar 16, 2026)
@gzliudan changed the title from "fix(consensus/XDPoS): resolve mixed v1/v2 VerifyHeaders ancestor lookup, fix XFN-12" to "fix(consensus/XDPoS): fix mixed v1/v2 VerifyHeaders ancestor lookup and result ordering, fix XFN-12" (Mar 16, 2026)
@gzliudan requested a review from Copilot (March 16, 2026 09:32)

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


@gzliudan changed the title from "fix(consensus/XDPoS): fix mixed v1/v2 VerifyHeaders ancestor lookup and result ordering, fix XFN-12" to "fix(consensus/XDPoS): fix unknown ancestor error and mixed v1/v2 VerifyHeaders result ordering, XFN-12" (Mar 17, 2026)
@gzliudan changed the title from "fix(consensus/XDPoS): fix unknown ancestor error and mixed v1/v2 VerifyHeaders result ordering, XFN-12" to "fix(consensus/XDPoS): fix unknown ancestor error and VerifyHeaders result race, XFN-12" (Mar 17, 2026)
…sult race, XFN-12

Background:
Mixed v1/v2 VerifyHeaders batches could fail with ErrUnknownAncestor when the first v2 header depended on an in-flight parent header not yet persisted to DB.

Root cause:
- In mixed batches, v2 ancestor lookup could miss parents that existed only in the input batch.
- The adaptor previously let EngineV1 and EngineV2 write to a shared results channel concurrently, creating ordering/mapping ambiguity.

Changes:
- Add verifyChainReader to overlay in-batch headers/blocks for GetHeader/GetHeaderByNumber/GetHeaderByHash/GetBlock, with fallback to canonical chain data.
- Use verifyChainReader in mixed v1/v2 VerifyHeaders paths.
- Split mixed verification into per-engine result channels and fan-in deterministically (all v1 first, then all v2).
- Make newVerifyChainReader always return a non-nil, nil-safe reader.
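
A rough sketch of the per-engine channels with deterministic fan-in, under stated assumptions: `result`, `runEngine`, and `fanIn` are hypothetical names, each engine is simulated by a callback, and the batch is assumed to be ordered so that all v1 headers precede all v2 headers. Each engine writes only to its own channel, and the merge drains v1 fully before reading v2.

```go
package main

import "fmt"

// result pairs a verification outcome with the index of the header it
// belongs to in the original input slice (hypothetical type).
type result struct {
	Index int
	Err   error
}

// runEngine simulates one engine verifying its own subset of headers and
// reporting on its own channel. indexes are positions in the input batch.
func runEngine(indexes []int, verify func(i int) error) <-chan result {
	out := make(chan result, len(indexes))
	go func() {
		defer close(out)
		for _, i := range indexes {
			out <- result{Index: i, Err: verify(i)}
		}
	}()
	return out
}

// fanIn drains the v1 channel completely before touching the v2 channel, so
// the merged order never depends on goroutine scheduling.
func fanIn(v1, v2 <-chan result) []result {
	var merged []result
	for r := range v1 {
		merged = append(merged, r)
	}
	for r := range v2 {
		merged = append(merged, r)
	}
	return merged
}

func main() {
	ok := func(int) error { return nil }

	// Both engines run concurrently, each on its own channel.
	v1 := runEngine([]int{0, 1}, ok)
	v2 := runEngine([]int{2, 3}, ok)

	for _, r := range fanIn(v1, v2) {
		fmt.Printf("header %d: err=%v\n", r.Index, r.Err)
	}
}
```

Unlike a shared channel, this keeps the engines concurrent while making the emitted order independent of which goroutine happens to finish first.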

Tests:
- Add verify_chain_reader unit tests for:
  - number lookup shadowing,
  - parent resolution from in-batch headers,
  - nil-safe constructor behavior,
  - mixed nil-chain no-panic,
  - deterministic mixed result order (v1 then v2).
- Add engine_v2 regression test to verify mixed headers pass even when GetHeader(parentHash, number) is masked.

Impact:
Fixes XFN-12 and stabilizes mixed-batch verification behavior across v1->v2 boundary handling and result emission.

5 participants