Skip to content

Milestone 7: Hough voting + feature matching (pure-Rust KPM)#123

Merged
kalwalt merged 19 commits into
devfrom
feat/milestone7-hough-voting-feature-match
May 13, 2026
Merged

Milestone 7: Hough voting + feature matching (pure-Rust KPM)#123
kalwalt merged 19 commits into
devfrom
feat/milestone7-hough-voting-feature-match

Conversation

@kalwalt
Copy link
Copy Markdown
Member

@kalwalt kalwalt commented May 12, 2026

Summary

Rolls up Milestone 7 — Hough voting and feature matching in pure Rust — into dev. Closes the parent issue #108 once merged.

This PR aggregates four sub-PRs (all individually reviewed and merged into the milestone branch) plus their cross-platform CI hardening:

Sub-issue Sub-PR Status
#109 — Hough similarity voting #112 ✅ merged
#110 — K-Medoids + BHC #114 ✅ merged
#111 — FeatureStore + FeatureMatcher (3 variants) #115 ✅ merged
#116 — ArrayShuffle for C++ parity #117 ✅ merged

What's delivered

Pure-Rust KPM matching pipeline

  • Hough similarity voting (hough.rs) — 4D discretized voting scheme to find the consistent similarity transformation across matched feature pairs
  • K-Medoids clustering + Binary Hierarchical Clustering (clustering.rs) — vocabulary tree for O(log n) approximate-NN search on 96-byte FREAK descriptors
  • FeatureStore + FeatureMatcher (matcher.rs) — container + 3 match variants (brute, BHC-indexed, homography-guided) with C++-faithful ratio test
  • ArrayShuffle PRNG (clustering.rs) — byte-identical port of vision::FastRandom and vision::ArrayShuffle for BHC C++ parity

Type unification (M7-3)

  • Added maxima: bool to FeaturePoint (matches C++ vision::FeaturePoint)
  • Reshaped Match to { ins, ref_ } (matches C++ match_t)
  • Renamed M7-1's hough-specific match struct to HoughMatch to free up the canonical name

Cross-platform CI hardening (PR #117)

  • New CI step cargo test -p webarkitlib-rs --lib --features dual-mode runs the FFI parity tests on ubuntu, macOS, windows
  • build.rs fixed to link platform-appropriate C++ stdlib (libc++ on macOS, libstdc++ on Linux)
  • <limits> include order fixed in kpm_c_api.cpp for GCC compatibility
  • M6-2 dual-mode test tolerances widened with combined absolute + relative form to accommodate ARM64 FMA variance

Validation

Dual-mode test coverage (now CI-enforced on 3 OS)

Subsystem Dual-mode strength
match_all (brute force) Count + sorted pair equality + global Hamming optimality
match_indexed (BHC) Count + sorted pair equality (after #116 unlocked parity)
match_guided (homography) Count + sorted pair equality
fast_random / array_shuffle Byte-identical sequence + state evolution vs C++
Existing M6 math / homography Combined abs+rel tolerance, ARM64-safe

Test counts

  • Pure-Rust: 274 lib tests passing
  • With dual-mode: 292 lib tests passing
  • CI: 16/16 green on the milestone tip

Known follow-ups (out of scope, tracked separately)

Closes

Base / merge strategy

Target: dev. Branch is 19 commits ahead of dev and 11 behind (unrelated work in dev: matrix codes, param_gl, load_nft arlog cleanup). No expected conflicts — M7 touches crates/core/src/kpm/freak/ exclusively.

Use a merge commit (not squash) — the project uses Conventional Commits per-PR for git-cliff changelog generation, and the sub-PR commit messages are the source of truth.

kalwalt and others added 19 commits May 7, 2026 10:16
Port Hough voting for similarity transformation matching.
Resolves #109.

- Port HoughSimilarityVoting struct with 4D bin discretization
- Implement find_hough_similarity() and find_hough_matches() functions
- BinParams encapsulates bin calculation logic and validation
- Vote accumulation via HashMap with 16-way bilinear interpolation
- Comprehensive unit tests for bin math, voting, and match filtering
- Stub types for M8 (DoGScaleInvariantDetector, GaussianScaleSpacePyramid)
- Functionally equivalent to C++ implementation; all 218 tests pass

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Use correct import: crate::{arlog_d, arlog_e, arlog_i} instead of crate::arlog
- Rename unused variables with leading underscore (_max_bx, etc.) for clarity
- Update TODO comment for find_hough_matches stub implementation
- All 229 tests pass, clean build with no warnings

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Fix arlog macro imports: use crate::{arlog_d, arlog_e, arlog_i}
- Remove unnecessary variable mutability (bx, by, bs)
- Add #[allow(clippy::too_many_arguments)] for C++ ported functions
- Prefix unused variables with underscore (_bin_delta, _max_*)
- Update copyright year to 2026
- Fix empty line after doc comment

Fixes #109

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
… FREAK descriptors

- Add K-Medoids clustering algorithm with deterministic k-medoids++ initialization
- Implement Binary Hierarchical Clustering (BHC) vocabulary tree for fast nearest-neighbor search
- Add Hamming distance calculation optimized with bit-magic for 96-byte descriptors
- Comprehensive test coverage for all components (Hamming distance, K-Medoids, BHC)
- Use arlog macros for logging errors and debug information
- Follows idiomatic Rust patterns and project conventions

Fixes #110
- Remove unused arlog_i import
- Prefix unused max_nodes_to_pop field with underscore
- Use or_insert_with(Vec::default) instead of or_insert_with(Vec::new)

Fixes #110
Fixes CI clippy warnings on PR #114
- Add #[allow(clippy::vec_box)] to children field (Box necessary for recursive type)
- Change or_insert_with(Vec::default) to or_default()

Fixes #110
…ants

Implements the C++ BinaryFeatureStore and BinaryFeatureMatcher classes
in pure Rust. The matcher uses the BHC index from M7-2 and provides
three match variants matching the C++ overloads:

- match_all: brute force comparison (used as fallback in KPM)
- match_indexed: BHC-backed candidate lookup (primary fast path)
- match_guided: homography-guided spatial filter (second-pass refinement)

All three apply the C++ ratio test (1st best / 2nd best < 0.7 threshold)
and filter by FeaturePoint::maxima for bit-parity with C++. Inner loops
are marked #[inline(never)] for profiler clarity per the issue spec.

Type migration in hough.rs (M7-1):
- Added maxima: bool field to FeaturePoint (matches C++ vision::FeaturePoint)
- Reshaped Match to { ins, ref_ } (matches C++ vision::match_t)
- Renamed M7-1's hough-specific Match to HoughMatch (carries distance for
  vote ranking)

Tests: 19 unit tests covering FeatureStore, FeatureMatcher constructor/
config, all three match variants, ratio test rejection, maxima filtering,
spatial constraint, and helper functions.

Dual-mode FFI tests deferred to a follow-up PR (the C++ shim functions
need to be added to kpm_c_api.cpp). MutualCorrespondenceBinaryFeatureMatcher
also intentionally skipped — it's dead code in the KPM pipeline (verified
by grep against visual_database-inline.h).

Fixes #111
Adds C++ shim functions and Rust dual-mode tests that validate the
pure-Rust matcher against the C++ baseline at runtime, following the
project's existing dual-mode pattern (homography.rs, math.rs).

C++ shims (kpm_c_api.cpp/.h):
- webarkit_cpp_match_features_brute
- webarkit_cpp_match_features_indexed
- webarkit_cpp_match_features_guided

Each shim builds a vision::BinaryFeatureStore from flat C arrays and
runs the corresponding BinaryFeatureMatcher<96>::match() overload, then
copies the resulting matches back through caller-provided output arrays.

Rust dual-mode tests (matcher.rs, gated on `dual-mode` feature):
- dual_mode_match_all_within_tolerance: brute force, expects |rust - cpp|
  <= 2 (small RNG/FP order variance)
- dual_mode_match_indexed_within_tolerance: BHC-indexed, accepts wider
  divergence (BHC RNG differs between C++ and Rust per PR #114)
- dual_mode_match_guided_within_tolerance: spatial-constrained with
  identity homography + large radius, expects |rust - cpp| <= 2

All tests use 50 query × 100 reference random descriptors with shared
seeds between query[i] and ref[i] for the first num_ref entries to
ensure non-trivial match counts.

Tests run with: cargo test --features dual-mode -- kpm::freak::matcher

Refs #111
…ty and global-best invariants

Following review feedback: count-only comparison gives a false sense of
correctness. Two implementations producing the same number of matches
with completely different (ins, ref) pairs would still pass the
previous tests.

Adds two stronger correctness checks:

A) Sorted-pair equality (Rust vs C++):
   - For brute-force and guided variants (both deterministic), assert
     that sorted match-pair lists are byte-for-byte identical between
     Rust and C++ when match counts agree.
   - Indexed variant retains count-only check because BHC RNG diverges
     between Rust and C++ (documented in PR #114, follow-up porting
     ArrayShuffle is the known fix).

B) Independent invariants per match (no C++ comparison needed):
   - assert_match_invariants: bounds + maxima agreement on every match
     (catches indexing bugs and missing maxima filter).
   - assert_brute_force_optimality: for each brute-force match, verify
     the matched reference has the GLOBAL minimum Hamming distance among
     same-maxima refs AND the ratio test (best/second_best < threshold)
     was actually applied. This is an absolute correctness check that
     would catch bugs even if both Rust and C++ shared them.

These invariants would catch real bugs that count-only comparison would
miss: wrong index, missing maxima filter, ratio-test off-by-one,
incorrect distance computation, etc.

All 22 matcher tests pass (19 unit + 3 dual-mode with strengthened
invariants).

Refs #111
The matcher FFI shims include feature_matcher-inline.h, which transitively
includes hamming.h. hamming.h uses std::numeric_limits but doesn't include
<limits> itself. MSVC pulls it in transitively, but GCC on Linux does not,
causing the CI build to fail with:

    error: 'numeric_limits' is not a member of 'std'

Adding <limits> to kpm_c_api.cpp ensures the symbol is available when the
matcher headers are processed.

Refs #111
Previous fix added <limits> but placed it AFTER the matchers/ headers.
Due to include guards, by the time <limits> was processed, hamming.h
had already been parsed and failed to find std::numeric_limits.

Moving all standard-library includes to the top of the file ensures
<limits> is available when hamming.h is processed via the transitive
inclusion chain (matchers/feature_matcher-inline.h -> math/hamming.h).

Refs #111
Replaces the LCG-based shuffle in BinaryHierarchicalClustering with an
exact port of vision::FastRandom and vision::ArrayShuffle from
math/rand.h. With this change, the Rust BHC index produces a tree
topology byte-identical to the C++ baseline given the same seed.

This unlocks pair-equality dual-mode validation for match_indexed,
which previously could only assert non-negative match counts because
of RNG divergence (documented as a follow-up in PR #114 / PR #115).

Changes:

clustering.rs:
- Add fast_random(seed) — 15-bit LCG matching vision::FastRandom
  exactly. Uses i32 with wrapping arithmetic to match C++ int.
- Add array_shuffle(v, sample_size, seed) — matches vision::ArrayShuffle.
  Note: NOT Fisher-Yates; k is drawn from [0, pop_size) per C++.
- Replace SimpleRng (LCG) usage in KMedoids::assign with array_shuffle.
- Initialize rand_indices ONCE before the hypothesis loop (matches C++
  kmedoids.h:163 SequentialVector + per-hypothesis ArrayShuffle).
- Change KMedoids::rand_seed type from u64 to i32 to match C++ int and
  enable byte-identical PRNG sequence.
- Add KMedoids::rand_seed() getter for state-evolution tests.
- Fix BHC query: query_recursive now visits ALL children with the
  minimum Hamming distance, matching C++ Node::nearest behavior in
  binary_hierarchical_clustering.h ("Any nodes that are the SAME
  distance as the minimum node are added to the output vector").

kpm_c_api.cpp/.h:
- Add webarkit_cpp_fast_random — wraps vision::FastRandom
- Add webarkit_cpp_array_shuffle — wraps vision::ArrayShuffle
- Include math/rand.h

matcher.rs:
- Strengthen dual_mode_match_indexed_within_tolerance: now asserts
  count agreement (±2) AND sorted pair equality with C++, matching
  the brute-force and guided variants.

Tests added (clustering.rs):
- 8 unit tests for fast_random and array_shuffle (frozen expected
  values + property-based checks)
- 3 dual-mode tests (gated on `dual-mode` feature) verifying
  byte-identical PRNG output and persistent seed evolution vs C++

All tests pass:
- cargo test -p webarkitlib-rs --lib                  -> 274 passed
- cargo test --features dual-mode -p webarkitlib-rs   -> 292 passed
- cargo clippy --workspace -- -D warnings             -> clean
- cargo fmt --all -- --check                          -> clean

Note: dual-mode tests are local-only (CI does not run --features
dual-mode currently). Adding a dual-mode CI job is a follow-up.

Fixes #116
Adds a test step to the existing kpm-build job that runs:
  cargo test -p webarkitlib-rs --features dual-mode

This step runs after the FFI backend build (the dependency C++ shims
are already compiled by that point) and exercises the pure-Rust ports
against the C++ baseline via the webarkit_cpp_* FFI bridges:

  - homography exp / robust homography (M6-3, #65)
  - math utils: fast_atan2, fast_sqrt1, fast_exp6 (M6-1, #63)
  - linear solvers (M6-2, #64)
  - feature matcher: brute, indexed, guided (M7-3, #111)
  - PRNG: fast_random, array_shuffle (M7, #116)

Without this CI step, dual-mode regressions could slip in unnoticed
because the pure-Rust test job (build-and-test) does not enable the
dual-mode feature.

The kpm-build job already runs on all three platforms (ubuntu, macOS,
windows) so we get cross-platform parity coverage at no extra setup.

Refs #116
…test failures

The previous step ran all tests in the package including integration
tests under tests/, which have pre-existing cross-platform failures
(test_full_pipeline_pose on Windows, linker errors on macOS) that
are unrelated to dual-mode validation.

The dual-mode tests live inside the library (crates/core/src/...)
under #[cfg(feature = "dual-mode")] modules, so --lib is the
correct scope for this CI step.

Refs #116
…stdc++ on Linux)

The build script unconditionally linked stdc++ on all non-MSVC targets,
which fails on macOS where the system C++ stdlib is libc++. The error
was hidden until now because the existing kpm-build CI step only ran
'cargo build' (compiles rlib without linking a binary). The new
dual-mode test step links a test binary and exposed the bug:

    ld: library 'stdc++' not found

Fix: select the C++ stdlib based on target triple:
- *-apple-* targets link libc++
- Other non-MSVC targets link libstdc++ (Linux convention)
- MSVC handles its own C++ runtime

Refs #116
…tests

The fixed 1e-5 absolute tolerance in solve_linear_system_2x2_matches_cpp
and solve_symmetric_linear_system_3x3_matches_cpp is too tight for f32
values with magnitude > ~10. Apple Silicon ARM64 uses FMA differently
from x86_64, producing tiny precision differences that exceed 1e-5 for
larger results.

Example failure on macOS ARM64:
    rust=-40.826614, cpp=-40.8266, diff=1.5e-5

The diff of 1.5e-5 represents a relative error of ~3.7e-7, which is
well within f32 precision but exceeds the absolute tolerance.

Fix: use combined tolerance:
    tol = max(1e-5, |value| * 1e-6)

This gives:
- |x| <= 10: tolerance unchanged at 1e-5 (preserves existing strictness)
- |x|  = 100: tolerance 1e-4 (~1e-6 relative)
- |x|  = 1000: tolerance 1e-3 (~1e-6 relative)

These bounds are consistent with f32's ~7-digit precision and
accommodate platform-specific FMA rounding without losing test
sensitivity for typical values.

Surfaced now because PR #117's new dual-mode CI step exposed this
pre-existing issue on macOS runners. The math itself is correct;
only the test tolerance was too strict.

Refs #116
The previous 1e-6 relative scaling was still too tight for near-singular
2x2 systems on Apple Silicon ARM64. A failing case showed:

    rust=-358.62683, cpp=-358.62802, diff=1.19e-3, tol=3.59e-4

The relative error of ~3.3e-6 exceeded the 1e-6 scale factor.

Random 2x2 systems occasionally hit ill-conditioned configurations where
solutions have large magnitude and are very sensitive to FMA rounding
order. f32 has ~7 decimal digits of precision, so even well-conditioned
results can diverge by ~1e-5 relative across platforms; ill-conditioned
ones diverge more.

Bump relative scale from 1e-5 (per |x|) to ensure cross-platform
stability without losing test sensitivity for typical values:

    tol = max(1e-5, |x| * 1e-5)

This is consistent with f32 precision floor and accommodates the worst
observed cross-platform divergence with a safety margin.

Refs #116
@kalwalt kalwalt moved this from Backlog to In progress in Plan to port KPM to rust May 12, 2026
kalwalt added a commit that referenced this pull request May 12, 2026
The arlog tests install a global CaptureLogger via log::set_logger and
serialize themselves with test_lock(), but the underlying log dispatcher
is shared across the entire process. Tests running in parallel on other
threads emit log records that get captured into the same buffer,
breaking exact-count assertions like:

    assert_eq!(records.len(), 1);

This was previously latent — the M7 milestone branch's CI was green —
but the M7 -> dev rollup PR (#123) merges dev's new tests with M7's new
arlog_i!/arlog_d! emitters, increasing the chance of parallel emission
while an arlog test holds the lock and drains its buffer. Concretely,
arlog_i_emits_info_record began failing with left: 2, right: 1.

Fix: filter at the logger level by the current capturing thread.
init_capture() now records the current ThreadId; CaptureLogger::log()
drops any record not emitted on that thread. Since test_lock() ensures
only one arlog test runs at a time, the active ThreadId is unambiguous
for the duration of the lock.

The five affected count-based asserts (arlog_i_emits_info_record,
arlog_rel_uses_distinct_target, arlog_perror_no_prefix,
arlog_perror_with_prefix, unfiltered_debug_arg_is_evaluated) are
unchanged — the fix is at the capture layer, so all of them benefit.

Unblocks #123 (M7 -> dev rollup).
@kalwalt kalwalt merged commit fd0580b into dev May 13, 2026
23 of 24 checks passed
@github-project-automation github-project-automation Bot moved this from In progress to Done in Plan to port KPM to rust May 13, 2026
@kalwalt kalwalt deleted the feat/milestone7-hough-voting-feature-match branch May 18, 2026 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant