
API: Improve StrictMetricsEvaluator handling of missing columns using maxFieldId #15252

Open
varun-lakhyani wants to merge 2 commits into apache:main from varun-lakhyani:maxFieldId

@varun-lakhyani varun-lakhyani commented Feb 6, 2026

Description

Implements missing-column detection in StrictMetricsEvaluator using the file's max field ID. This is mostly useful after schema evolution, when a file was written before a column was added.

Resolves TODO comment in StrictMetricsEvaluator.java:72

Changes

  • Add maxFieldId to the DataFile interface: it tracks the max field ID of the schema used to write the file
  • Use maxFieldId in StrictMetricsEvaluator to detect missing columns:
    • If a referenced field's ID is greater than the file's max field ID, the column was added by schema evolution after the file was written, so its values in that file are all null:
      • isNull and isNotNaN return ROWS_MUST_MATCH
      • All other operations return ROWS_MIGHT_NOT_MATCH
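
The rule above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: `MissingColumnSketch`, `Op`, `Result`, `isMissing`, and `evalMissing` are hypothetical names, and the real evaluator works on bound expressions and file metrics rather than plain integers. The sketch assumes a null maxFieldId means "unknown", in which case the existing metrics-based evaluation should run.

```java
import java.util.Optional;

// Hypothetical sketch of the rule described above; the names here are
// illustrative and are not taken from the Iceberg code base.
final class MissingColumnSketch {
  enum Result { ROWS_MUST_MATCH, ROWS_MIGHT_NOT_MATCH }

  enum Op { IS_NULL, NOT_NULL, IS_NAN, NOT_NAN, LT, LT_EQ, GT, GT_EQ, EQ, NOT_EQ }

  // A column is known to be missing only when the file records a max field ID
  // and the referenced field ID is larger than it.
  static boolean isMissing(int fieldId, Integer maxFieldId) {
    return maxFieldId != null && fieldId > maxFieldId;
  }

  // Returns a decision for a missing column, or empty when the column is not
  // known to be missing and the usual metrics-based evaluation should run.
  static Optional<Result> evalMissing(Op op, int fieldId, Integer maxFieldId) {
    if (!isMissing(fieldId, maxFieldId)) {
      return Optional.empty();
    }
    switch (op) {
      case IS_NULL:
      case NOT_NAN:
        // Every value is null, so "is null" and "is not NaN" hold for all rows.
        return Optional.of(Result.ROWS_MUST_MATCH);
      default:
        // No other predicate can be proven to hold for every row.
        return Optional.of(Result.ROWS_MIGHT_NOT_MATCH);
    }
  }
}
```

Returning an empty Optional when maxFieldId is null keeps the check opt-in: files without the new field take the same path as before.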

How Has This Been Tested?

  • Added unit tests covering all operations:
    • isNull and isNotNaN return ROWS_MUST_MATCH for missing columns
    • All other operations return ROWS_MIGHT_NOT_MATCH for missing columns

This is a proof of concept: the new maxFieldId is currently set only in tests; otherwise it is null and evaluation follows the existing behaviour.
If the approach looks fine, the remaining work is to populate this field when loading data and when creating the metadata file.

@github-actions github-actions bot added the API label Feb 6, 2026
@github-actions github-actions bot added the core label Feb 6, 2026
varun-lakhyani commented Feb 6, 2026

@pvary could you please take a look?
