Fix IndexError in LogProbTokenNorm when choices_tokens is shorter than choices_logprob#1171

Open
worksbyfriday wants to merge 1 commit into huggingface:main from worksbyfriday:fix/logprobtokennorm-length-mismatch

Conversation

@worksbyfriday

Summary

Fixes issue #1170: LogProbTokenNorm crashes with IndexError when the number of token sequences is less than the number of log probabilities. This occurs during benchmarking when token generation fails for some answer choices.

Problem

When normalizing log probabilities for token-based metrics, the code assumes that choices_tokens has at least as many elements as choices_logprob. However, when token generation fails for some choices, choices_tokens can be shorter, causing an IndexError on line 527:

normalized_log_probs = [
    choices_logprob[ix] / len(choices_tokens[ix]) for ix in range(len(choices_logprob))
]

From the issue (#1170):

  • 4 answer choices → 4 log probabilities
  • Token generation only succeeds for 3 choices → 3 token lists
  • IndexError when trying to access choices_tokens[3]
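The failure can be reproduced in isolation with the data from the issue (a standalone sketch of the old list comprehension, not the lighteval code itself):

```python
# Data taken from issue #1170: 4 logprobs but only 3 token lists.
choices_logprob = [-2.90625, -5.65625, -3.03125, -5.4375]
choices_tokens = [
    [236743, 236812, -1],
    [236743, 236778, 236832],
    [236743, 236825, -1],
]

crashed = False
try:
    # The old comprehension indexes choices_tokens with every logprob index.
    normalized_log_probs = [
        choices_logprob[ix] / len(choices_tokens[ix])
        for ix in range(len(choices_logprob))
    ]
except IndexError:
    # Raised when ix == 3: choices_tokens has no 4th element.
    crashed = True

print(crashed)  # → True
```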

Solution

Replaced the list comprehension with an explicit loop that:

  1. Adds bounds checking: if ix < len(choices_tokens) and choices_tokens[ix]
  2. Filters out padding tokens marked as -1 when counting token length
  3. Provides a fallback: if tokens are missing, use the un-normalized log probability
  4. Prevents division by zero with max(token_count, 1)
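The four steps above can be sketched as follows (variable names follow the PR description; this is an illustrative sketch wrapped in a hypothetical helper, not the exact patch):

```python
def normalize_by_token_count(choices_logprob, choices_tokens):
    """Hypothetical helper illustrating the fixed loop from this PR."""
    normalized_log_probs = []
    for ix in range(len(choices_logprob)):
        # Step 1: bounds check, and skip empty token lists.
        if ix < len(choices_tokens) and choices_tokens[ix]:
            # Step 2: count only real tokens; -1 entries are padding.
            token_count = sum(1 for tok in choices_tokens[ix] if tok != -1)
            # Step 4: max(..., 1) guards against an all-padding sequence.
            normalized_log_probs.append(choices_logprob[ix] / max(token_count, 1))
        else:
            # Step 3: missing tokens, fall back to the un-normalized logprob.
            normalized_log_probs.append(choices_logprob[ix])
    return normalized_log_probs
```

With the data from the issue, this returns four values, and the fourth (whose tokens are missing) is simply the raw logprob -5.4375.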

Testing

Tested with the exact scenario from the issue:

choices_logprob = [-2.90625, -5.65625, -3.03125, -5.4375]  # 4 elements
choices_tokens = [[236743, 236812, -1], [236743, 236778, 236832], [236743, 236825, -1]]  # 3 elements

# Old code: IndexError at index 3
# New code: Returns 4 normalized values, using un-normalized logprob for missing tokens

Impact

  • Allows benchmarking to complete even when some choices fail token generation
  • Graceful degradation: missing tokens → use un-normalized logprob
  • No changes to the normalization logic for valid cases
  • Defensive programming that handles edge cases without breaking existing behavior

Fix IndexError in LogProbTokenNorm when choices_tokens is shorter than choices_logprob

Fixes issue huggingface#1170: LogProbTokenNorm crashes with IndexError when the number of
token sequences in choices_tokens is less than the number of log probabilities
in choices_logprob. This can occur during benchmarking when token generation
fails for some answer choices.

Changes:
- Replaced list comprehension with explicit loop to add bounds checking
- Check both length and presence of tokens: if ix < len(choices_tokens) and choices_tokens[ix]
- Filter out padding tokens marked as -1 when counting token length
- Add fallback: if tokens are missing, use log probability without normalization
- Avoid division by zero with max(token_count, 1)

This handles the edge case defensively without changing the core normalization
logic for valid cases. The fix allows the normalization to complete even when
some choices lack token data, using the un-normalized log probability as a
reasonable fallback.

Tested with the scenario from the issue:
- Input: 4 choices (4 logprobs, 3 token lists)
- Output: 4 normalized values (no IndexError)
- For the 4th choice with missing tokens, uses un-normalized logprob
