Fix IndexError in LogProbTokenNorm when choices_tokens is shorter than choices_logprob#1171

Open
worksbyfriday wants to merge 1 commit into huggingface:main from worksbyfriday:fix/logprobtokennorm-length-mismatch

Conversation

@worksbyfriday

Summary

Fixes issue #1170: LogProbTokenNorm crashes with IndexError when the number of token sequences is less than the number of log probabilities. This occurs during benchmarking when token generation fails for some answer choices.

Problem

When normalizing log probabilities for token-based metrics, the code assumes that choices_tokens has at least as many elements as choices_logprob. However, when token generation fails for some choices, choices_tokens can be shorter, causing an IndexError on line 527:

normalized_log_probs = [
    choices_logprob[ix] / len(choices_tokens[ix]) for ix in range(len(choices_logprob))
]

From the issue (#1170):

  • 4 answer choices → 4 log probabilities
  • Token generation only succeeds for 3 choices → 3 token lists
  • IndexError when trying to access choices_tokens[3]
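The failure can be reproduced in isolation with the data from the issue (a standalone sketch of the old list comprehension, not the lighteval code itself):

```python
# Data taken from issue #1170: 4 logprobs but only 3 token lists.
choices_logprob = [-2.90625, -5.65625, -3.03125, -5.4375]
choices_tokens = [
    [236743, 236812, -1],
    [236743, 236778, 236832],
    [236743, 236825, -1],
]

crashed = False
try:
    # The old comprehension indexes choices_tokens with every logprob index.
    normalized_log_probs = [
        choices_logprob[ix] / len(choices_tokens[ix])
        for ix in range(len(choices_logprob))
    ]
except IndexError:
    # Raised when ix == 3: choices_tokens has no 4th element.
    crashed = True

print(crashed)  # → True
```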

Solution

Replaced the list comprehension with an explicit loop that:

  1. Adds bounds checking: if ix < len(choices_tokens) and choices_tokens[ix]
  2. Filters out padding tokens marked as -1 when counting token length
  3. Provides a fallback: if tokens are missing, use the un-normalized log probability
  4. Prevents division by zero with max(token_count, 1)
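The four steps above can be sketched as follows (variable names follow the PR description; this is an illustrative sketch wrapped in a hypothetical helper, not the exact patch):

```python
def normalize_by_token_count(choices_logprob, choices_tokens):
    """Hypothetical helper illustrating the fixed loop from this PR."""
    normalized_log_probs = []
    for ix in range(len(choices_logprob)):
        # Step 1: bounds check, and skip empty token lists.
        if ix < len(choices_tokens) and choices_tokens[ix]:
            # Step 2: count only real tokens; -1 entries are padding.
            token_count = sum(1 for tok in choices_tokens[ix] if tok != -1)
            # Step 4: max(..., 1) guards against an all-padding sequence.
            normalized_log_probs.append(choices_logprob[ix] / max(token_count, 1))
        else:
            # Step 3: missing tokens, fall back to the un-normalized logprob.
            normalized_log_probs.append(choices_logprob[ix])
    return normalized_log_probs
```

With the data from the issue, this returns four values, and the fourth (whose tokens are missing) is simply the raw logprob -5.4375.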

Testing

Tested with the exact scenario from the issue:

choices_logprob = [-2.90625, -5.65625, -3.03125, -5.4375]  # 4 elements
choices_tokens = [[236743, 236812, -1], [236743, 236778, 236832], [236743, 236825, -1]]  # 3 elements

# Old code: IndexError at index 3
# New code: Returns 4 normalized values, using un-normalized logprob for missing tokens

Impact

  • Allows benchmarking to complete even when some choices fail token generation
  • Graceful degradation: missing tokens → use un-normalized logprob
  • No changes to the normalization logic for valid cases
  • Defensive programming that handles edge cases without breaking existing behavior

Fix IndexError in LogProbTokenNorm when choices_tokens is shorter than choices_logprob

Fixes issue huggingface#1170: LogProbTokenNorm crashes with IndexError when the number of
token sequences in choices_tokens is less than the number of log probabilities
in choices_logprob. This can occur during benchmarking when token generation
fails for some answer choices.

Changes:
- Replaced list comprehension with explicit loop to add bounds checking
- Check both length and presence of tokens: if ix < len(choices_tokens) and choices_tokens[ix]
- Filter out padding tokens marked as -1 when counting token length
- Add fallback: if tokens are missing, use log probability without normalization
- Avoid division by zero with max(token_count, 1)

This handles the edge case defensively without changing the core normalization
logic for valid cases. The fix allows the normalization to complete even when
some choices lack token data, using the un-normalized log probability as a
reasonable fallback.

Tested with the scenario from the issue:
- Input: 4 choices (4 logprobs, 3 token lists)
- Output: 4 normalized values (no IndexError)
- For the 4th choice with missing tokens, uses un-normalized logprob
