Fix column mismatch and metric in SimpleQA #1138

@pjavanrood

Describe the bug

The simpleqa evaluation fails for two main reasons:

  • Column name mismatch in the prompt function: the code reads the 'question' key, but the dataset column is named 'problem'.

To Reproduce

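from lighteval.pipeline import Pipeline

# pipeline_params, evaluation_tracker, and model_config are assumed to be
# configured beforehand; their setup is not part of this report.
task = "simpleqa|5"

pipeline = Pipeline(
    tasks=task,
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)

pipeline.evaluate()
pipeline.save_and_push_results()
pipeline.show_results()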

Running this raises the following error (traceback truncated):

    140 # We init tasks first to fail fast if one is badly defined
    141 self._init_random_seeds()
--> 142 self._init_tasks_and_requests(tasks=tasks)
    144 self.model_config = model_config
    145 self.accelerator, self.parallel_context = self._init_parallelism_manager()
...
     37         [f"\n{key}. {choice}" for key, choice in zip(["A", "B", "C", "D", "E", "F"], line["choices"]["text"])]
     38     )
     39     query += "\nAnswer:"

KeyError: 'question'

Expected behavior

  • The prompt function should read the question text from the dataset's 'problem' key, so that the task initializes without raising a KeyError.
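
A minimal sketch of what the corrected prompt function could look like is given below. It is not the actual lighteval code: the Doc dataclass is a small stand-in defined here so that the example runs on its own, the function name simpleqa_prompt is hypothetical, and the 'answer' column name is an assumption about the dataset schema (only 'problem' is confirmed by this report).

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a prompt-function result (assumption, not lighteval's Doc)."""

    task_name: str
    query: str
    choices: list = field(default_factory=list)
    gold_index: int = 0


def simpleqa_prompt(line: dict, task_name: str = "simpleqa") -> Doc:
    """Hypothetical prompt function that reads the 'problem' column."""
    # The dataset stores the question text under "problem", not "question".
    query = f"Question: {line['problem']}\nAnswer:"
    # Assumption: the reference answer lives in an "answer" column.
    return Doc(
        task_name=task_name,
        query=query,
        choices=[line["answer"]],
        gold_index=0,
    )
```

With a row such as {"problem": "Who wrote Hamlet?", "answer": "William Shakespeare"}, the function builds the query from the 'problem' field and no longer touches the missing 'question' key.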

Version info

  • OS: macOS
  • Lighteval version: main (local development)
