Fix column mismatch and metric in SimpleQA

## Describe the bug
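The `simpleqa` evaluation fails for two main reasons: a column name mismatch and a metric problem. The failure reproduced below is the first:
- Column name mismatch in the prompt function: the code reads the `question` key, but the dataset column is named `problem`.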

## To Reproduce
```python
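from lighteval.pipeline import Pipeline

# pipeline_params, evaluation_tracker, and model_config are assumed to be
# configured beforehand; their setup is not part of this report.
task = "simpleqa|5"

# The KeyError is raised here, while the pipeline initializes its tasks.
pipeline = Pipeline(
    tasks=task,
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)

pipeline.evaluate()
pipeline.save_and_push_results()
pipeline.show_results()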
```
Running this raises the following error:

```python
    140 # We init tasks first to fail fast if one is badly defined
    141 self._init_random_seeds()
--> 142 self._init_tasks_and_requests(tasks=tasks)
    144 self.model_config = model_config
    145 self.accelerator, self.parallel_context = self._init_parallelism_manager()
...
     37         [f"\n{key}. {choice}" for key, choice in zip(["A", "B", "C", "D", "E", "F"], line["choices"]["text"])]
     38     )
     39     query += "\nAnswer:"

KeyError: 'question'
```
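
The mismatch can be illustrated without the pipeline. The snippet below is a minimal sketch, not the Lighteval code: `row` and `broken_prompt` are hypothetical, and the row assumes the dataset stores the question text under `problem` with the reference under `answer`.

```python
# Hypothetical row shaped like the dataset: the question text lives under "problem".
row = {"problem": "Who wrote 'Dune'?", "answer": "Frank Herbert"}


def broken_prompt(line: dict) -> str:
    # Mirrors the mismatch: reads a key that the row does not contain.
    return f"Question: {line['question']}\nAnswer:"


try:
    broken_prompt(row)
    error = None
except KeyError as exc:
    # The missing key is reported, as in the traceback above.
    error = exc

print(repr(error))
```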

## Expected behavior
- The prompt function should read the `problem` key, so that the tasks initialize without a `KeyError`.
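
A minimal sketch of the expected lookup, under stated assumptions: `simpleqa_prompt_sketch` is a hypothetical stand-in for the real prompt function, the query template is illustrative, and the example row is invented. It only shows reading `problem` and failing with a clearer message when the column is absent.

```python
def simpleqa_prompt_sketch(line: dict) -> str:
    """Build a query string from a dataset row, reading the `problem` column."""
    if "problem" not in line:
        # Fail fast with the columns that are actually present.
        raise KeyError(f"expected a 'problem' column, found: {sorted(line)}")
    return f"Question: {line['problem']}\nAnswer:"


# Hypothetical row; the real dataset may carry additional columns.
row = {"problem": "Who wrote 'Dune'?", "answer": "Frank Herbert"}
query = simpleqa_prompt_sketch(row)
print(query)
```

Checking for the column explicitly keeps the failure early and names the columns that were found, which makes a future schema change easier to diagnose than a bare `KeyError`.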

## Version info
- OS: macOS
- Lighteval version: main (local development)

Fix column mismatch and metric in SimpleQA #1138
