BioThink: Self-Reflective Reasoning for Biomedical Question Answering

Introduction

Large Language Models (LLMs) perform well across a wide range of tasks, including question answering, text generation, and reasoning. Without extensive pre-training on domain-specific data, however, they often struggle with specialized tasks such as biomedical question answering.

Inspired by Self-RAG and building upon Self-BioRAG, we introduce BioThink, a framework that enhances LLMs for biomedical question answering through self-reflection, context grading, relevance assessment, and utility rating. BioThink uses a novel training approach with GRPO (Group Relative Policy Optimization) to fine-tune LLMs to generate structured outputs that include step-by-step reasoning, concise answers, and self-reflection tokens.

Key Features

Self-Reflective Generation: BioThink generates outputs in a structured format that includes:
- Step-by-step reasoning (<think>)
- Concise answer (<answer>)
- Contextual relevance assessment (<contextual-relevance>)
- Answer utility rating (<answer-utility>)
- Groundness evaluation (<groundness>)

Training with GRPO: We use Group Relative Policy Optimization (GRPO) to train the model, combining multiple reward functions that score:
- Correctness of the answer
- Accuracy of self-reflection tokens (utility, relevance, groundness)
- Proper XML structure and order of tags
- Faithfulness and relevancy of the answer

Efficiency: The model is fine-tuned with QLoRA and Unsloth.
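
To make the "group relative" part of GRPO concrete, here is a minimal sketch of how the scores from several reward functions could be turned into per-completion advantages. It assumes the rewards are simply summed and normalized with the population standard deviation; the function names and example scores are hypothetical and are not taken from train.py.

```python
from statistics import mean, pstdev


def combine_rewards(per_function_rewards):
    """Sum the scores of several reward functions for each completion."""
    return [sum(scores) for scores in zip(*per_function_rewards)]


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one question, scored by two hypothetical rewards.
correctness = [1.0, 0.0, 1.0, 0.5]
structure = [0.5, 0.5, 0.0, 0.5]

totals = combine_rewards([correctness, structure])
advantages = group_relative_advantages(totals)
```

Completions that score above the group mean receive a positive advantage and those below it a negative one, so the policy is pushed toward the better answers within each group without a separate value model.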

Quick Start

Install dependencies and run the main BioThink workflow scripts:

make sync
uv run python src/biothink/self_reflection/data_process/process_data.py
uv run python src/biothink/self_reflection/train.py
uv run python src/biothink/self_reflection/inference/inference_biothink_qwen3.py

Training Steps

1. Data Processing

The Self-BioRAG dataset is processed using the script process_data.py. This script extracts questions, answers, and context, and also prepares labels for groundness, relevance, and utility tokens. The processed dataset is available at avnlp/self_biorag_processed.
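
As a rough illustration of this step, the sketch below flattens one raw example into a training record and rejects labels that are not among the tokens BioThink expects. The field names (question, context, answer, relevance, utility, groundness) and the function process_example are assumptions made for illustration; the actual schema used by process_data.py may differ.

```python
# Allowed self-reflection tokens, as listed in the output format of this README.
RELEVANCE_TOKENS = {"[Relevant]", "[Irrelevant]"}
UTILITY_TOKENS = {f"[Utility:{i}]" for i in range(1, 6)}
GROUNDNESS_TOKENS = {
    "[Fully supported]",
    "[Partially supported]",
    "[No support/Contradictory]",
}


def process_example(raw):
    """Return a cleaned record, or None if any label is not a known token."""
    keys = ["question", "context", "answer", "relevance", "utility", "groundness"]
    record = {key: raw[key].strip() for key in keys}  # hypothetical field names
    valid = (
        record["relevance"] in RELEVANCE_TOKENS
        and record["utility"] in UTILITY_TOKENS
        and record["groundness"] in GROUNDNESS_TOKENS
    )
    return record if valid else None


raw = {
    "question": " Which vitamin deficiency causes scurvy? ",
    "context": "Scurvy is caused by a lack of vitamin C.",
    "answer": "Vitamin C",
    "relevance": "[Relevant]",
    "utility": "[Utility:5]",
    "groundness": "[Fully supported]",
}
record = process_example(raw)
```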

2. Model Training

The model is trained using the script train.py. The training process involves:

Structured Generation: The model is trained to generate outputs in the following format:

<think>
... step-by-step reasoning ...
</think>
<answer>
... concise answer ...
</answer>
<contextual-relevance>
[Relevant] or [Irrelevant]
</contextual-relevance>
<answer-utility>
[Utility:5] or [Utility:4] or ... [Utility:1]
</answer-utility>
<groundness>
[Fully supported] or [Partially supported] or [No support/Contradictory]
</groundness>
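
A generated output in this format can be split back into its fields with a few lines of standard-library Python. This is a minimal sketch, not the parser used in the repository; parse_output and the sample text are hypothetical.

```python
import re

TAGS = ["think", "answer", "contextual-relevance", "answer-utility", "groundness"]


def parse_output(text):
    """Map each tag name to its stripped content, or None if the pair is missing."""
    fields = {}
    for tag in TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        fields[tag] = match.group(1).strip() if match else None
    return fields


sample = """<think>
Scurvy results from a lack of vitamin C.
</think>
<answer>
Vitamin C
</answer>
<contextual-relevance>
[Relevant]
</contextual-relevance>
<answer-utility>
[Utility:5]
</answer-utility>
<groundness>
[Fully supported]
</groundness>"""

parsed = parse_output(sample)
```

Returning None for a missing tag, rather than raising, lets downstream scoring treat malformed generations as ordinary low-scoring cases.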

Reward Functions: The training uses GRPO with the following rewards:

Correctness Reward: Measures answer correctness using DeepEval's GEval metric with a custom LLM-as-a-Judge instruction tailored for biomedical question answering.
Utility Reward: Ensures the correct Utility token is generated.
Relevance Reward: Ensures the correct Relevance token is generated.
Groundness Reward: Ensures the correct Groundness token is generated.
XML Structure Reward: Checks for the presence and proper opening/closing of all required tags.
Structure Order Reward: Ensures the tags appear in the correct order and that no extra text is present outside the tags.
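
The rule-based rewards above can be sketched without any external library. The functions below score a single completion string; the 0.0 to 1.0 scale, the partial credit for the structure reward, and the function names are assumptions for illustration and may not match reward_functions.py.

```python
import re

TAGS = ["think", "answer", "contextual-relevance", "answer-utility", "groundness"]


def xml_structure_reward(text):
    """Fraction of required tags with exactly one opening and one closing tag."""
    ok = sum(
        text.count(f"<{tag}>") == 1 and text.count(f"</{tag}>") == 1 for tag in TAGS
    )
    return ok / len(TAGS)


def structure_order_reward(text):
    """1.0 if the five blocks appear in order with only whitespace around them."""
    rest = text.strip()
    for tag in TAGS:
        close_tag = f"</{tag}>"
        if not rest.startswith(f"<{tag}>"):
            return 0.0
        end = rest.find(close_tag)
        if end == -1:
            return 0.0
        rest = rest[end + len(close_tag):].lstrip()
    return 1.0 if rest == "" else 0.0


def token_match_reward(text, tag, expected):
    """1.0 if the content of the given tag equals the expected token."""
    match = re.search(rf"<{tag}>\s*(.*?)\s*</{tag}>", text, flags=re.DOTALL)
    return 1.0 if match and match.group(1) == expected else 0.0


good = (
    "<think>\nreasoning\n</think>\n"
    "<answer>\nVitamin C\n</answer>\n"
    "<contextual-relevance>\n[Relevant]\n</contextual-relevance>\n"
    "<answer-utility>\n[Utility:5]\n</answer-utility>\n"
    "<groundness>\n[Fully supported]\n</groundness>"
)
no_groundness = good.split("<groundness>")[0]
```

Keeping the structure and order checks separate means a completion with all tags present but stray text outside them still earns the structure reward while losing the order reward.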

3. Model

We fine-tune the Qwen3-1.7B model using GRPO and QLoRA. The trained model is available on Hugging Face: avnlp/BioThink-Qwen3-1.7B.

Training defaults are defined in train_config.py. Set the model choice, dataset source, LoRA parameters, and GRPO settings before launching a run.

4. Evaluation

The model is evaluated using the following metrics:

XML Structure: Checks that the opening and closing tags for think, answer, contextual-relevance, answer-utility, and groundness are all present.
Utility: Checks that the correct utility token has been generated.
Relevance: Checks that the correct relevance token has been generated.
Groundness: Checks that the correct groundness token has been generated.
Answer Correctness: Checks that the answer is correct using DeepEval's GEval metric with a custom instruction for LLM-as-a-Judge.
Faithfulness: Checks that the answer is faithful to the provided context using DeepEval's Faithfulness LLM-as-a-Judge metric.
Answer Relevancy: Checks that the answer is relevant to the original question using DeepEval's Answer Relevancy LLM-as-a-Judge metric.
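
For the token-level metrics (Utility, Relevance, Groundness), evaluation reduces to comparing the generated token with the gold label across a test set. The sketch below reports accuracy per tag; the helper names and toy data are hypothetical, and the LLM-as-a-Judge metrics are not covered because they require a judge model.

```python
import re

LABEL_TAGS = ["contextual-relevance", "answer-utility", "groundness"]


def extract(text, tag):
    """Return the content of a tag with surrounding whitespace removed, or None."""
    match = re.search(rf"<{tag}>\s*(.*?)\s*</{tag}>", text, flags=re.DOTALL)
    return match.group(1) if match else None


def token_accuracy(outputs, gold):
    """Per-tag accuracy; gold is a list of dicts keyed by tag name."""
    report = {}
    for tag in LABEL_TAGS:
        hits = sum(extract(out, tag) == ref[tag] for out, ref in zip(outputs, gold))
        report[tag] = hits / len(gold)
    return report


outputs = [
    "<contextual-relevance>[Relevant]</contextual-relevance>"
    "<answer-utility>[Utility:5]</answer-utility>"
    "<groundness>[Fully supported]</groundness>",
    # Second output is missing its groundness block, so it counts as a miss.
    "<contextual-relevance>[Relevant]</contextual-relevance>"
    "<answer-utility>[Utility:3]</answer-utility>",
]
gold = [
    {
        "contextual-relevance": "[Relevant]",
        "answer-utility": "[Utility:5]",
        "groundness": "[Fully supported]",
    },
    {
        "contextual-relevance": "[Irrelevant]",
        "answer-utility": "[Utility:3]",
        "groundness": "[Partially supported]",
    },
]
report = token_accuracy(outputs, gold)
```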

Repository Structure

src/biothink/
├── __init__.py
└── self_reflection/
    ├── __init__.py
    ├── data_process/
    │   ├── __init__.py
    │   ├── process_data.py
    │   └── subset_data.py
    ├── evaluation/
    │   ├── __init__.py
    │   ├── evaluate_biothink_qwen3.py
    │   ├── evaluate_qwen3.py
    │   └── metrics.py
    ├── inference/
    │   ├── __init__.py
    │   ├── inference_biothink_qwen3.py
    │   └── inference_qwen3.py
    ├── prompts.py
    ├── reward_functions.py
    ├── train.py
    └── train_config.py

Development

Run the local quality checks with:

make lint-check
make lint-typing
make lint-typos

Security checks are available with:

make security-bandit
make security-audit

License

This project is licensed under the MIT License - see the LICENSE file for details.
