Commit 5d21c39

Merge pull request #27 from stanford-oval/refactored-codebase-sync
Refactored codebase sync
2 parents 72d85c9 + 85b6479 commit 5d21c39

28 files changed: +3571 −9505 lines

README.md

Lines changed: 120 additions & 26 deletions
@@ -1,18 +1,28 @@
# STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking

-This repository contains the code for our NAACL 2024 paper [Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models](https://arxiv.org/abs/2402.14207) by [Yijia Shao](https://cs.stanford.edu/~shaoyj), [Yucheng Jiang](https://yucheng-jiang.github.io/), Theodore A. Kanell, Peter Xu, [Omar Khattab](https://omarkhattab.com/), and [Monica S. Lam](https://suif.stanford.edu/~lam/).
+<p align="center">
+| <a href="http://storm.genie.stanford.edu"><b>Research preview</b></a> | <a href="https://arxiv.org/abs/2402.14207"><b>Paper</b></a> | <b>Documentation (WIP)</b> |
+
+
+**Latest News** 🔥
+
+- [2024/04] We release a refactored version of the STORM codebase! We define an [interface](src/interface.py) for the STORM pipeline and reimplement STORM-wiki (check out [`src/storm_wiki`](src/storm_wiki)) to demonstrate how to instantiate the pipeline. We provide an API to support customization of different language models and retrieval/search integrations.

## Overview [(Try STORM now!)](https://storm.genie.stanford.edu/)
+
<p align="center">
<img src="assets/overview.png" style="width: 90%; height: auto;">
</p>
STORM is an LLM system that writes Wikipedia-like articles from scratch based on Internet search.

While the system cannot produce publication-ready articles, which often require a significant number of edits, experienced Wikipedia editors have found it helpful in their pre-writing stage.

-**Try out our [live demo](https://storm.genie.stanford.edu/) to see how STORM can help your knowledge exploration journey and please provide feedback to help us improve the system 🙏!**
+**Try out our [live research preview](https://storm.genie.stanford.edu/) to see how STORM can help your knowledge exploration journey, and please provide feedback to help us improve the system 🙏!**
+
+
+
+## How STORM works

-## Research Before Writing
STORM breaks down generating long articles with citations into two steps:
1. **Pre-writing stage**: The system conducts Internet-based research to collect references and generates an outline.
2. **Writing stage**: The system uses the outline and references to generate the full-length article with citations.
@@ -24,41 +34,128 @@ STORM identifies the core of automating the research process as automatically co
1. **Perspective-Guided Question Asking**: Given the input topic, STORM discovers different perspectives by surveying existing articles from similar topics and uses them to control the question-asking process.
2. **Simulated Conversation**: STORM simulates a conversation between a Wikipedia writer and a topic expert grounded in Internet sources to enable the language model to update its understanding of the topic and ask follow-up questions.

-Based on the separation of the two stages, STORM is implemented in a highly modular way (see [engine.py](src/engine.py)) using [dspy](https://github.com/stanfordnlp/dspy).
+Based on the separation of the two stages, STORM is implemented in a highly modular way using [dspy](https://github.com/stanfordnlp/dspy).
+


-## Setup
+## Getting started

-**We view STORM as an example of automated knowledge curation. We are working on enhancing our codebase to increase its extensibility. Stay tuned!**
+### 1. Setup

-Below, we provide a quick start guide to run STORM locally to reproduce our experiments.
+Below, we provide a quick start guide to run STORM locally.

1. Install the required packages.
```shell
conda create -n storm python=3.11
conda activate storm
pip install -r requirements.txt
```
-2. Set up OpenAI API key and [You.com search API](https://api.you.com/) key. Create a file `secrets.toml` under the root directory and add the following content:
+2. Set up an OpenAI API key (if you want to use OpenAI models to power STORM) and a [You.com search API](https://api.you.com/) key. Create a file `secrets.toml` under the root directory and add the following content:
```shell
# Set up OpenAI API key.
-OPENAI_API_KEY=<your_openai_api_key>
+OPENAI_API_KEY="your_openai_api_key"
# If you are using the API service provided by OpenAI, include the following line:
OPENAI_API_TYPE="openai"
# If you are using the API service provided by Microsoft Azure, include the following lines:
OPENAI_API_TYPE="azure"
-AZURE_API_BASE=<your_azure_api_base_url>
-AZURE_API_VERSION=<your_azure_api_version>
+AZURE_API_BASE="your_azure_api_base_url"
+AZURE_API_VERSION="your_azure_api_version"
# Set up You.com search API key.
-YDC_API_KEY=<your_youcom_api_key>
+YDC_API_KEY="your_youcom_api_key"
```

-## Paper Experiments
-The FreshWiki dataset used in our experiments can be found in [./FreshWiki](FreshWiki).

+### 2. Running STORM-wiki locally
+
+Currently, we provide example scripts under [`examples`](examples) to demonstrate how you can run STORM using different models.
+
+**To run STORM with `gpt` family models**: Make sure you have set up the OpenAI API key and run the following command.
+
+```
+python examples/run_storm_wiki_gpt.py \
+    --output-dir $OUTPUT_DIR \
+    --do-research \
+    --do-generate-outline \
+    --do-generate-article \
+    --do-polish-article
+```
+- `--do-research`: If True, simulate conversation to research the topic; otherwise, load the results.
+- `--do-generate-outline`: If True, generate an outline for the topic; otherwise, load the results.
+- `--do-generate-article`: If True, generate an article for the topic; otherwise, load the results.
+- `--do-polish-article`: If True, polish the article by adding a summarization section and (optionally) removing duplicate content.
+
+**To run STORM with `mistral` family models on a local VLLM server**: Have a VLLM server running with the `Mistral-7B-Instruct-v0.2` model and run the following command.
+
+```
+python examples/run_storm_wiki_mistral.py \
+    --url $URL \
+    --port $PORT \
+    --output-dir $OUTPUT_DIR \
+    --do-research \
+    --do-generate-outline \
+    --do-generate-article \
+    --do-polish-article
+```
+- `--url`: URL of the VLLM server.
+- `--port`: Port of the VLLM server.
+
+
+
+## Customize STORM
+
+### Customization of the Pipeline
+
+STORM is a knowledge curation engine consisting of 4 modules:
+
+1. Knowledge Curation Module: Collects a broad coverage of information about the given topic.
+2. Outline Generation Module: Organizes the collected information by generating a hierarchical outline for the curated knowledge.
+3. Article Generation Module: Populates the generated outline with the collected information.
+4. Article Polishing Module: Refines and enhances the written article for better presentation.
+
+The interface for each module is defined in `src/interface.py`, while their implementations are instantiated in `src/storm_wiki/modules/*`. These modules can be customized according to your specific requirements (e.g., generating sections in bullet point format instead of full paragraphs).
+
+:star2: **You can share your customization of `Engine` by making PRs to this repo!**
+
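To make the four-module decomposition concrete, here is a self-contained toy sketch of a pipeline built from swappable stages. The class and field names are illustrative only; the real interfaces live in `src/interface.py` and differ in detail.

```python
from abc import ABC, abstractmethod


class Module(ABC):
    """Hypothetical stage interface: each module transforms a shared state dict."""

    @abstractmethod
    def run(self, state: dict) -> dict: ...


class KnowledgeCuration(Module):
    def run(self, state):
        # Stand-in for Internet-based research collecting references.
        state['references'] = [f"note about {state['topic']}"]
        return state


class OutlineGeneration(Module):
    def run(self, state):
        # Organize collected information into a hierarchical outline.
        state['outline'] = ['# ' + state['topic'], '## Background']
        return state


class ArticleGeneration(Module):
    def run(self, state):
        # Populate the outline with the collected information.
        state['article'] = '\n'.join(state['outline']) + '\n' + ' '.join(state['references'])
        return state


class ArticlePolishing(Module):
    def run(self, state):
        # Refine the written article for presentation.
        state['article'] = state['article'].strip()
        return state


def run_pipeline(topic: str) -> dict:
    state = {'topic': topic}
    for module in (KnowledgeCuration(), OutlineGeneration(),
                   ArticleGeneration(), ArticlePolishing()):
        state = module.run(state)
    return state
```

Because each stage only reads and writes the shared state, swapping in a customized module (e.g., one that emits bullet points instead of paragraphs) leaves the rest of the pipeline untouched.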
+### Customization of Retriever Module
+
+As a knowledge curation engine, STORM grabs information from the Retriever module. The interface for the Retriever module is defined in [`src/interface.py`](src/interface.py). Please consult the interface documentation if you plan to create a new instance or replace the default search engine API. By default, STORM utilizes the You.com search engine API (see `YouRM` in [`src/rm.py`](src/rm.py)).
+
+:star2: **PRs for integrating more search engines/retrievers are highly appreciated!**
+
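As a rough sketch of what a drop-in retriever could look like, the toy class below ranks an in-memory corpus by word overlap with the query. The class name and `retrieve` signature are assumptions for illustration; follow the actual Retriever interface in `src/interface.py` (and `YouRM` in `src/rm.py`) when implementing a real one.

```python
class ToyCorpusRM:
    """Hypothetical retriever: ranks an in-memory corpus by word overlap."""

    def __init__(self, corpus: list[str], k: int = 3):
        self.corpus = corpus
        self.k = k

    def retrieve(self, query: str) -> list[str]:
        # Score each document by the number of query words it shares.
        q = set(query.lower().split())
        scored = sorted(self.corpus,
                        key=lambda doc: len(q & set(doc.lower().split())),
                        reverse=True)
        return scored[:self.k]


# Usage: the best-overlapping documents come back first.
rm = ToyCorpusRM(corpus=[
    'STORM writes Wikipedia-like articles',
    'retrieval augments language models',
    'bananas are yellow',
], k=2)
```

A production retriever would replace the overlap score with a search engine API call or a vector index, but the shape — query in, ranked passages out — stays the same.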
+### Customization of Language Models
+
+STORM provides the following language model implementations in [`src/lm.py`](src/lm.py):
+
+- `OpenAIModel`
+- `ClaudeModel`
+- `VLLMClient`
+- `TGIClient`
+- `TogetherClient`
+
+:star2: **PRs for integrating more language model clients are highly appreciated!**
+
+:bulb: **As good practice:**
+
+- Choose a cheaper/faster model for `conv_simulator_lm`, which is used to split queries and synthesize answers in the conversation.
+- If you need to conduct the actual writing step, choose a more powerful model for `article_gen_lm`. Based on our experiments, weak models are bad at generating text with citations.
+- For open models, adding a one-shot example can help them better follow instructions.
+
+Please refer to the scripts in the [`examples`](examples) directory for concrete guidance on customizing the language model used in the pipeline.
+
+## Replicate NAACL 2024 results
+
+Please switch to the branch `NAACL-2024-code-backup`.
+
+<details>
+<summary>Show me instructions</summary>
+
+### Paper Experiments
+
+The FreshWiki dataset used in our experiments can be found in [./FreshWiki](FreshWiki).
+
Run the following commands under [./src](src).

-### Pre-writing Stage
+#### Pre-writing Stage
For batch experiment on the FreshWiki dataset:
```shell
python -m scripts.run_prewriting --input-source file --input-path ../FreshWiki/topic_list.csv --engine gpt-4 --do-research --max-conv-turn 5 --max-perspective 5
@@ -79,7 +176,7 @@ python -m scripts.run_prewriting --input-source console --engine gpt-4 --max-con
The generated outline will be saved in `{output_dir}/{topic}/storm_gen_outline.txt` and the collected references will be saved in `{output_dir}/{topic}/raw_search_results.json`.


-### Writing Stage
+#### Writing Stage
For batch experiment on the FreshWiki dataset:
```shell
python -m scripts.run_writing --input-source file --input-path ../FreshWiki/topic_list.csv --engine gpt-4 --do-polish-article --remove-duplicate
@@ -94,37 +191,35 @@ python -m scripts.run_writing --input-source console --engine gpt-4 --do-polish-

The generated article will be saved in `{output_dir}/{topic}/storm_gen_article.txt` and the references corresponding to the citation index will be saved in `{output_dir}/{topic}/url_to_info.json`. If `--do-polish-article` is set, the polished article will be saved in `{output_dir}/{topic}/storm_gen_article_polished.txt`.

-## Customize the STORM Configurations
+### Customize the STORM Configurations
We set up the default LLM configuration in `LLMConfigs` in [src/modules/utils.py](src/modules/utils.py). You can use `set_conv_simulator_lm()`, `set_question_asker_lm()`, `set_outline_gen_lm()`, `set_article_gen_lm()`, and `set_article_polish_lm()` to override the default configuration. These functions take in an instance of `dspy.dsp.LM` or `dspy.dsp.HFModel`.

-:bulb: **For a good practice,**
-- choose a cheaper/faster model for `conv_simulator_lm` which is used to split queries, synthesize answers in the conversation.
-- if you need to conduct the actual writing step, choose a more powerful model for `article_gen_lm`. Based on our experiments, weak models are bad at generating text with citations.

-
-## Automatic Evaluation
+### Automatic Evaluation

In our paper, we break down the evaluation into two parts: outline quality and full-length article quality.

-### Outline Quality
+#### Outline Quality
We introduce *heading soft recall* and *heading entity recall* to evaluate the outline quality. This makes it easier to prototype methods for pre-writing.

Run the following command under [./eval](eval) to compute the metrics on the FreshWiki dataset:
```shell
python eval_outline_quality.py --input-path ../FreshWiki/topic_list.csv --gt-dir ../FreshWiki --pred-dir ../results --pred-file-name storm_gen_outline.txt --result-output-path ../results/storm_outline_quality.csv
```

-### Full-length Article Quality
+#### Full-length Article Quality
[eval/eval_article_quality.py](eval/eval_article_quality.py) provides the entry point for evaluating full-length article quality using ROUGE, entity recall, and rubric grading. Run the following command under `eval` to compute the metrics:
```shell
python eval_article_quality.py --input-path ../FreshWiki/topic_list.csv --gt-dir ../FreshWiki --pred-dir ../results --output-dir ../results/storm_article_eval_results --pred-file-name storm_gen_article_polished.txt
```

-### Use the Metric Yourself
+#### Use the Metric Yourself
The similarity-based metrics (i.e., ROUGE, entity recall, and heading entity recall) are implemented in [eval/metrics.py](eval/metrics.py).

For rubric grading, we use [prometheus-13b-v1.0](https://huggingface.co/kaist-ai/prometheus-13b-v1.0), introduced in [this paper](https://arxiv.org/abs/2310.08491). [eval/evaluation_prometheus.py](eval/evaluation_prometheus.py) provides the entry point for using the metric.

+</details>
+
## Contributions
If you have any questions or suggestions, please feel free to open an issue or pull request. We welcome contributions to improve the system and the codebase!

@@ -140,4 +235,3 @@ Please cite our paper if you use this code or part of it in your work:
booktitle={Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)}
}
```
-
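The outline metrics above are recall-style: heading entity recall measures the fraction of entities in the ground-truth headings that also appear in the predicted headings. A minimal sketch of that idea, using capitalized words as a crude stand-in for entities (the real implementation in `eval/metrics.py` uses proper entity extraction, so treat every name here as illustrative):

```python
def naive_entities(headings: list[str]) -> set[str]:
    # Crude entity extraction: capitalized words stand in for real NER output.
    return {w for h in headings for w in h.split() if w[:1].isupper()}


def heading_entity_recall(pred: list[str], gt: list[str]) -> float:
    """Fraction of ground-truth heading entities covered by predicted headings."""
    gt_ents = naive_entities(gt)
    if not gt_ents:
        return 1.0
    return len(naive_entities(pred) & gt_ents) / len(gt_ents)
```

Heading *soft* recall relaxes the exact-match intersection to a semantic similarity between headings, but the recall-over-ground-truth shape is the same.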
examples/run_storm_wiki_gpt.py

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
"""
STORM Wiki pipeline powered by GPT-3.5/4 and the You.com search engine.
You need to set up the following environment variables to run this script:
    - OPENAI_API_KEY: OpenAI API key
    - OPENAI_API_TYPE: OpenAI API type (e.g., 'openai' or 'azure')
    - AZURE_API_BASE: Azure API base URL if using Azure API
    - AZURE_API_VERSION: Azure API version if using Azure API
    - YDC_API_KEY: You.com API key

Output will be structured as below:
args.output_dir/
    topic_name/  # topic_name follows the convention of underscore-connected topic name w/o space and slash
        conversation_log.json           # Log of information-seeking conversation
        raw_search_results.json         # Raw search results from search engine
        direct_gen_outline.txt          # Outline directly generated with LLM's parametric knowledge
        storm_gen_outline.txt           # Outline refined with collected information
        url_to_info.json                # Sources that are used in the final article
        storm_gen_article.txt           # Final article generated
        storm_gen_article_polished.txt  # Polished final article (if args.do_polish_article is True)
"""

import os
import sys
from argparse import ArgumentParser

sys.path.append('./src')
from lm import OpenAIModel
from storm_wiki.engine import STORMWikiRunnerArguments, STORMWikiRunner, STORMWikiLMConfigs
from utils import load_api_key


def main(args):
    load_api_key(toml_file_path='secrets.toml')
    lm_configs = STORMWikiLMConfigs()
    openai_kwargs = {
        'api_key': os.getenv("OPENAI_API_KEY"),
        'api_provider': os.getenv('OPENAI_API_TYPE'),
        'temperature': 1.0,
        'top_p': 0.9,
        'api_base': os.getenv('AZURE_API_BASE'),
        'api_version': os.getenv('AZURE_API_VERSION'),
    }

    # STORM is an LM system, so different components can be powered by different models.
    # For a good balance between cost and quality, you can choose a cheaper/faster model for conv_simulator_lm,
    # which is used to split queries and synthesize answers in the conversation. We recommend using stronger models
    # for outline_gen_lm, which is responsible for organizing the collected information, and article_gen_lm,
    # which is responsible for generating sections with citations.
    conv_simulator_lm = OpenAIModel(model='gpt-3.5-turbo', max_tokens=500, **openai_kwargs)
    question_asker_lm = OpenAIModel(model='gpt-3.5-turbo', max_tokens=500, **openai_kwargs)
    outline_gen_lm = OpenAIModel(model='gpt-4-0125-preview', max_tokens=400, **openai_kwargs)
    article_gen_lm = OpenAIModel(model='gpt-4-0125-preview', max_tokens=700, **openai_kwargs)
    article_polish_lm = OpenAIModel(model='gpt-4-0125-preview', max_tokens=4000, **openai_kwargs)

    lm_configs.set_conv_simulator_lm(conv_simulator_lm)
    lm_configs.set_question_asker_lm(question_asker_lm)
    lm_configs.set_outline_gen_lm(outline_gen_lm)
    lm_configs.set_article_gen_lm(article_gen_lm)
    lm_configs.set_article_polish_lm(article_polish_lm)

    engine_args = STORMWikiRunnerArguments(
        output_dir=args.output_dir,
        max_conv_turn=args.max_conv_turn,
        max_perspective=args.max_perspective,
        search_top_k=args.search_top_k,
    )
    runner = STORMWikiRunner(engine_args, lm_configs)

    topic = input('Topic: ')
    runner.run(
        topic=topic,
        do_research=args.do_research,
        do_generate_outline=args.do_generate_outline,
        do_generate_article=args.do_generate_article,
        do_polish_article=args.do_polish_article,
    )
    runner.post_run()
    runner.summary()


if __name__ == '__main__':
    parser = ArgumentParser()
    # global arguments
    parser.add_argument('--output-dir', type=str, default='./results/gpt',
                        help='Directory to store the outputs.')
    parser.add_argument('--max-thread-num', type=int, default=3,
                        help='Maximum number of threads to use. The information seeking part and the article '
                             'generation part can speed up by using multiple threads. Consider reducing it if you '
                             'keep getting "Exceed rate limit" errors when calling the LM API.')
    # stage of the pipeline
    parser.add_argument('--do-research', action='store_true',
                        help='If True, simulate conversation to research the topic; otherwise, load the results.')
    parser.add_argument('--do-generate-outline', action='store_true',
                        help='If True, generate an outline for the topic; otherwise, load the results.')
    parser.add_argument('--do-generate-article', action='store_true',
                        help='If True, generate an article for the topic; otherwise, load the results.')
    parser.add_argument('--do-polish-article', action='store_true',
                        help='If True, polish the article by adding a summarization section and (optionally) '
                             'removing duplicate content.')
    # hyperparameters for the pre-writing stage
    parser.add_argument('--max-conv-turn', type=int, default=3,
                        help='Maximum number of questions in conversational question asking.')
    parser.add_argument('--max-perspective', type=int, default=3,
                        help='Maximum number of perspectives to consider in perspective-guided question asking.')
    parser.add_argument('--search-top-k', type=int, default=3,
                        help='Top k search results to consider for each search query.')
    # hyperparameters for the writing stage
    parser.add_argument('--retrieve-top-k', type=int, default=3,
                        help='Top k collected references for each section title.')
    parser.add_argument('--remove-duplicate', action='store_true',
                        help='If True, remove duplicate content from the article.')

    main(parser.parse_args())
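The script's docstring says `topic_name` follows the convention of an underscore-connected topic name without spaces or slashes. A minimal sketch of that convention (a hypothetical helper for illustration; the codebase's actual sanitization may differ):

```python
def topic_dir_name(topic: str) -> str:
    # Replace spaces and slashes with underscores, per the output-layout
    # convention described in the script's docstring. Hypothetical helper,
    # not taken from the repository.
    return topic.replace(' ', '_').replace('/', '_')
```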
