# STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking
This repository contains the code for our NAACL 2024 paper [Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models](https://arxiv.org/abs/2402.14207) by [Yijia Shao](https://cs.stanford.edu/~shaoyj), [Yucheng Jiang](https://yucheng-jiang.github.io/), Theodore A. Kanell, Peter Xu, [Omar Khattab](https://omarkhattab.com/), and [Monica S. Lam](https://suif.stanford.edu/~lam/).
- [2024/04] We release a refactored version of the STORM codebase! We define an [interface](src/interface.py) for the STORM pipeline and reimplement STORM-wiki (check out [`src/storm_wiki`](src/storm_wiki)) to demonstrate how to instantiate the pipeline. We provide an API to support customization of different language models and retrieval/search integration.
STORM is an LLM system that writes Wikipedia-like articles from scratch based on Internet search.
While the system cannot produce publication-ready articles, which often require many rounds of editing, experienced Wikipedia editors have found it helpful in their pre-writing stage.
**Try out our [live research preview](https://storm.genie.stanford.edu/) to see how STORM can help your knowledge exploration journey, and please provide feedback to help us improve the system 🙏!**
## How STORM works
STORM breaks down generating long articles with citations into two steps:
1. **Pre-writing stage**: The system conducts Internet-based research to collect references and generates an outline.
2. **Writing stage**: The system uses the outline and references to generate the full-length article with citations.
STORM identifies the core of automating the research process as automatically coming up with good questions to ask. Directly prompting the language model to ask questions does not work well. To improve the depth and breadth of the questions, STORM adopts two strategies:
1. **Perspective-Guided Question Asking**: Given the input topic, STORM discovers different perspectives by surveying existing articles from similar topics and uses them to control the question-asking process.
2. **Simulated Conversation**: STORM simulates a conversation between a Wikipedia writer and a topic expert grounded in Internet sources to enable the language model to update its understanding of the topic and ask follow-up questions.
Based on the separation of the two stages, STORM is implemented in a highly modular way using [dspy](https://github.com/stanfordnlp/dspy).
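To make this modularity concrete, here is a minimal sketch (not STORM's actual code) of how a single pipeline stage can be expressed as a dspy module; the signature fields and class names are illustrative assumptions.

```python
import dspy

# Illustrative sketch only: class and field names are assumptions,
# not STORM's actual implementation.
class AskQuestion(dspy.Signature):
    """Ask a question about the topic from the given perspective."""
    topic = dspy.InputField()
    perspective = dspy.InputField()
    question = dspy.OutputField()

class QuestionAsker(dspy.Module):
    def __init__(self):
        super().__init__()
        # ChainOfThought makes the LM reason before emitting the question.
        self.ask = dspy.ChainOfThought(AskQuestion)

    def forward(self, topic: str, perspective: str) -> dspy.Prediction:
        return self.ask(topic=topic, perspective=perspective)
```

Stages written this way can be swapped out or recombined without touching the rest of the pipeline.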
## Getting started
### 1. Setup
Below, we provide a quick start guide to run STORM locally.
1. Install the required packages.
```shell
conda create -n storm python=3.11
conda activate storm
pip install -r requirements.txt
```
2. Set up your OpenAI API key (if you want to use OpenAI models to power STORM) and your [You.com search API](https://api.you.com/) key. Create a file `secrets.toml` under the root directory and add the following content:
```shell
# Set up OpenAI API key.
OPENAI_API_KEY="your_openai_api_key"
# If you are using the API service provided by OpenAI, include the following line:
OPENAI_API_TYPE="openai"
# If you are using the API service provided by Microsoft Azure, include the following lines:
OPENAI_API_TYPE="azure"
AZURE_API_BASE="your_azure_api_base_url"
AZURE_API_VERSION="your_azure_api_version"
# Set up You.com search API key.
YDC_API_KEY="your_youcom_api_key"
```
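Since the key/value pairs above are valid TOML, they can be loaded into environment variables with a few lines of Python. The snippet below is only a sketch of one way to consume the file (using the third-party `toml` package); STORM's own loading logic may differ.

```python
import os
import toml  # third-party package: pip install toml

# Sketch: copy every key in secrets.toml into the process environment.
def load_api_keys(path: str = "secrets.toml") -> None:
    for key, value in toml.load(path).items():
        os.environ[key] = str(value)

load_api_keys()
```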
### 2. Running STORM-wiki locally
Currently, we provide example scripts under [`examples`](examples) to demonstrate how you can run STORM using different models.
**To run STORM with `gpt` family models**: Make sure you have set up the OpenAI API key and run the following command.
```shell
python scripts/run_storm_wiki_gpt.py \
    --output_dir $OUTPUT_DIR \
    --do-research \
    --do-generate-outline \
    --do-generate-article \
    --do-polish-article
```
- `--do-research`: If True, simulate conversation to research the topic; otherwise, load the results.
- `--do-generate-outline`: If True, generate an outline for the topic; otherwise, load the results.
- `--do-generate-article`: If True, generate an article for the topic; otherwise, load the results.
- `--do-polish-article`: If True, polish the article by adding a summarization section and (optionally) removing duplicate content.
**To run STORM with `mistral` family models on a local VLLM server**: Have a VLLM server running with the `Mistral-7B-Instruct-v0.2` model and run the following command.
```shell
python scripts/run_storm_wiki_mistral.py \
    --url $URL \
    --port $PORT \
    --output_dir $OUTPUT_DIR \
    --do-research \
    --do-generate-outline \
    --do-generate-article \
    --do-polish-article
```
- `--url`: URL of the VLLM server.
- `--port`: Port of the VLLM server.
## Customize STORM
### Customization of the Pipeline
STORM is a knowledge curation engine consisting of 4 modules:
1. Knowledge Curation Module: Collects a broad coverage of information about the given topic.
2. Outline Generation Module: Organizes the collected information by generating a hierarchical outline for the curated knowledge.
3. Article Generation Module: Populates the generated outline with the collected information.
4. Article Polishing Module: Refines and enhances the written article for better presentation.
The interface for each module is defined in `src/interface.py`, while their implementations are instantiated in `src/storm_wiki/modules/*`. These modules can be customized according to your specific requirements (e.g., generating sections in bullet point format instead of full paragraphs).
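As one example of such a customization, a bullet-point variant of the article generation step might look like the hypothetical sketch below. The class and method names here are assumptions made for illustration; consult `src/interface.py` for the actual abstract interfaces.

```python
# Hypothetical sketch: names below are assumptions, not the real interface.
class BulletPointArticleGeneration:
    """Variant of an article generation module that renders each outline
    section as bullet points instead of full paragraphs."""

    def generate_section(self, heading: str, snippets: list[str]) -> str:
        # Render the collected snippets as a markdown bullet list.
        bullets = "\n".join(f"- {snippet}" for snippet in snippets)
        return f"## {heading}\n{bullets}\n"
```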
:star2: **You can share your customization of `Engine` by making PRs to this repo!**
### Customization of the Retriever Module
As a knowledge curation engine, STORM retrieves information through the Retriever module. The interface for the Retriever module is defined in [`src/interface.py`](src/interface.py). Please consult the interface documentation if you plan to create a new instance or replace the default search engine API. By default, STORM uses the You.com search engine API (see `YouRM` in [`src/rm.py`](src/rm.py)).
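To give a feel for what an integration involves, a custom retriever might look roughly like the sketch below. Everything here, from the base-class usage to the endpoint and result schema, is an assumption for illustration; check `src/interface.py` and `YouRM` in `src/rm.py` for the exact contract before writing a real integration.

```python
from typing import List, Union

import dspy
import requests

class MySearchRM(dspy.Retrieve):
    """Hypothetical retriever: the endpoint, parameters, and result schema
    below are placeholders, not a real API."""

    def __init__(self, api_key: str, k: int = 3):
        super().__init__(k=k)
        self.api_key = api_key

    def forward(self, query_or_queries: Union[str, List[str]]):
        queries = [query_or_queries] if isinstance(query_or_queries, str) else query_or_queries
        collected = []
        for query in queries:
            response = requests.get(
                "https://search.example.com/v1/search",  # placeholder endpoint
                params={"q": query, "count": self.k},
                headers={"X-API-Key": self.api_key},
                timeout=30,
            )
            response.raise_for_status()
            for hit in response.json().get("results", []):
                collected.append({"url": hit["url"], "snippets": [hit["snippet"]]})
        return collected
```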
:star2: **PRs for integrating more search engines/retrievers are highly appreciated!**
### Customization of Language Models
STORM provides the following language model implementations in [`src/lm.py`](src/lm.py):
- `OpenAIModel`
- `ClaudeModel`
- `VLLMClient`
- `TGIClient`
- `TogetherClient`
:star2: **PRs for integrating more language model clients are highly appreciated!**
:bulb: **As good practice** (a configuration sketch follows this list),

- choose a cheaper/faster model for `conv_simulator_lm`, which is used to split queries and synthesize answers in the conversation.
- if you need to conduct the actual writing step, choose a more powerful model for `article_gen_lm`. Based on our experiments, weak models are bad at generating text with citations.
- for open models, adding a one-shot example can help them better follow instructions.
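For instance, the cheap-versus-strong split suggested above might look like the sketch below. The import path, constructor arguments, and model choices are illustrative assumptions; the scripts in [`examples`](examples) show the actual wiring.

```python
# Sketch only: import path, arguments, and model names are assumptions.
from lm import OpenAIModel  # implementations live in src/lm.py

# Cheaper/faster model for splitting queries and synthesizing answers.
conv_simulator_lm = OpenAIModel(model="gpt-3.5-turbo", max_tokens=500)

# Stronger model for the writing step, which must produce cited text.
article_gen_lm = OpenAIModel(model="gpt-4", max_tokens=700)
```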
Please refer to the scripts in the [`examples`](examples) directory for concrete guidance on customizing the language model used in the pipeline.
## Replicate NAACL 2024 results
Please switch to the branch `NAACL-2024-code-backup`.
<details>
<summary>Show me instructions</summary>
### Paper Experiments
The FreshWiki dataset used in our experiments can be found in [./FreshWiki](FreshWiki).
The generated outline will be saved in `{output_dir}/{topic}/storm_gen_outline.txt` and the collected references will be saved in `{output_dir}/{topic}/raw_search_results.json`.
The generated article will be saved in `{output_dir}/{topic}/storm_gen_article.txt` and the references corresponding to citation index will be saved in `{output_dir}/{topic}/url_to_info.json`. If `--do-polish-article` is set, the polished article will be saved in `{output_dir}/{topic}/storm_gen_article_polished.txt`.
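After a run, a quick way to inspect the collected references is to load `url_to_info.json` directly. The sketch below assumes nothing about the file's internal schema beyond it being JSON; the path components stand in for your own `--output_dir` and topic.

```python
import json
from pathlib import Path

# Placeholder path: substitute your own --output_dir and topic name.
info_path = Path("results") / "MyTopic" / "url_to_info.json"
url_to_info = json.loads(info_path.read_text())

# Print the top-level keys to see how the references are organized.
print(list(url_to_info))
```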
### Customize the STORM Configurations
We set up the default LLM configuration in `LLMConfigs` in [src/modules/utils.py](src/modules/utils.py). You can use `set_conv_simulator_lm()`, `set_question_asker_lm()`, `set_outline_gen_lm()`, `set_article_gen_lm()`, and `set_article_polish_lm()` to override the default configuration. These functions take an instance of `dspy.dsp.LM` or `dspy.dsp.HFModel`.
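A minimal sketch of overriding one of these defaults might look as follows, assuming the `NAACL-2024-code-backup` branch layout; the import path and model choice are illustrative assumptions.

```python
# Sketch only, assuming the NAACL-2024-code-backup branch layout.
import dspy
from modules.utils import LLMConfigs  # defined in src/modules/utils.py

llm_configs = LLMConfigs()

# Override the conversation simulator with a cheaper model; the setters
# accept an instance of dspy.dsp.LM or dspy.dsp.HFModel.
cheap_lm = dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=500)
llm_configs.set_conv_simulator_lm(cheap_lm)
```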
### Automatic Evaluation
In our paper, we break down the evaluation into two parts: outline quality and full-length article quality.
#### Outline Quality
We introduce *heading soft recall* and *heading entity recall* to evaluate the outline quality. This makes it easier to prototype methods for pre-writing.
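As a rough illustration of the idea (a simplified proxy, not necessarily the exact formula in [eval/metrics.py](eval/metrics.py)), heading soft recall can be approximated by embedding the headings and scoring each ground-truth heading by its best match among the predicted headings.

```python
from sentence_transformers import SentenceTransformer, util

# Simplified proxy for heading soft recall; the actual metric in
# eval/metrics.py may use a different formulation.
def heading_soft_recall(gt_headings, pred_headings, model_name="all-MiniLM-L6-v2"):
    model = SentenceTransformer(model_name)
    gt = model.encode(gt_headings, convert_to_tensor=True)
    pred = model.encode(pred_headings, convert_to_tensor=True)
    sims = util.cos_sim(gt, pred)  # |gt| x |pred| cosine similarity matrix
    # Average, over ground-truth headings, of the best predicted match.
    return sims.max(dim=1).values.mean().item()
```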
Run the corresponding command under [./eval](eval) to compute the metrics on the FreshWiki dataset.
#### Full-length Article Quality

[eval/eval_article_quality.py](eval/eval_article_quality.py) provides the entry point for evaluating full-length article quality using ROUGE, entity recall, and rubric grading. Run the corresponding command under `eval` to compute the metrics.
The similarity-based metrics (i.e., ROUGE, entity recall, and heading entity recall) are implemented in [eval/metrics.py](eval/metrics.py).
For rubric grading, we use the [prometheus-13b-v1.0](https://huggingface.co/kaist-ai/prometheus-13b-v1.0) model introduced in [this paper](https://arxiv.org/abs/2310.08491). [eval/evaluation_prometheus.py](eval/evaluation_prometheus.py) provides the entry point for using this metric.
</details>
## Contributions
If you have any questions or suggestions, please feel free to open an issue or pull request. We welcome contributions to improve the system and the codebase!
Please cite our paper if you use this code or part of it in your work:

```bibtex
@inproceedings{shao2024assisting,
    title={Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models},
    author={Yijia Shao and Yucheng Jiang and Theodore A. Kanell and Peter Xu and Omar Khattab and Monica S. Lam},
    year={2024},
    booktitle={Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)}
}
```