Commit 7002321

Minor prompting changes
1 parent 770880b commit 7002321

6 files changed: +30 -13 lines changed

.pre-commit-config.yaml

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ repos:
 - id: codespell
 additional_dependencies: [".[toml]"]
 exclude_types: [jupyter]
+exclude: ".*\\.csv$"
 - repo: https://github.com/pappasam/toml-sort
 rev: v0.24.2
 hooks:

src/fhda/notebook_env.py

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ async def close(self):

 class NBEnvironment(Environment[NBEnvironmentState]):
     NOTEBOOK_NAME: ClassVar[str] = "notebook.ipynb"
-    EXEC_TIMEOUT: ClassVar[float | None] = 300.0
+    EXEC_TIMEOUT: ClassVar[float | None] = 600.0

     state: NBEnvironmentState

src/fhda/prompts.py

Lines changed: 23 additions & 9 deletions
@@ -25,22 +25,36 @@
 """

 # Guidelines for R code output optimization
-R_OUTPUT_RECOMMENDATION_PROMPT = """
-R-Specific Guidelines:
+R_SPECIFIC_GUIDELINES = """Guidelines for using the R programming language:
 1. Load packages using this format to minimize verbose output:
 ```r
 if (!requireNamespace("package_name", quietly = TRUE)) {{
 install.packages("package_name")
 }}
 suppressPackageStartupMessages(library(package_name))
 ```
+2. You must use the tidyverse wherever possible: dplyr, tidyr, ggplot2, readr, stringr, forcats, purrr, tibble, and lubridate.

-2. For data operations, suppress messages about column name repairs:
+3. All plots must be made using ggplot2. Here is an example of how to make a plot:
+
+# Create a density scatter plot of FSC-A vs SSC-A
+plot_data <- as.data.frame(dmso_data[, c("FSC-A", "SSC-A")])
+scatter_plot <- ggplot2::ggplot(plot_data, ggplot2::aes(x = `FSC-A`, y = `SSC-A`)) +
+ggplot2::geom_hex(bins = 100) +
+ggplot2::scale_fill_viridis_c(trans = "log10") +
+ggplot2::labs(
+title = "FSC-A vs SSC-A Density Plot (DMSO Control)",
+x = "FSC-A",
+y = "SSC-A"
+) +
+ggplot2::theme_minimal()
+
+3. Use explicit namespace qualification for functions. For example, use dplyr::select() instead of select().
+
+4. For data operations, suppress messages about column name repairs:
 ```r
 variable_name <- read_excel("<fpath>.csv", col_names = FALSE, .name_repair = "minimal")
 ```
-
-3. Very important: always use the tidyverse package where possible.
 """


@@ -101,7 +115,7 @@

 1. Load Data and Perform Descriptive Statistics:
 <analysis_planning>
-- Identify which data files are most relevant to resolving the task. List these files.
+- Identify which data files are most relevant to resolving the task.
 - Plan how to load these files efficiently in {language}.
 - List the specific descriptive statistics you plan to use (e.g., summary(), str(), head()).
 - Consider potential issues like missing data or unexpected formats. How will you handle each?
@@ -197,7 +211,7 @@
 {CHAIN_OF_THOUGHT_AGNOSTIC}
 {SUBMIT_ANSWER_HYPOTHESIS}
 {GENERAL_NOTEBOOK_GUIDELINES}
-{R_OUTPUT_RECOMMENDATION_PROMPT}
+{R_SPECIFIC_GUIDELINES}
 """
 # MCQ
 MCQ_PROMPT_TEMPLATE = f"""
@@ -209,7 +223,7 @@
 {CHAIN_OF_THOUGHT_AGNOSTIC}
 {SUBMIT_ANSWER_MCQ}
 {GENERAL_NOTEBOOK_GUIDELINES}
-{R_OUTPUT_RECOMMENDATION_PROMPT}
+{R_SPECIFIC_GUIDELINES}
 """
 # Open answer
 OPEN_PROMPT_TEMPLATE = f"""
@@ -222,5 +236,5 @@
 {CHAIN_OF_THOUGHT_AGNOSTIC}
 {SUBMIT_ANSWER_OPEN}
 {GENERAL_NOTEBOOK_GUIDELINES}
-{R_OUTPUT_RECOMMENDATION_PROMPT}
+{R_SPECIFIC_GUIDELINES}
 """
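
The renamed constant keeps its doubled braces (`{{`, `}}`), which collapse to single braces only if the final string passes through `str.format`. The sketch below illustrates that behavior under stated assumptions: the constant is a shortened stand-in, `TEMPLATE`, `render_via_format`, and `render_via_fstring` are hypothetical names, and whether the real templates are later formatted is not something this diff shows.

```python
# Shortened, hypothetical stand-in for R_SPECIFIC_GUIDELINES in src/fhda/prompts.py.
R_SPECIFIC_GUIDELINES = """Guidelines for using the R programming language:
1. Load packages using this format to minimize verbose output:
if (!requireNamespace("package_name", quietly = TRUE)) {{
install.packages("package_name")
}}
"""

# Hypothetical template: the f-string inserts the constant verbatim and
# leaves a literal "{language}" placeholder for a later str.format call.
TEMPLATE = f"""Plan how to load these files efficiently in {{language}}.
{R_SPECIFIC_GUIDELINES}"""


def render_via_format(language: str) -> str:
    """Fill the placeholder; str.format also collapses '{{' and '}}'."""
    return TEMPLATE.format(language=language)


def render_via_fstring(task: str) -> str:
    """Append the guidelines directly; the doubled braces are kept as-is."""
    return task + f"\n{R_SPECIFIC_GUIDELINES}"


if __name__ == "__main__":
    print(render_via_format("R"))
    print(render_via_fstring("Analyze the dataset."))
```

If the guidelines are appended with an f-string alone, as in the second helper, the R snippet reaches the model with literal `{{` and `}}`, so it may be worth checking which path a given prompt takes.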

tutorial/example.ipynb

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@
 " {prompts.GENERAL_NOTEBOOK_GUIDELINES.format(language=language.name)}\"\"\"\n",
 "\n",
 " if language == NBLanguage.R:\n",
-" augmented_task += f\"\\n{prompts.R_OUTPUT_RECOMMENDATION_PROMPT}\"\n",
+" augmented_task += f\"\\n{prompts.R_SPECIFIC_GUIDELINES}\"\n",
 "\n",
 " dae = DataAnalysisEnv(\n",
 " problem_id=f\"data-analysis-task-{task_hash}\",\n",

tutorial/platform_api.ipynb

Lines changed: 4 additions & 2 deletions
@@ -55,7 +55,9 @@
 "source": [
 "# Load your dataset – note you only have to do this once\n",
 "# File path can be an absolute path or a relative path to either a directory or a file containing the dataset\n",
-"client.upload_file(JOB_NAME, file_path=\"dataset\", upload_id=UPLOAD_ID)"
+"client.upload_file(\n",
+" JOB_NAME, file_path=\"datasets/brain_size_data.csv\", upload_id=UPLOAD_ID\n",
+")"
 ]
 },
 {
@@ -92,7 +94,7 @@
 "\n",
 "# This is extra R prompting to avoid long R output blocks – also feel free to discard this\n",
 "if LANGUAGE == \"R\":\n",
-" task += f\"\\n{prompts.R_OUTPUT_RECOMMENDATION_PROMPT}\""
+" task += f\"\\n{prompts.R_SPECIFIC_GUIDELINES}\""
 ]
 },
 {
