|
25 | 25 | """ |
26 | 26 |
|
27 | 27 | # Guidelines for R code output optimization |
28 | | -R_OUTPUT_RECOMMENDATION_PROMPT = """ |
29 | | -R-Specific Guidelines: |
| 28 | +R_SPECIFIC_GUIDELINES = """Guidelines for using the R programming language: |
30 | 29 | 1. Load packages using this format to minimize verbose output: |
31 | 30 | ```r |
32 | 31 | if (!requireNamespace("package_name", quietly = TRUE)) {{ |
33 | 32 | install.packages("package_name") |
34 | 33 | }} |
35 | 34 | suppressPackageStartupMessages(library(package_name)) |
36 | 35 | ``` |
| 36 | +2. You must use the tidyverse wherever possible: dplyr, tidyr, ggplot2, readr, stringr, forcats, purrr, tibble, and lubridate. |
37 | 37 |
|
38 | | -2. For data operations, suppress messages about column name repairs: |
| 38 | +3. All plots must be made using ggplot2. Here is an example of how to make a plot: |
| 39 | +
|
| 40 | + # Create a density scatter plot of FSC-A vs SSC-A |
| 41 | +plot_data <- as.data.frame(dmso_data[, c("FSC-A", "SSC-A")]) |
| 42 | +scatter_plot <- ggplot2::ggplot(plot_data, ggplot2::aes(x = `FSC-A`, y = `SSC-A`)) + |
| 43 | + ggplot2::geom_hex(bins = 100) + |
| 44 | + ggplot2::scale_fill_viridis_c(trans = "log10") + |
| 45 | + ggplot2::labs( |
| 46 | + title = "FSC-A vs SSC-A Density Plot (DMSO Control)", |
| 47 | + x = "FSC-A", |
| 48 | + y = "SSC-A" |
| 49 | + ) + |
| 50 | + ggplot2::theme_minimal() |
| 51 | +
|
| 52 | +4. Use explicit namespace qualification for functions. For example, use dplyr::select() instead of select(). |
| 53 | +
|
| 54 | +5. For data operations, suppress messages about column name repairs: |
39 | 55 | ```r |
40 | 56 | variable_name <- read_excel("<fpath>.xlsx", col_names = FALSE, .name_repair = "minimal") |
41 | 57 | ``` |
42 | | -
|
43 | | -3. Very important: always use the tidyverse package where possible. |
44 | 58 | """ |
45 | 59 |
|
46 | 60 |
|
|
101 | 115 |
|
102 | 116 | 1. Load Data and Perform Descriptive Statistics: |
103 | 117 | <analysis_planning> |
104 | | -- Identify which data files are most relevant to resolving the task. List these files. |
| 118 | +- Identify which data files are most relevant to resolving the task. |
105 | 119 | - Plan how to load these files efficiently in {language}. |
106 | 120 | - List the specific descriptive statistics you plan to use (e.g., summary(), str(), head()). |
107 | 121 | - Consider potential issues like missing data or unexpected formats. How will you handle each? |
|
197 | 211 | {CHAIN_OF_THOUGHT_AGNOSTIC} |
198 | 212 | {SUBMIT_ANSWER_HYPOTHESIS} |
199 | 213 | {GENERAL_NOTEBOOK_GUIDELINES} |
200 | | -{R_OUTPUT_RECOMMENDATION_PROMPT} |
| 214 | +{R_SPECIFIC_GUIDELINES} |
201 | 215 | """ |
202 | 216 | # MCQ |
203 | 217 | MCQ_PROMPT_TEMPLATE = f""" |
|
209 | 223 | {CHAIN_OF_THOUGHT_AGNOSTIC} |
210 | 224 | {SUBMIT_ANSWER_MCQ} |
211 | 225 | {GENERAL_NOTEBOOK_GUIDELINES} |
212 | | -{R_OUTPUT_RECOMMENDATION_PROMPT} |
| 226 | +{R_SPECIFIC_GUIDELINES} |
213 | 227 | """ |
214 | 228 | # Open answer |
215 | 229 | OPEN_PROMPT_TEMPLATE = f""" |
|
222 | 236 | {CHAIN_OF_THOUGHT_AGNOSTIC} |
223 | 237 | {SUBMIT_ANSWER_OPEN} |
224 | 238 | {GENERAL_NOTEBOOK_GUIDELINES} |
225 | | -{R_OUTPUT_RECOMMENDATION_PROMPT} |
| 239 | +{R_SPECIFIC_GUIDELINES} |
226 | 240 | """ |
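Note on the doubled braces in the R snippet: `R_SPECIFIC_GUIDELINES` is a plain string that the templates pull in through f-string interpolation, and an f-string does not re-parse the values it inserts. The `{{` and `}}` therefore only collapse to single braces if the finished template is later passed through `str.format`. The diff does not show that step, so the sketch below assumes it. The constants are trimmed-down, hypothetical stand-ins for the ones in the diff, and `PROMPT_TEMPLATE` is an illustrative name.

```python
# Hypothetical, shortened stand-in for the constant added in this diff.
R_SPECIFIC_GUIDELINES = """Guidelines for using the R programming language:
1. Load packages using this format to minimize verbose output:
if (!requireNamespace("package_name", quietly = TRUE)) {{
    install.packages("package_name")
}}
suppressPackageStartupMessages(library(package_name))
"""

# The f-string inserts the guidelines verbatim, doubled braces included.
# {{language}} is doubled here so a literal {language} placeholder survives
# the f-string step and can be filled in later.
PROMPT_TEMPLATE = f"""
- Plan how to load these files efficiently in {{language}}.
{R_SPECIFIC_GUIDELINES}
"""

# Assumption: the template is rendered with str.format at call time.
# This is the step that turns {{ and }} into the single braces R expects.
rendered = PROMPT_TEMPLATE.format(language="R")
print(rendered)
```

If the templates are never passed through `str.format`, the model would see `{{` and `}}` literally in the R example, and the braces in `R_SPECIFIC_GUIDELINES` should be single instead.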