
Commit 51cc339

Optimize the DuckDB SQL-generation prompt to fix errors caused by strftime usage and wrong table-name references.
1 parent 1f7cd03 commit 51cc339


2 files changed (+32 / -15 lines)


configs/dbgpt-local-mlx.toml

Lines changed: 14 additions & 4 deletions
@@ -21,16 +21,26 @@ persist_path = "pilot/data"
 # Model Configurations
 [models]
 [[models.llms]]
-name = "Qwen/Qwen3-0.6B-MLX-4bit"
+name = "Qwen3-14B-MLX-4bit"
 provider = "mlx"
 # If not provided, the model will be downloaded from the Hugging Face model hub
 # uncomment the following line to specify the model path in the local file system
 # https://huggingface.co/Qwen/Qwen3-0.6B-MLX-4bit
-# path = "the-model-path-in-the-local-file-system"
+path = "/Users/wendell/MLX/Qwen3-14B-MLX-4bit"
+backend = "Qwen3-14B-MLX-4bit"
+prompt_template = "你是一个AI助手,请用简洁的语言回答用户问题。"
+context_length = 8192
+device = "mps"
 
 [[models.embeddings]]
-name = "BAAI/bge-large-zh-v1.5"
-provider = "hf"
+name = "bge-m3:latest"
+provider = "proxy/ollama"
+api_url = "http://localhost:11434"
+api_key = ""
+
+# [[models.embeddings]]
+# name = "BAAI/bge-large-zh-v1.5"
+# provider = "hf"
 # If not provided, the model will be downloaded from the Hugging Face model hub
 # uncomment the following line to specify the model path in the local file system
 # path = "the-model-path-in-the-local-file-system"

packages/dbgpt-app/src/dbgpt_app/scene/chat_data/chat_excel/excel_analyze/prompt.py

Lines changed: 18 additions & 11 deletions
@@ -34,6 +34,8 @@
 especially for columns used in sorting and joining
 4. If a column doesn't need an exact value, you can use the ANY_VALUE() function as an \
 alternative
+5. If the date field is not of DATE or TIMESTAMP type (e.g., it is a string), you must \
+use STRPTIME(date, '%Y-%m-%d') to convert it to DATE before using STRFTIME to extract the year or other parts. For example: strftime(strptime(date, '%Y-%m-%d'), '%Y')
 ``````
 Based on the data structure information provided, please answer the user's questions \
 through DuckDB SQL data analysis while meeting the following constraints.
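New rule 5 asks the model to parse string dates before formatting them. DuckDB's `strptime` and `strftime` use C-style format specifiers (`%Y`, `%m`, `%d`) much like Python's `datetime`, so the sketch below reproduces the same parse-then-format order in plain Python as a stand-in; it does not call DuckDB. The helper `year_from_string` is hypothetical. One nuance: in DuckDB, `strptime` returns a TIMESTAMP rather than a DATE, and `strftime` accepts either, so `strftime(strptime(date, '%Y-%m-%d'), '%Y')` still works as the rule intends.

```python
from datetime import datetime


def year_from_string(date_str: str, fmt: str = "%Y-%m-%d") -> str:
    """Hypothetical helper: parse a date string, then format only the year.

    Plain-Python stand-in for the DuckDB pattern
    strftime(strptime(date, fmt), '%Y'): parse the text first,
    then format the part you need.
    """
    parsed = datetime.strptime(date_str, fmt)  # str -> datetime (like DuckDB strptime)
    return parsed.strftime("%Y")  # datetime -> 'YYYY' (like DuckDB strftime)


if __name__ == "__main__":
    print(year_from_string("2023-04-24"))  # 2023
    # The dotted example from the new STRPTIME rule: the format must match the separators.
    print(datetime.strptime("2023.04.24", "%Y.%m.%d").date())  # 2023-04-24
```

The format string has to describe the stored text exactly; a value like `2023.04.24` fails to parse with `'%Y-%m-%d'`, which is why the prompt also spells out the dotted `'%Y.%m.%d'` example.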
@@ -45,14 +47,15 @@
 data rendering, and put the type name in the name parameter value of the required \
 return format. If you cannot find the most suitable one, use 'Table' as the \
 display method. Available data display methods are: {display_type}
-3. The table name to be used in the SQL is: {table_name}. Please check your \
+In SQL, you must strictly use the table name {table_name} - using any other table name is prohibited!
+4. The table name to be used in the SQL is: {table_name}. Please check your \
 generated SQL and do not use column names that are not in the data structure
-4. Prioritize using data analysis methods to answer. If the user's question does \
+5. Prioritize using data analysis methods to answer. If the user's question does \
 not involve data analysis content, you can answer based on your understanding
-5. DuckDB processes timestamps using dedicated functions (like to_timestamp()) \
-instead of direct CAST
-6. Please note that comment lines should be on a separate line and not on the same
-7. Convert the SQL part in the output content to: \
+6. parses string to date/time using STRPTIME(date_string, format_string), \
+e.g., STRPTIME('2023.04.24', '%Y.%m.%d')
+7. Please note that comment lines should be on a separate line and not on the same
+8. Convert the SQL part in the output content to: \
 <api-call><name>[display method]</name><args><sql>\
 [correct duckdb data analysis sql]</sql></args></api-call> \
 format, refer to the return format requirements
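The commit relies on the prompt alone to keep the model on `{table_name}`. If you also want a cheap check on generated SQL before running it, a heuristic along these lines could flag references to other tables. The function `foreign_tables` and its regex approach are hypothetical and not part of DB-GPT; a real SQL parser would be more robust.

```python
import re

# Identifier right after FROM or JOIN, optionally double-quoted.
_TABLE_REF = re.compile(r'\b(?:FROM|JOIN)\s+"?([A-Za-z_][A-Za-z0-9_]*)"?', re.IGNORECASE)
# CTE names declared as `WITH name AS (` or `, name AS (`.
_CTE_DEF = re.compile(r'(?:\bWITH|,)\s+([A-Za-z_][A-Za-z0-9_]*)\s+AS\s*\(', re.IGNORECASE)


def foreign_tables(sql: str, table_name: str) -> set:
    """Return names after FROM/JOIN that are neither `table_name` nor a CTE.

    Heuristic only (hypothetical helper, not part of DB-GPT): it does not
    parse SQL, so schema-qualified names are not handled and a column in
    `EXTRACT(year FROM col)` would be reported as a table.
    """
    ctes = {name.lower() for name in _CTE_DEF.findall(sql)}
    allowed = ctes | {table_name.lower()}
    return {t for t in _TABLE_REF.findall(sql) if t.lower() not in allowed}


if __name__ == "__main__":
    print(foreign_tables("SELECT * FROM orders JOIN sales USING (id)", "sales"))  # {'orders'}
```

CTE names are allowed explicitly because the prompt's own rules encourage multi-layer CTEs, whose names legitimately appear after `FROM`.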
@@ -128,6 +131,8 @@
 2. 当在 ORDER BY 或窗口函数中引用某个列时,确保该列已在前面的 CTE 或查询中被正确选择
 3. 在构建多层 CTE 时,需要确保各层之间的列引用一致性,特别是用于排序和连接的列
 4. 如果某列不需要精确值,可以使用 ANY_VALUE() 函数作为替代方案
+5. 如果日期字段不是 DATE 或 TIMESTAMP 类型(如为字符串),必须先用 STRPTIME(date, '%Y-%m-%d') \
+转换为 DATE,再用 STRFTIME 提取年份等信息。例如:strftime(strptime(date, '%Y-%m-%d'), '%Y')
 ``````
 
 请基于给你的数据结构信息,在满足下面约束条件下通过\
@@ -138,12 +143,14 @@
 2.请从如下给出的展示方式种选择最优的一种用以进行数据渲染,\
 将类型名称放入返回要求格式的name参数值中,如果找不到最合适\
 的则使用'Table'作为展示方式,可用数据展示方式如下: {display_type}
-3.SQL中需要使用的表名是: {table_name},请检查你生成的sql,\
+3.SQL 中必须严格使用表名 {table_name},禁止使用任何其他表名!
+4.SQL中需要使用的表名是: {table_name},请检查你生成的sql,\
 不要使用没在数据结构中的列名
-4.优先使用数据分析的方式回答,如果用户问题不涉及数据分析内容,你可以按你的理解进行回答
-5.DuckDB 处理时间戳需通过专用函数(如 to_timestamp())而非直接 CAST
-6.请注意,注释行要单独一行,不要放在 SQL 语句的同一行中
-7.输出内容中sql部分转换为:
+5.优先使用数据分析的方式回答,如果用户问题不涉及数据分析内容,你可以按你的理解进行回答
+6.解析字符串为日期/时间应使用 STRPTIME(date_string, format_string),\
+如 STRPTIME('2023.04.24', '%Y.%m.%d')
+7.请注意,注释行要单独一行,不要放在 SQL 语句的同一行中
+8.输出内容中sql部分转换为:
 <api-call><name>[数据显示方式]</name><args><sql>\
 [正确的duckdb数据分析sql]</sql></args></api-call> \
 这样的格式,参考返回格式要求
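The Chinese rules in this hunk mirror English rules 3 to 8, and rule 8 in both languages fixes the reply format as `<api-call><name>[display method]</name><args><sql>[correct duckdb data analysis sql]</sql></args></api-call>`. As a rough illustration of what a consumer of that format has to do, here is a hypothetical regex-based extractor (`ApiCall` and `parse_api_call` are illustrative names). DB-GPT's actual parsing code is not part of this diff and may differ.

```python
import re
from typing import NamedTuple, Optional


class ApiCall(NamedTuple):
    display: str  # display method chosen from {display_type}, e.g. 'Table'
    sql: str      # the DuckDB SQL inside <sql>...</sql>


# Non-greedy groups + DOTALL so SQL spanning several lines is captured whole.
_API_CALL = re.compile(
    r"<api-call>\s*<name>(.*?)</name>\s*<args>\s*<sql>(.*?)</sql>\s*</args>\s*</api-call>",
    re.DOTALL,
)


def parse_api_call(text: str) -> Optional[ApiCall]:
    """Hypothetical helper: return the first <api-call> block in a reply, or None."""
    match = _API_CALL.search(text)
    if match is None:
        return None
    return ApiCall(display=match.group(1).strip(), sql=match.group(2).strip())


if __name__ == "__main__":
    reply = (
        "Sales by year:\n"
        "<api-call><name>Table</name><args><sql>SELECT year, COUNT(*)\n"
        "FROM sales GROUP BY year</sql></args></api-call>"
    )
    print(parse_api_call(reply))
```

Returning `None` rather than raising lets a caller fall back to treating the reply as a plain-text answer, which matches rule 5's allowance for questions that involve no data analysis.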
