-
Notifications
You must be signed in to change notification settings - Fork 173
Open
Description
RT,请教各位大佬,如何训练react类型多轮人机对话
我想训练一个能够连续对话的Agent,SFT的方案是构造下面这种多轮数据
{
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"},
{"role": "assistant", "content": "The capital of France is Paris."},
{"role": "user", "content": "And what about Germany?"},
{"role": "assistant", "content": "The capital of Germany is Berlin."},
]
}
想请教RL该如何做?使用另一个LLM模拟user的情况下,如何做rollout?如何计算奖励
Metadata
Metadata
Assignees
Labels
No labels