Question: how to train ReAct-style multi-turn human-machine dialogue #280

@Sakurakdx

As the title says: could anyone advise on how to train ReAct-style multi-turn human-machine dialogue?

I want to train an Agent that can hold a continuous conversation. My SFT plan is to construct multi-turn data like the following:
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "And what about Germany?"},
    {"role": "assistant", "content": "The capital of Germany is Berlin."}
  ]
}
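For concreteness, here is a minimal sketch of how one such sample might be turned into SFT supervision. It assumes the common choice of training only on assistant turns, each conditioned on everything before it. The helper name `split_supervised_turns` is hypothetical and not tied to any particular training framework.

```python
import json

# The sample from above, as valid JSON (no trailing comma after the last message).
sample_json = """
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "And what about Germany?"},
    {"role": "assistant", "content": "The capital of Germany is Berlin."}
  ]
}
"""


def split_supervised_turns(sample):
    """Return (context, target) pairs, one per assistant turn.

    Hypothetical helper: every assistant message becomes a training target,
    conditioned on all messages that precede it in the conversation.
    """
    pairs = []
    messages = sample["messages"]
    for i, msg in enumerate(messages):
        if msg["role"] == "assistant":
            pairs.append((messages[:i], msg["content"]))
    return pairs


sample = json.loads(sample_json)
pairs = split_supervised_turns(sample)
for context, target in pairs:
    print(len(context), "context messages ->", target)
```

In an actual trainer the same idea is usually expressed as a token-level loss mask rather than explicit pairs; the pairs above are only meant to show which parts of the conversation are supervised.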

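How should the RL stage be done? If another LLM is used to simulate the user, how should the rollout be performed, and how should the reward be computed?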
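To make the question concrete, the setup I have in mind could be sketched roughly as follows. Everything here is a placeholder: `toy_policy`, `toy_user`, and `toy_reward` are hypothetical stand-ins for the model being trained, the LLM simulating the user, and a reward function. How the reward should actually be defined (per turn or per trajectory, and by what criterion) is exactly the open question.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


def rollout(
    policy: Callable[[List[Message]], str],
    user_sim: Callable[[List[Message]], str],
    reward_fn: Callable[[List[Message]], float],
    first_user_msg: str,
    max_turns: int = 3,
) -> Tuple[List[Message], float]:
    """Alternate policy and simulated-user turns, then score the whole trajectory."""
    messages: List[Message] = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": first_user_msg},
    ]
    for turn in range(max_turns):
        # The model being trained produces the assistant turn.
        messages.append({"role": "assistant", "content": policy(messages)})
        if turn == max_turns - 1:
            break  # end the episode on an assistant turn
        # The second LLM plays the user and produces the next user turn.
        messages.append({"role": "user", "content": user_sim(messages)})
    return messages, reward_fn(messages)


# Deterministic stand-ins so the sketch runs without any model (assumptions).
def toy_policy(messages: List[Message]) -> str:
    return f"reply to: {messages[-1]['content']}"


def toy_user(messages: List[Message]) -> str:
    n_user = sum(1 for m in messages if m["role"] == "user")
    return f"follow-up question {n_user}"


def toy_reward(messages: List[Message]) -> float:
    # Placeholder only: counts assistant turns; a real reward is the open part.
    return float(sum(1 for m in messages if m["role"] == "assistant"))


trajectory, reward = rollout(toy_policy, toy_user, toy_reward, "hello", max_turns=2)
for m in trajectory:
    print(m["role"], ":", m["content"])
print("reward:", reward)
```

This sketch assigns a single reward to the whole trajectory; whether that is appropriate, and whether the simulated-user turns should be masked out of the policy loss, is part of what I am asking.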
