Pull Request
What does this PR do?
Aligns the HLLM implementation with the official ByteDance HLLM prompt format and training pipeline.
This PR does the following:
"Compress the following sentence into embedding: "
Type of Change
Related Issues
Proactive code-alignment work; no associated issue.
How to Test
1. Verify data preprocessing (MovieLens)
cd examples/generative/data/ml-1m
python preprocess_hllm_data.py --model_type tinyllama --device cuda
2. Verify the training script (Amazon Books)
cd examples/generative
python run_hllm_amazon_books.py --model_type tinyllama --device cuda --epochs 1
3. Verify module imports
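A minimal sketch of what an import check for step 3 could look like. The module path is derived from the modified file torch_rechub/models/generative/hllm.py listed in this PR; the helper name can_import is hypothetical and not part of the repository.

```python
import importlib


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly, False on ImportError.

    Hypothetical helper for illustration; not part of torch_rechub.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Module path assumed from torch_rechub/models/generative/hllm.py.
    name = "torch_rechub.models.generative.hllm"
    print(name, "OK" if can_import(name) else "FAILED")
```

Run it from an environment where the package is installed; a FAILED result usually means the package is missing or one of its dependencies did not import.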
Checklist
Code follows the project style (ran the formatting script: python config/format_code.py)
Additional Notes
Main changes
1. Prompt format alignment ✅
Before:
After (official format):
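As a rough illustration of how the instruction prefix quoted in this PR might be prepended to item text before it is fed to the item LLM, here is a minimal sketch. Only the prefix string comes from the PR; the helper name build_item_prompt and the "title: ... description: ..." field layout are assumptions made for the example and may differ from what the preprocessing scripts actually emit.

```python
# Prefix string as quoted in this PR description.
ITEM_PROMPT_PREFIX = "Compress the following sentence into embedding: "


def build_item_prompt(title: str, description: str = "") -> str:
    """Build the text for one item (hypothetical helper).

    The field layout below is an assumption for illustration only.
    """
    fields = [f"title: {title.strip()}"]
    if description.strip():
        fields.append(f"description: {description.strip()}")
    return ITEM_PROMPT_PREFIX + " ".join(fields)


if __name__ == "__main__":
    print(build_item_prompt("Toy Story (1995)", "Animation, Comedy"))
```

Keeping the prefix in a single constant means the MovieLens and Amazon Books preprocessing paths cannot drift apart on the exact wording.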
2. Training script fixes ✅
item_llm_path, user_llm_path, item_texts
item_embeddings
DEFAULT_CONFIG
Official configuration reference
3. Documentation updates ✅
Alignment score
Validation results
Scope of impact
Modified files:
examples/generative/data/ml-1m/preprocess_hllm_data.py
examples/generative/data/amazon-books/preprocess_amazon_books_hllm.py
examples/generative/run_hllm_amazon_books.py
examples/generative/run_hllm_movielens.py
torch_rechub/models/generative/hllm.py
docs/zh/blog/hllm_reproduction.md
docs/en/blog/hllm_reproduction.md
Backward compatibility: