
Could you take a look at this issue? TypeError: output tensor must have the same type as input tensor #308

@ZephryLiang

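Hello, sorry to bother you. Could you take a moment to look at this problem?
Problem: running the train.sh script fails with an error.
I tried printing the dtypes: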

Print code added to `def training_step(` (called at /transformers/trainer.py, line 2548). Output:
Input tensor input_ids dtype: torch.int64
Input tensor attention_mask dtype: torch.int64
Input tensor labels dtype: torch.int64
Loss tensor dtype: torch.float32
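
For anyone reproducing the check above, here is a minimal sketch of a helper that produces the same kind of dtype report for a batch. It is not the code used in this issue. `describe_dtypes` and `FakeTensor` are hypothetical names, and `FakeTensor` is only a stand-in so the sketch runs without PyTorch; the helper relies on nothing but a `.dtype` attribute, which real `torch.Tensor` objects also have.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor so the sketch runs without PyTorch."""
    dtype: str


def describe_dtypes(batch):
    """Map each batch entry to its dtype as a string, skipping values without one."""
    return {
        name: str(value.dtype)
        for name, value in batch.items()
        if hasattr(value, "dtype")
    }


# Example batch shaped like the one printed in this issue.
batch = {
    "input_ids": FakeTensor("torch.int64"),
    "attention_mask": FakeTensor("torch.int64"),
    "labels": FakeTensor("torch.int64"),
    "num_items": 3,  # plain int: it has no dtype, so it is skipped
}

report = describe_dtypes(batch)
for name, dtype in report.items():
    print(f"Input tensor {name} dtype: {dtype}")
```

Inside `training_step`, the same call would presumably be `describe_dtypes(inputs)`. Note that this only covers the batch, while the failing all-gather in the traceback operates on model parameters.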

TypeError: output tensor must have the same type as input tensor
Environment:
torch 2.6.0, python 3.11.11
GPUs: 4 x NVIDIA 4090
Script:
(`--quantization_bit 4 \` was removed because it raised an error; original note: "zeros")
deepspeed --num_gpus 4 dbgpt_hub_sql/train/sft_train.py \
--deepspeed dbgpt_hub_sql/configs/ds_config_stage3.json \
--model_name_or_path codellama/CodeLlama-7B-Instruct-hf \
--do_train \
--dataset example_text2sql_train \
--max_source_length 1024 \
--max_target_length 512 \
--template llama2 \
--finetuning_type lora \
--lora_rank 64 \
--lora_alpha 32 \
--lora_target q_proj,v_proj \
--output_dir dbgpt_hub_sql/output/adapter/llama2-13b-qlora_1024_epoch1_debug1008_withDeepseed_mulitCard \
--overwrite_cache \
--overwrite_output_dir \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 16 \
--lr_scheduler_type cosine_with_restarts \
--logging_steps 25 \
--save_steps 20 \
--learning_rate 2e-4 \
--num_train_epochs 0.1 \
--plot_loss \
--bf16 2>&1 | tee ${train_log}
Traceback:

  File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/transformers/trainer.py", line 2241, in train
    return inner_training_loop(
  File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/transformers/trainer.py", line 2548, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
  File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/transformers/trainer.py", line 3745, in training_step
    self.accelerator.backward(loss, **kwargs)
  File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/accelerate/accelerator.py", line 2321, in backward
    self.deepspeed_engine_wrapped.backward(loss, **kwargs)
  File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/accelerate/utils/deepspeed.py", line 275, in backward
    self.engine.step()
  File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 2249, in step
    self._take_model_step(lr_kwargs)

(intermediate frames omitted)

[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1530, in _all_gather
[rank0]:     self._allgather_params_coalesced(all_gather_nonquantize_list, hierarchy, quantize=False)
[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1840, in _allgather_params_coalesced
[rank0]:     h = dist.all_gather_into_tensor(allgather_params[param_idx],
[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/deepspeed/comm/comm.py", line 117, in log_wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/deepspeed/comm/comm.py", line 305, in all_gather_into_tensor
[rank0]:     return cdb.all_gather_into_tensor(output_tensor=output_tensor, input_tensor=tensor, group=group, async_op=async_op)
[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/deepspeed/comm/torch.py", line 220, in all_gather_into_tensor
[rank0]:     return self.all_gather_function(output_tensor=output_tensor,
[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/desir/anaconda3/envs/dbgpt_hub/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 3798, in all_gather_into_tensor
[rank0]:     work = group._allgather_base(output_tensor, input_tensor, opts)
[rank0]: TypeError: output tensor must have the same type as input tensor
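
The traceback shows only that the output and input buffers of the all-gather disagree on dtype. One thing that may be worth checking, purely as an assumption and not a confirmed cause, is whether the model's parameters share a single dtype, for example whether the LoRA weights ended up in a different precision than the base weights under `--bf16`. The sketch below groups parameter names by dtype. `group_params_by_dtype`, `FakeParam`, and the parameter names are hypothetical and exist only so the example runs without PyTorch; with a real model the input would be `model.named_parameters()`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeParam:
    """Hypothetical stand-in for torch.nn.Parameter; only .dtype is used."""
    dtype: str


def group_params_by_dtype(named_params):
    """Group parameter names by the string form of their dtype."""
    groups = defaultdict(list)
    for name, param in named_params:
        groups[str(param.dtype)].append(name)
    return dict(groups)


# Illustrative data only: these names and dtypes are NOT taken from the issue.
named_params = [
    ("layers.0.q_proj.weight", FakeParam("torch.bfloat16")),
    ("layers.0.q_proj.lora_A.weight", FakeParam("torch.float32")),
    ("layers.0.q_proj.lora_B.weight", FakeParam("torch.float32")),
]

groups = group_params_by_dtype(named_params)
has_mixed_dtypes = len(groups) > 1

for dtype, names in groups.items():
    print(f"{dtype}: {len(names)} parameter(s), e.g. {names[0]}")
print("mixed dtypes:", has_mixed_dtypes)
```

If a real run reports more than one dtype, that narrows the search but still does not prove the cause. Under ZeRO stage 3 the parameters are partitioned, so the dtype of the partitioned storage may also need checking.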
