Incorrect scheduler stepping and label dtype handling in training loop #152

@SarthakJagota

Description

While reviewing the training pipeline, I noticed two small issues that may affect training stability:

  1. The scheduler is stepped using scheduler.step(loss), but for schedulers such as CosineAnnealingWarmRestarts the optional argument of step() is the epoch index, not the loss, so passing the loss silently corrupts the schedule. A plain scheduler.step() call is what these schedulers expect; only ReduceLROnPlateau takes a metric.
  2. Labels are converted using labels.type(torch.LongTensor).to(device), which first materializes a CPU tensor before moving it to the target device. Using labels.long().to(device) performs the cast on whatever device the tensor already lives on and avoids the unnecessary round trip.

I can submit a small fix addressing both points if this sounds reasonable.
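For reference, here is a minimal, self-contained sketch of both fixes in a single training step. The model, tensor sizes, and optimizer below are placeholders for illustration, not the repository's actual training code:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Placeholder model/optimizer/scheduler, standing in for the real training setup
model = nn.Linear(10, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10)
criterion = nn.CrossEntropyLoss()
device = torch.device("cpu")

inputs = torch.randn(4, 10)
labels = torch.randint(0, 3, (4,), dtype=torch.int32)

# Fix 2: cast in place on the current device, then move once
# (instead of labels.type(torch.LongTensor).to(device), which
# forces a CPU tensor first)
labels = labels.long().to(device)

outputs = model(inputs.to(device))
loss = criterion(outputs, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Fix 1: CosineAnnealingWarmRestarts steps without any argument;
# its optional parameter is an epoch index, not the loss
scheduler.step()
```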
