While reviewing the training pipeline, I noticed two small issues that may affect training stability:
- The scheduler is stepped with scheduler.step(loss), but schedulers such as CosineAnnealingWarmRestarts do not take a metric argument and expect a plain scheduler.step() call. Worse, the positional argument there is interpreted as the epoch number, so passing the loss silently distorts the schedule; only ReduceLROnPlateau-style schedulers consume the loss.
- Labels are converted with labels.type(torch.LongTensor).to(device), which forces a cast to a CPU tensor before moving to the target device, adding an unnecessary host round trip when the labels are already on the GPU. Using labels.long().to(device) casts on the tensor's current device and avoids the extra transfer.
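For concreteness, here is a minimal sketch of what the corrected training step could look like. The model, optimizer, and tensor shapes are illustrative placeholders, not taken from the actual pipeline:

```python
import torch
import torch.nn.functional as F

# Illustrative stand-ins for the real pipeline's model and optimizer.
model = torch.nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

inputs = torch.randn(2, 4)
labels = torch.tensor([0.0, 2.0])  # e.g. float labels coming out of a DataLoader

# Before: labels.type(torch.LongTensor).to(device) forces a CPU cast first.
# After: cast on whatever device the tensor already lives on, then move once.
labels = labels.long().to(device)

logits = model(inputs.to(device))
loss = F.cross_entropy(logits, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()  # no loss argument for CosineAnnealingWarmRestarts
```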
I can submit a small fix addressing both points if this sounds reasonable.