TUI Dashboard for Kaun Training Monitoring #53

@tmattio

Background

Kaun, our neural network training library, currently lacks real-time monitoring. Training progress, loss curves, and metrics are visible only through print statements or post-training analysis. We need a TensorBoard-like experience in the terminal for monitoring training runs.

Objective

Create a decoupled monitoring system with:

  1. Logging layer: training runs write metrics to the filesystem in a structured format
  2. TUI dashboard: a separate process reads the logs and visualizes them using Mosaic

This separation allows:

  • Multiple monitoring frontends (TUI, web, notebooks)
  • Monitoring remote training runs
  • Post-hoc analysis of completed runs

Design Considerations

Logging Format:

  • JSON Lines or a binary format, for efficiency
  • Directory structure: runs/<run_id>/metrics.jsonl
  • Atomic writes, so that concurrent readers never see a partially written record
  • Consider rotation/compression for long runs
  • Consider logging in the same format as TensorBoard if appropriate
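
To make the logging side concrete, here is a minimal sketch of an append-only JSON Lines writer for the runs/<run_id>/metrics.jsonl layout. It is written in Python purely for illustration; the `MetricsLogger` class, its `log` method, and the `step` / `wall_time` field names are assumptions, not an existing Kaun API. Each record goes out in a single append-mode write, which keeps lines whole in the common case on a local filesystem, but readers should still tolerate a trailing partial line.

```python
import json
import os
import time
from pathlib import Path


class MetricsLogger:
    """Append-only JSON Lines logger (hypothetical sketch, not Kaun's API)."""

    def __init__(self, root, run_id):
        # Layout from the issue: <root>/<run_id>/metrics.jsonl
        self.path = Path(root) / run_id / "metrics.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # O_APPEND positions every write at the current end of the file.
        self._fd = os.open(self.path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def log(self, step, **metrics):
        # Field names "step" and "wall_time" are assumed for this sketch.
        record = {"step": step, "wall_time": time.time(), **metrics}
        line = json.dumps(record, separators=(",", ":")) + "\n"
        # One write call per record, so a record is never split across calls.
        os.write(self._fd, line.encode("utf-8"))

    def close(self):
        os.close(self._fd)
```

A training loop would call something like `logger.log(step, loss=loss, lr=lr)` once per step or once every N steps; logging less often is the simplest way to keep the overhead low.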

Dashboard Features:

  • Training/validation loss curves
  • Learning rate schedules
  • Metrics (accuracy, perplexity, etc.)
  • Training speed (steps/sec, tokens/sec)
  • Multiple run comparison
  • Real-time file watching
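
For the live-update and training-speed items, one possible approach is sketched below: a reader that remembers its byte offset in metrics.jsonl and returns only the records appended since the last poll. It uses plain polling rather than OS file-change notifications, and it is Python for illustration only. The `MetricsTail` and `steps_per_sec` names are hypothetical, and the sketch assumes each record is a JSON object with `step` and `wall_time` fields.

```python
import json
from pathlib import Path


class MetricsTail:
    """Incrementally reads new records from a metrics.jsonl file (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)
        self._offset = 0     # byte offset of the first unread byte
        self._partial = b""  # bytes of a trailing line that is not complete yet

    def poll(self):
        """Return the records appended since the previous call."""
        if not self.path.exists():
            return []
        with open(self.path, "rb") as f:
            f.seek(self._offset)
            chunk = f.read()
        self._offset += len(chunk)
        data = self._partial + chunk
        # Everything before the last newline is complete; keep the rest for later.
        *complete, self._partial = data.split(b"\n")
        return [json.loads(line) for line in complete if line.strip()]


def steps_per_sec(records):
    """Average speed over a window of records, from their step and wall_time fields."""
    if len(records) < 2:
        return None
    dt = records[-1]["wall_time"] - records[0]["wall_time"]
    dsteps = records[-1]["step"] - records[0]["step"]
    return dsteps / dt if dt > 0 else None
```

The dashboard could call `poll()` on a timer for each run directory it is comparing and redraw only when new records arrive. Holding back the unfinished trailing line means a record that is still being written is never parsed half-way.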

Success Criteria

  • Logging has minimal impact on training performance
  • Dashboard can monitor live or completed runs
  • Smooth UI updates without terminal flickering
  • Works with existing Kaun examples (MNIST)
  • Clear separation between logging and visualization
