With 144 GB of VRAM, what is the biggest model we can train from scratch on a tinybox? Would it be possible to train a model similar to, say, Llama 3 8B from scratch with this? Here are some model params:
llama3(
vocab_size=128_256,
num_layers=32,
num_heads=32,
num_kv_heads=8,
embed_dim=4096,
max_seq_len=8192,
intermediate_dim=14336,
attn_dropout=0.0,
norm_eps=1e-5,
rope_base=500000.0,
)
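As a sanity check, here is a rough sketch that estimates the parameter count and the naive training memory from those hyperparameters. It assumes a Llama-3-style architecture (untied input/output embeddings, SwiGLU MLP with gate/up/down projections, RMSNorm, grouped-query attention); the 16 bytes/param figure assumes plain mixed-precision AdamW with no sharding or offload.

```python
# Rough parameter count for a Llama-3-style model.
# Assumptions: untied embeddings, SwiGLU MLP, RMSNorm, GQA.
vocab_size = 128_256
num_layers = 32
num_heads = 32
num_kv_heads = 8
embed_dim = 4096
intermediate_dim = 14_336

head_dim = embed_dim // num_heads        # 128
kv_dim = num_kv_heads * head_dim         # 1024 (shared KV heads under GQA)

embed = vocab_size * embed_dim           # token embedding table
attn = (embed_dim * embed_dim            # Q projection
        + 2 * embed_dim * kv_dim         # K and V projections (GQA)
        + embed_dim * embed_dim)         # output projection
mlp = 3 * embed_dim * intermediate_dim   # gate, up, down projections
norms = 2 * embed_dim                    # two RMSNorms per block
per_layer = attn + mlp + norms

total = (embed                           # input embeddings
         + num_layers * per_layer        # transformer blocks
         + embed_dim                     # final RMSNorm
         + vocab_size * embed_dim)       # untied LM head
print(f"{total / 1e9:.2f}B parameters")  # ~8.03B

# Naive AdamW mixed-precision training needs ~16 bytes/param
# (bf16 weights + bf16 grads + fp32 master weights + two fp32 moments):
print(f"~{total * 16 / 1e9:.0f} GB for weights + grads + optimizer states")
```

At ~128 GB just for weights, gradients, and optimizer states, an 8B from-scratch run is already near the 144 GB ceiling before any activations, so some combination of sharding, optimizer offload, or a smaller model looks necessary.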
On top of that we also need to account for block (sequence) size, batch size, and activation memory when deciding whether this fits.
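For a feel of how sequence length and batch size eat into the budget, here is a back-of-envelope activation estimate. The `acts_factor` of 20 is a hypothetical multiplier (residual stream plus attention/MLP intermediates per layer); the real number depends heavily on the implementation, whether FlashAttention avoids materializing the seq² score matrices, and whether activation checkpointing is used.

```python
# Back-of-envelope activation memory for one training sample.
# Assumptions: bf16 activations, FlashAttention (no seq^2 score
# matrices), and a crude ~20x hidden-size multiplier per layer.
batch_size = 1
seq_len = 8192
embed_dim = 4096
num_layers = 32
bytes_per_act = 2    # bf16
acts_factor = 20     # hypothetical, implementation-dependent

act_bytes = (batch_size * seq_len * embed_dim
             * num_layers * acts_factor * bytes_per_act)
print(f"~{act_bytes / 1e9:.0f} GB of activations at seq_len={seq_len}")
```

Even a single sample at the full 8192-token context plausibly costs tens of GB of activations under these assumptions, which is why activation checkpointing (trading recompute for memory) is standard for runs like this.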