Commit 0d70da5

annotate figures
1 parent f245cba commit 0d70da5

4 files changed: +3 -2 lines changed

torchao_float8/README.md

Lines changed: 3 additions & 2 deletions
@@ -36,17 +36,18 @@ The 0-th inference iteration is profiled with CUDA memory snapshot. The snapshot
 <img src="figures/none_whole_timeline.png" alt="Baseline Whole Timeline" width="800">
 <p><strong>Figure 1:</strong> Baseline Whole Timeline</p>
 </div>
-
 <div align="center">
 <img src="figures/float8dq_whole_timeline.png" alt="FP8 Weights and Activations DQ Whole Timeline" width="800">
 <p><strong>Figure 2:</strong> FP8 Weights and Activations DQ Whole Timeline</p>
 </div>
-
 <div align="center">
 <img src="figures/float8wo_whole_timeline.png" alt="FP8 Weights Only Static Quantization Whole Timeline" width="800">
 <p><strong>Figure 3:</strong> FP8 Weights Only Static Quantization Whole Timeline</p>
 </div>
 
+On first look, comparing the whole timelines in Figures 1, 2, and 3, we can notice that they all have some blocks of memory in the middle of the timeline (encircled). Also, Float8WO has multiple spikes in memory that are not present in the other two snapshots (marked by arrow).
+
+
 ## Torch Execution Trace
 ### Baseline
 
3 image files changed (size deltas): -4.16 KB, -8.43 KB, -4.18 KB
