vipulSharma18
diff --git a/torchao_float8/README.md b/torchao_float8/README.md
Lines changed: 3 additions & 2 deletions
diff --git a/torchao_float8/figures/float8dq_whole_timeline.png b/torchao_float8/figures/float8dq_whole_timeline.png
-4.16 KB
diff --git a/torchao_float8/figures/float8wo_whole_timeline.png b/torchao_float8/figures/float8wo_whole_timeline.png
-8.43 KB
diff --git a/torchao_float8/figures/none_whole_timeline.png b/torchao_float8/figures/none_whole_timeline.png
-4.18 KB
@@ -36,17 +36,18 @@ The 0-th inference iteration is profiled with CUDA memory snapshot. The snapshot
   <img src="figures/none_whole_timeline.png" alt="Baseline Whole Timeline" width="800">
   <p><strong>Figure 1:</strong> Baseline Whole Timeline</p>
 </div>
-
 <div align="center">
   <img src="figures/float8dq_whole_timeline.png" alt="FP8 Weights and Activations DQ Whole Timeline" width="800">
   <p><strong>Figure 2:</strong> FP8 Weights and Activations DQ Whole Timeline</p>
 </div>
-
 <div align="center">
   <img src="figures/float8wo_whole_timeline.png" alt="FP8 Weights Only Static Quantization Whole Timeline" width="800">
   <p><strong>Figure 3:</strong> FP8 Weights Only Static Quantization Whole Timeline</p>
 </div>
 
+At first glance, the whole timelines in Figures 1, 2, and 3 all show blocks of memory allocated in the middle of the timeline (circled). Float8WO additionally shows several memory spikes that are absent from the other two snapshots (marked by an arrow).
+
+
 ## Torch Execution Trace
 ### Baseline