I noticed that the code you provided only includes the logits-based distillation method. However, Table 2 in the paper also reports results for attention-based distillation and manifold-based distillation. May I ask whether it would be possible to share the code for these methods as well?