The HuggingFaceNmtEngine class currently implements the ATT-OUTPUT approach from this paper. The ATT-INPUT method would generate higher-quality alignments. To implement ATT-INPUT, the class would need to shift the attentions to the left by one step, which can be done by not prepending a zero matrix to the attentions. We would also need to change which layer the attentions are retrieved from (a bottom layer rather than a top one). With ATT-INPUT, the last target token may be left unaligned if the translation hit the maximum generation length, since that token is never fed back into the decoder. This edge case should be handled properly.
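The left shift described above can be sketched as follows. This is a hypothetical helper, not part of the library's API: it assumes `attn_steps` is a list with one cross-attention row per generated target token, where row i was observed while generating token i (ATT-OUTPUT ordering). Under ATT-INPUT, token i should instead use the attention observed when token i is fed back in, i.e. row i + 1.

```python
def shift_attentions_for_att_input(attn_steps):
    """Reorder per-step cross-attention rows from ATT-OUTPUT to ATT-INPUT.

    attn_steps: list of attention rows (e.g. tensors or lists of floats),
    one per generated target token. Hypothetical helper for illustration.
    """
    # Shift left by one step: drop the first row instead of prepending a
    # zero matrix, so token i is aligned using the attention from step i + 1.
    shifted = list(attn_steps[1:])
    # If generation stopped at the max length, the final token was never fed
    # back into the decoder, so it has no attention row. Mark it unaligned
    # (None) so callers can handle this edge case explicitly.
    shifted.append(None)
    return shifted
```

For example, with three generated tokens, token 0 is aligned using the attention from step 1, token 1 using step 2, and token 2 is flagged as unaligned rather than silently reusing stale attention.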