When I evaluate InternLM2-Math-Plus-7b in minif2f through this code, it fails. The model only generates one line "Here is the predicted next tactic:" without any tactics. If I let the model continue generating until I get a tactic each time, I only get a pass rate 38.1% instead of 43.4%.