Represent each action given to the RL model as the pairwise norms matrix of the grid map that would result from taking that action.
Output:
- Actions output by the learning model now consist of the pairwise norms matrix of the grid map that would result from taking the selected action
- A method lets the ogm output actions in the format described in the point above
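
A minimal sketch of what such a method might look like, under assumptions that are not stated in this description: the grid map is a 2D NumPy array in which non-zero cells are occupied, the "pairwise norms matrix" is the matrix of Euclidean distances between every pair of occupied cells, and an action moves the contents of one cell to another. The names `pairwise_norms` and `action_as_pairwise_norms` are hypothetical and are not taken from the codebase.

```python
import numpy as np


def pairwise_norms(grid):
    """Euclidean distance between every pair of occupied cells.

    Assumption: a cell is occupied when its value is non-zero. Cells are
    taken in row-major order, so entry (i, j) is the distance between the
    i-th and j-th occupied cells.
    """
    coords = np.argwhere(grid != 0).astype(float)   # shape (n, 2)
    diff = coords[:, None, :] - coords[None, :, :]  # shape (n, n, 2)
    return np.linalg.norm(diff, axis=-1)            # shape (n, n)


def action_as_pairwise_norms(grid, src, dst):
    """Encode a hypothetical 'move src to dst' action.

    The action is applied to a copy of the grid, so the current map is left
    unchanged, and the pairwise norms matrix of the resulting map is returned.
    """
    next_grid = grid.copy()
    next_grid[dst] = next_grid[src]
    next_grid[src] = 0
    return pairwise_norms(next_grid)


# Usage: two occupied cells; move the one at (0, 1) to (2, 0).
grid = np.zeros((3, 3), dtype=int)
grid[0, 0] = 1
grid[0, 1] = 1
encoded = action_as_pairwise_norms(grid, (0, 1), (2, 0))
print(encoded)  # the off-diagonal entries are the distance between (0, 0) and (2, 0)
```

Because the matrix is computed from a copy, candidate actions can be encoded one after another without touching the current map. If the number of occupied cells can change between actions, the matrix size changes too, so a fixed-size model input would need padding or a similar convention.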