Represent each action to the RL model as the pairwise norms matrix of the grid map that would result from taking that action.
Output: Each action output by the learning model is now the pairwise norms matrix of the grid map that would result from taking the selected action.
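
The text does not specify how the grid map is stored or which norm is used, so the following is only a minimal sketch under stated assumptions: the grid map is a 2D NumPy array whose rows are treated as vectors, the norm is Euclidean, and the next grid is produced by a caller-supplied function. The names `pairwise_norms`, `encode_action`, and `shift_rows` are hypothetical and do not come from the original project.

```python
import numpy as np


def pairwise_norms(vectors):
    """Return the matrix D with D[i, j] = ||v_i - v_j||_2.

    Assumption: each row of `vectors` is one vector of the grid map.
    """
    v = np.asarray(vectors, dtype=float)
    # Broadcast to shape (n, n, d), then take the norm over the last axis.
    diff = v[:, None, :] - v[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def encode_action(grid, action, transition):
    """Encode an action as the pairwise norms matrix of the resulting grid.

    `transition` is a hypothetical callable (grid, action) -> next grid;
    the real environment dynamics would be supplied here.
    """
    next_grid = transition(grid, action)
    return pairwise_norms(next_grid)


def shift_rows(grid, action):
    """Toy stand-in for the environment: cyclically shift rows by `action`."""
    return np.roll(np.asarray(grid), action, axis=0)


grid = np.array([[0, 0], [3, 4], [6, 8]])
encoded = encode_action(grid, 1, shift_rows)
print(encoded)
```

Under these assumptions the encoding is a symmetric square matrix with a zero diagonal, one row and column per grid row, so the model sees the outcome of an action rather than an action index.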