Create norm-based action input to RL model

Represent each action passed to the RL model as the pairwise norms matrix of the grid map that would result from taking that action.

**Output:** Actions output by the learning model now consist of the pairwise norms matrix of the grid map that would result from the selected action.
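
A minimal sketch of this encoding follows, under several assumptions not stated above: the grid map is taken to be a list of `(row, col)` entity positions, an action moves one entity by a unit step, and "norm" means the Euclidean (L2) distance. The names `MOVES`, `apply_action`, `pairwise_norms`, and `encode_action` are hypothetical and only illustrate the idea.

```python
import math

# Assumption: an action moves a single entity one cell in a cardinal direction.
# The real action set is not specified in this description.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def apply_action(positions, entity, move):
    """Return the positions that would result from moving one entity.

    `positions` is a hypothetical grid-map representation: a list of
    (row, col) tuples, one per entity. The input list is not modified.
    """
    d_row, d_col = MOVES[move]
    updated = list(positions)
    row, col = updated[entity]
    updated[entity] = (row + d_row, col + d_col)
    return updated


def pairwise_norms(positions):
    """Return the symmetric matrix of Euclidean distances between positions."""
    return [[math.dist(p, q) for q in positions] for p in positions]


def encode_action(positions, entity, move):
    """Encode an action as the pairwise norms matrix of the resulting map."""
    return pairwise_norms(apply_action(positions, entity, move))
```

For example, `encode_action([(0, 0), (2, 4)], 1, "down")` moves the second entity to `(3, 4)` and returns a 2×2 matrix with zeros on the diagonal and `5.0` off the diagonal. One property of this kind of encoding is that the matrix is unchanged when the whole map is translated or rotated, since only relative distances are kept.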