Description
When I try the PPO example, I get this error (the only change I have made is using a single thread):
File "/app/src/workspace/install/gym/lib/gym/main.py", line 101, in
main()
File "/app/src/workspace/install/gym/lib/gym/main.py", line 78, in main
scores, steps_array = ep_loop.run(n_games)
File "/app/src/workspace/src/ProtoRL/protorl/loops/ppo_episode.py", line 65, in run
self.agent.update(transitions, batches)
File "/app/src/workspace/src/ProtoRL/protorl/agents/ppo.py", line 22, in update
self.learner.update(transitions, batches)
File "/app/src/workspace/src/ProtoRL/protorl/learner/ppo.py", line 34, in update
adv, returns = calc_adv_and_returns(values, values_, r, d)
File "/app/src/workspace/src/ProtoRL/protorl/utils/common.py", line 84, in calc_adv_and_returns
returns = adv + values.unsqueeze(2)
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
The problem is certainly in this function:
```python
def calc_adv_and_returns(values, values_, rewards, dones,
                         gamma=0.99, gae_lambda=0.95):
    # TODO - multi gpu support
    # TODO - support for different vector shapes
    device = 'cuda:0' if T.cuda.is_available() else 'cpu'
    deltas = rewards + gamma * values_ - values
    deltas = deltas.cpu().numpy()
    dones = dones.cpu().numpy()
    adv = [0]
    for step in reversed(range(deltas.shape[0])):
        advantage = deltas[step] + \
            gamma * gae_lambda * adv[-1] * dones[step]
        adv.append(advantage)
    adv.reverse()
    adv = adv[:-1]
    adv = T.tensor(adv, dtype=T.double, device=device,
                   requires_grad=False).unsqueeze(2)
    # the problem is with unsqueeze: values has only one dimension,
    # while adv has two before the unsqueeze
    returns = adv + values.unsqueeze(2)
    adv = (adv - adv.mean()) / (adv.std() + 1e-4)
    return adv, returns
```
And specifically this line:
```python
returns = adv + values.unsqueeze(2)
```
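For a 1-D tensor, `unsqueeze` only accepts dimensions in `[-2, 1]`, which matches the error message, so `values` apparently arrives here as a 1-D tensor when only one thread is used. A quick check reproduces the failure:

```python
import torch as T

v = T.zeros(5)    # 1-D tensor of values, as in a single-thread run
v.unsqueeze(2)    # IndexError: Dimension out of range
                  # (expected to be in range of [-2, 1], but got 2)
```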
But I have no idea how to correct this.
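For reference, here is a minimal sketch of one possible workaround, assuming the single-thread run simply drops the `n_threads` axis that the vectorized case provides, and that the rest of the learner expects the `[T, n_threads]` layout. The `_add_env_dim` helper is hypothetical, not part of ProtoRL:

```python
import torch as T

def _add_env_dim(t):
    # hypothetical helper: restore the n_threads axis that a
    # single-thread run drops, so every input is [T, n_threads]
    return t.unsqueeze(1) if t.dim() == 1 else t

def calc_adv_and_returns(values, values_, rewards, dones,
                         gamma=0.99, gae_lambda=0.95):
    device = 'cuda:0' if T.cuda.is_available() else 'cpu'
    values, values_, rewards, dones = (
        _add_env_dim(t) for t in (values, values_, rewards, dones))
    deltas = (rewards + gamma * values_ - values).cpu().numpy()
    dones = dones.cpu().numpy()
    adv = [0]
    for step in reversed(range(deltas.shape[0])):
        advantage = deltas[step] + \
            gamma * gae_lambda * adv[-1] * dones[step]
        adv.append(advantage)
    adv.reverse()
    adv = adv[:-1]
    adv = T.tensor(adv, dtype=T.double, device=device,
                   requires_grad=False).unsqueeze(2)
    # both operands are now [T, n_threads, 1], so the addition works
    returns = adv + values.unsqueeze(2)
    adv = (adv - adv.mean()) / (adv.std() + 1e-4)
    return adv, returns
```

This only normalizes the input shapes before the GAE loop; the advantage computation itself is unchanged from the original function.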