Description
When I try the PPO example, I get this error (the only change I have made is using a single thread):
File "/app/src/workspace/install/gym/lib/gym/main.py", line 101, in
main()
File "/app/src/workspace/install/gym/lib/gym/main.py", line 78, in main
scores, steps_array = ep_loop.run(n_games)
File "/app/src/workspace/src/ProtoRL/protorl/loops/ppo_episode.py", line 65, in run
self.agent.update(transitions, batches)
File "/app/src/workspace/src/ProtoRL/protorl/agents/ppo.py", line 22, in update
self.learner.update(transitions, batches)
File "/app/src/workspace/src/ProtoRL/protorl/learner/ppo.py", line 34, in update
adv, returns = calc_adv_and_returns(values, values_, r, d)
File "/app/src/workspace/src/ProtoRL/protorl/utils/common.py", line 84, in calc_adv_and_returns
returns = adv + values.unsqueeze(2)
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
The problem is certainly in this function:
```python
def calc_adv_and_returns(values, values_, rewards, dones,
                         gamma=0.99, gae_lambda=0.95):
    # TODO - multi gpu support
    # TODO - support for different vector shapes
    device = 'cuda:0' if T.cuda.is_available() else 'cpu'
    deltas = rewards + gamma * values_ - values
    deltas = deltas.cpu().numpy()
    dones = dones.cpu().numpy()
    adv = [0]
    for step in reversed(range(deltas.shape[0])):
        advantage = deltas[step] + \
            gamma * gae_lambda * adv[-1] * dones[step]
        adv.append(advantage)
    adv.reverse()
    adv = adv[:-1]
    adv = T.tensor(adv, dtype=T.double, device=device,
                   requires_grad=False).unsqueeze(2)
    # the problem is with unsqueeze: values has only one dimension,
    # while adv has two before the unsqueeze
    returns = adv + values.unsqueeze(2)
    adv = (adv - adv.mean()) / (adv.std() + 1e-4)
    return adv, returns
```
And specifically this line:
```python
returns = adv + values.unsqueeze(2)
```
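For a 1-D tensor, `unsqueeze` only accepts dimensions in `[-2, 1]`, which matches the error message, so `values` apparently arrives here as a 1-D tensor when only one thread is used. A quick check reproduces the failure:

```python
import torch as T

v = T.zeros(5)    # 1-D tensor of values, as in a single-thread run
v.unsqueeze(2)    # IndexError: Dimension out of range
                  # (expected to be in range of [-2, 1], but got 2)
```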
But I have no idea how to correct this.
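For reference, here is a minimal sketch of one possible workaround, assuming the single-thread run simply drops the `n_threads` axis that the vectorized case provides, and that the rest of the learner expects the `[T, n_threads]` layout. The `_add_env_dim` helper is hypothetical, not part of ProtoRL:

```python
import torch as T

def _add_env_dim(t):
    # hypothetical helper: restore the n_threads axis that a
    # single-thread run drops, so every input is [T, n_threads]
    return t.unsqueeze(1) if t.dim() == 1 else t

def calc_adv_and_returns(values, values_, rewards, dones,
                         gamma=0.99, gae_lambda=0.95):
    device = 'cuda:0' if T.cuda.is_available() else 'cpu'
    values, values_, rewards, dones = (
        _add_env_dim(t) for t in (values, values_, rewards, dones))
    deltas = (rewards + gamma * values_ - values).cpu().numpy()
    dones = dones.cpu().numpy()
    adv = [0]
    for step in reversed(range(deltas.shape[0])):
        advantage = deltas[step] + \
            gamma * gae_lambda * adv[-1] * dones[step]
        adv.append(advantage)
    adv.reverse()
    adv = adv[:-1]
    adv = T.tensor(adv, dtype=T.double, device=device,
                   requires_grad=False).unsqueeze(2)
    # both operands are now [T, n_threads, 1], so the addition works
    returns = adv + values.unsqueeze(2)
    adv = (adv - adv.mean()) / (adv.std() + 1e-4)
    return adv, returns
```

This only normalizes the input shapes before the GAE loop; the advantage computation itself is unchanged from the original function.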