docs/source/guide/basic_concepts.rst (12 additions & 13 deletions)
@@ -160,11 +160,11 @@ A few other quick things to note: ``f.no_grad(x)`` runs a forward pass with ``to
 ``f.target(x)`` calls the *target network* (an advanced concept used in algorithms such as DQN; see, for example, David Silver's `course notes <http://www0.cs.ucl.ac.uk/staff/d.silver/web/Talks_files/deep_rl.pdf>`_) associated with the ``Approximation``, also with ``torch.no_grad()``.
 The ``autonomous-learning-library`` provides a few thin wrappers over ``Approximation`` for particular purposes, such as ``QNetwork``, ``VNetwork``, ``FeatureNetwork``, and several ``Policy`` implementations.

-Environments
-------------
+ALL Environments
+----------------

 The importance of the ``Environment`` in reinforcement learning nearly goes without saying.
-In the ``autonomous-learning-library``, the prepackaged environments are simply wrappers for `OpenAI Gym <http://gym.openai.com>`_, the de facto standard library for RL environments.
+In the ``autonomous-learning-library``, the prepackaged environments are simply wrappers for `Gymnasium <https://gymnasium.farama.org>`_ (formerly OpenAI Gym), the de facto standard library for RL environments.

 .. figure:: ./ale.png
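As a rough illustration of the ``Approximation`` methods described above, the sketch below shows how ``f.no_grad(x)`` and ``f.target(x)`` might be used when computing a DQN-style bootstrap target. The ``q`` object, tensor shapes, and helper names here are assumptions for illustration, not the library's exact API:

.. code-block:: python

    import torch

    def td_targets(q, rewards, next_states, discount=0.99):
        # q.target(x) evaluates the slow-moving target network; the
        # Approximation already wraps this call in torch.no_grad()
        next_values = q.target(next_states).max(dim=1).values
        # the bootstrap target is therefore not differentiated through
        return rewards + discount * next_values

    def greedy_action(q, state):
        # q.no_grad(x) runs a forward pass of the online network without
        # tracking gradients, e.g. for action selection
        return torch.argmax(q.no_grad(state))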
@@ -173,15 +173,15 @@ In the ``autonomous-learning-library``, the prepackaged environments are simply
 We add a few additional features:

-1) ``gym`` primarily uses ``numpy.array`` for representing states and actions. We automatically convert to and from ``torch.Tensor`` objects so that agent implementations need not consider the difference.
+1) ``gymnasium`` primarily uses ``numpy.array`` for representing states and actions. We automatically convert to and from ``torch.Tensor`` objects so that agent implementations need not consider the difference.
 2) We add properties to the environment for ``state``, ``reward``, etc. This simplifies the control loop and is generally useful.
 3) We apply common preprocessors, such as several standard Atari wrappers. However, where possible, we prefer to perform preprocessing using ``Body`` objects to maximize the flexibility of the agents.

 Below, we show how several different types of environments can be created:
@@ -190,7 +190,7 @@ Below, we show how several different types of environments can be created:
     env = GymEnvironment('CartPole-v0')

     # create a PyBullet environment on the cpu
-    env = PybulletEnvironment('cheetah')
+    env = MujocoEnvironment('HalfCheetah-v4')

 Now we can write our first control loop:
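The loop itself falls outside this hunk, but a minimal sketch using the environment properties described above might look like the following. The attribute names (``env.state.done``, ``env.action_space``) are assumptions for illustration; the key point is that ``state`` and ``reward`` are exposed as properties:

.. code-block:: python

    from all.environments import GymEnvironment

    env = GymEnvironment('CartPole-v0')
    env.reset()
    returns = 0
    while not env.state.done:                  # assumed done flag on the state
        action = env.action_space.sample()     # random policy, for illustration
        env.step(action)                       # advances the environment in place
        returns += env.reward                  # reward exposed as a property
    print('episode return:', returns)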
@@ -216,8 +216,8 @@ Of course, this control loop is not exactly feature-packed.
 Generally, it's better to use the ``Experiment`` module described later.

-Presets
--------
+ALL Presets
+-----------

 In the ``autonomous-learning-library``, agents are *compositional*, which means that the behavior of a given ``Agent`` depends on the behavior of several other objects.
 Users can compose agents with specific behavior by passing appropriate objects into the constructor of the high-level algorithms contained in ``all.agents``.
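As a rough sketch of this compositional style, an agent might be assembled along the following lines. The module paths and constructor arguments shown here are assumptions for illustration; consult ``all.agents``, ``all.approximation``, ``all.policies``, and ``all.memory`` for the actual signatures:

.. code-block:: python

    import torch
    from torch import nn
    from all.agents import DQN                      # high-level algorithm (assumed constructor)
    from all.approximation import QNetwork          # thin wrapper over Approximation
    from all.policies import GreedyPolicy           # assumed policy class
    from all.memory import ExperienceReplayBuffer   # assumed replay buffer class

    # a small network for CartPole-sized observations (4 inputs, 2 actions)
    model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    q = QNetwork(model, optimizer)
    policy = GreedyPolicy(q, num_actions=2, epsilon=0.1)
    replay_buffer = ExperienceReplayBuffer(10000)
    agent = DQN(q, policy, replay_buffer)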
@@ -274,8 +274,8 @@ If a ``Preset`` is loaded from disk, then we can instantiate a test ``Agent`` us

-Experiment
-----------
+ALL Experiments
+---------------

 Finally, we have all of the components necessary to introduce the ``run_experiment`` helper function.
 ``run_experiment`` is the built-in control loop for running reinforcement learning experiments.
@@ -284,7 +284,6 @@ Here is a quick example:
 .. code-block:: python

-    from gym import envs
     from all.experiments import run_experiment
     from all.presets import atari
     from all.environments import AtariEnvironment
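The rest of the example is outside this hunk; a call along the following lines would typically follow the imports. The preset constructors and the exact ``run_experiment`` signature shown here are assumptions for illustration:

.. code-block:: python

    # hypothetical continuation: exact preset and argument names are assumptions
    agents = [atari.dqn(), atari.a2c()]
    envs = [AtariEnvironment('Breakout')]
    run_experiment(agents, envs, 1_000_000)   # frames of training per agent/env pair
    # an optional test_episodes argument (default 100) controls how many
    # evaluation episodes are run after training completes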
@@ -313,7 +312,7 @@ You can view the results in ``tensorboard`` by running the following command:
     tensorboard --logdir runs

-In addition to the ``tensorboard`` logs, every 100 episodes, the mean and standard deviation of the previous 100 episode returns are written to ``runs/[agent]/[env]/returns100.csv``.
+In addition to the ``tensorboard`` logs, every 100 episodes, the mean, standard deviation, min, and max of the previous 100 episode returns are written to ``runs/[agent]/[env]/returns100.csv``.
 This is much faster to read and plot than Tensorboard's proprietary format.
 The library contains an automatic plotting utility that generates appropriate plots for an *entire* ``runs`` directory as follows:
@@ -324,7 +323,7 @@ The library contains an automatic plotting utility that generates appropriat
 This will generate a plot that looks like the following (after tweaking the whitespace through the ``matplotlib`` UI):

-.. image:: ../../../benchmarks/atari40.png
+.. image:: ../../../benchmarks/atari_40m.png

 An optional parameter is ``test_episodes``, which is set to 100 by default.
 After running for the given number of frames, the agent will be evaluated for a number of episodes specified by ``test_episodes`` with training disabled.
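Because the ``returns100.csv`` files described above are plain CSV, they are also easy to inspect and plot by hand. The sketch below is illustrative only: the path is a placeholder and the column names are assumptions, so check the file header first:

.. code-block:: python

    import pandas as pd
    import matplotlib.pyplot as plt

    # placeholder path: substitute your actual runs/[agent]/[env] directory
    df = pd.read_csv('runs/a2c/CartPole-v0/returns100.csv')
    print(df.columns)  # inspect the header; the column names below are assumptions

    plt.plot(df['mean'], label='mean return (previous 100 episodes)')
    plt.fill_between(df.index, df['min'], df['max'], alpha=0.2)
    plt.xlabel('logging step (one row per 100 episodes)')
    plt.ylabel('return')
    plt.legend()
    plt.show()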
docs/source/guide/benchmark_performance.rst (1 addition & 1 deletion)
@@ -43,7 +43,7 @@ our agents achieved very similar behavior to the agents tested by DeepMind.
 MuJoCo Benchmark
 ------------------

-`MuJoCo https://mujoco.org`_ is "a free and open source physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed."
+`MuJoCo <https://mujoco.org>`_ is "a free and open source physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed."
 The MuJoCo Gym environments are a common benchmark in RL research for evaluating agents with continuous action spaces.
 We ran each continuous preset for 5 million timesteps (in this case, timesteps are equal to frames).
 The learning rate was decayed over the course of training using cosine annealing.
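For reference, cosine annealing of a learning rate can be expressed with PyTorch's built-in scheduler. This is a generic sketch of the technique, not necessarily the exact configuration used by the presets:

.. code-block:: python

    import torch
    from torch import nn

    model = nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
    # anneal the learning rate from its initial value toward zero over the
    # full run (e.g. 5 million timesteps, matching the benchmark above)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5_000_000)

    for step in range(10):      # in practice, one scheduler step per update
        optimizer.step()
        scheduler.step()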
docs/source/guide/getting_started.rst (8 additions & 8 deletions)
@@ -4,9 +4,9 @@ Getting Started
 Prerequisites
 -------------

-The Autonomous Learning Library requires a recent version of PyTorch (~=1.8.0 recommended).
+The Autonomous Learning Library requires a recent version of PyTorch (at least v2.2.0 is recommended).
 Additionally, Tensorboard is required in order to enable logging.
-We also strongly recommend using a machine with a fast GPU (at minimum a GTX 970 or better, a GTX 1080ti or better is preferred).
+We also strongly recommend using a machine with a fast GPU with at least 11 GB of VRAM (a GTX 1080ti or better is preferred).

 Installation
 ------------
@@ -35,7 +35,7 @@ An alternate approach, that may be useful when following this tutorial, is to in
     cd autonomous-learning-library
     pip install -e .[dev]

-``dev`` will install all of the optional dependencies for developers of the repo, such as unit test and documentation dependencies, as well as all environments.
+``dev`` will install all of the optional dependencies for developers of the repo, such as unit test dependencies, as well as all environments.
 If you chose to clone the repository, you can test your installation by running the unit test suite:

 .. code-block:: bash
@@ -50,20 +50,20 @@ Running a Preset Agent
 The goal of the Autonomous Learning Library is to provide components for building new agents.
 However, the library also includes a number of "preset" agent configurations for easy benchmarking and comparison,
 as well as some useful scripts.
-For example, a PPO agent can be run on Cart-Pole as follows:
+For example, an ``a2c`` agent can be run on CartPole as follows:

 .. code-block:: bash

     all-classic CartPole-v0 a2c

-The results will be written to ``runs/a2c_<COMMIT>_<DATETIME>``, where ``<COMMIT>`` and ``<DATETIME>`` are strings generated by the library.
+The results will be written to ``runs/a2c_CartPole-v0_<DATETIME>``, where ``<DATETIME>`` is generated by the library.
 You can view these results and other information through `tensorboard`:

 .. code-block:: bash

     tensorboard --logdir runs

-By opening your browser to <http://localhost:6006>, you should see a dashboard that looks something like the following (you may need to adjust the "smoothing" parameter):
+By opening your browser to `http://localhost:6006`_, you should see a dashboard that looks something like the following (you may need to adjust the "smoothing" parameter):

 .. image:: tensorboard.png
@@ -84,9 +84,9 @@ Finally, to watch the trained model in action, we provide a `watch` scripts for