
Commit ac4c444

Update documentation (#323)
* update setup.py
* fix sphinx warnings
* edit handling of docs dependencies
* update docs to latest
* run formatter
* correct getting started doc
* remove script imports
* remove dev installation from .readthedocs.yml
* fix localhost link
1 parent dbb0d96 commit ac4c444

File tree

6 files changed: +29 -30 lines changed


docs/source/conf.py

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@
 # -- Project information -----------------------------------------------------

 project = 'autonomous-learning-library'
-copyright = '2020, Chris Nota'
+copyright = '2024, Chris Nota'
 author = 'Chris Nota'

 # The full version, including alpha/beta/rc tags
@@ -72,4 +72,4 @@
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ['_static']
+# html_static_path = ['_static']
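With the ``docs`` extra installed (see the ``setup.py`` changes below), the updated pages can be previewed locally using the pinned live-reload tool. A minimal sketch: the ``docs/source`` path is confirmed by the file paths in this commit, while the ``docs/build`` output directory is an assumption about the repository layout:

    # live-reloading build server for the Sphinx docs
    sphinx-autobuild docs/source docs/build
    # sphinx-autobuild serves the rendered docs at http://127.0.0.1:8000 by default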

docs/source/guide/basic_concepts.rst

Lines changed: 12 additions & 13 deletions
@@ -160,11 +160,11 @@ A few other quick things to note: ``f.no_grad(x)`` runs a forward pass with ``to
 ``f.target(x)`` calls the *target network* (an advanced concept used in algorithms such as DQN. For example, David Silver's `course notes <http://www0.cs.ucl.ac.uk/staff/d.silver/web/Talks_files/deep_rl.pdf>`_) associated with the ``Approximation``, also with ``torch.no_grad()``.
 The ``autonomous-learning-library`` provides a few thin wrappers over ``Approximation`` for particular purposes, such as ``QNetwork``, ``VNetwork``, ``FeatureNetwork``, and several ``Policy`` implementations.

-Environments
-------------
+ALL Environments
+----------------

 The importance of the ``Environment`` in reinforcement learning nearly goes without saying.
-In the ``autonomous-learning-library``, the prepackaged environments are simply wrappers for `OpenAI Gym <http://gym.openai.com>`_, the defacto standard library for RL environments.
+In the ``autonomous-learning-library``, the prepackaged environments are simply wrappers for `Gymnasium <https://gymnasium.farama.org>`_ (formerly OpenAI Gym), the defacto standard library for RL environments.

 .. figure:: ./ale.png

@@ -173,15 +173,15 @@ In the ``autonomous-learning-library``, the prepackaged environments are simply

 We add a few additional features:

-1) ``gym`` primarily uses ``numpy.array`` for representing states and actions. We automatically convert to and from ``torch.Tensor`` objects so that agent implemenetations need not consider the difference.
+1) ``gymnasium`` primarily uses ``numpy.array`` for representing states and actions. We automatically convert to and from ``torch.Tensor`` objects so that agent implemenetations need not consider the difference.
 2) We add properties to the environment for ``state``, ``reward``, etc. This simplifies the control loop and is generally useful.
 3) We apply common preprocessors, such as several standard Atari wrappers. However, where possible, we prefer to perform preprocessing using ``Body`` objects to maximize the flexibility of the agents.

 Below, we show how several different types of environments can be created:

 .. code-block:: python

-    from all.environments import AtariEnvironment, GymEnvironment, PybulletEnvironment
+    from all.environments import AtariEnvironment, GymEnvironment, MujocoEnvironment

     # create an Atari environment on the gpu
     env = AtariEnvironment('Breakout', device='cuda')
@@ -190,7 +190,7 @@ Below, we show how several different types of environments can be created:
     env = GymEnvironment('CartPole-v0')

     # create a PyBullet environment on the cpu
-    env = PybulletEnvironment('cheetah')
+    env = MujocoEnvironment('HalfCheetah-v4')

 Now we can write our first control loop:

@@ -216,8 +216,8 @@ Of course, this control loop is not exactly feature-packed.
 Generally, it's better to use the ``Experiment`` module described later.


-Presets
--------
+ALL Presets
+-----------

 In the ``autonomous-learning-library``, agents are *compositional*, which means that the behavior of a given ``Agent`` depends on the behavior of several other objects.
 Users can compose agents with specific behavior by passing appropriate objects into the constructor of the high-level algorithms contained in ``all.agents``.
@@ -274,8 +274,8 @@ If a ``Preset`` is loaded from disk, then we can instansiate a test ``Agent`` us



-Experiment
-----------
+ALL Experiments
+---------------

 Finally, we have all of the components necessary to introduce the ``run_experiment`` helper function.
 ``run_experiment`` is the built-in control loop for running reinforcement learning experiment.
@@ -284,7 +284,6 @@ Here is a quick example:

 .. code-block:: python

-    from gym import envs
     from all.experiments import run_experiment
     from all.presets import atari
     from all.environments import AtariEnvironment
@@ -313,7 +312,7 @@ You can view the results in ``tensorboard`` by running the following command:

     tensorboard --logdir runs

-In addition to the ``tensorboard`` logs, every 100 episodes, the mean and standard deviation of the previous 100 episode returns are written to ``runs/[agent]/[env]/returns100.csv``.
+In addition to the ``tensorboard`` logs, every 100 episodes, the mean, standard deviation, min, and max of the previous 100 episode returns are written to ``runs/[agent]/[env]/returns100.csv``.
 This is much faster to read and plot than Tensorboard's proprietary format.
 The library contains an automatically plotting utility that generates appropriate plots for an *entire* ``runs`` directory as follows:

@@ -324,7 +323,7 @@ The library contains an automatically plotting utility that generates appropriat

 This will generate a plot that looks like the following (after tweaking the whitespace through the ``matplotlib`` UI):

-.. image:: ../../../benchmarks/atari40.png
+.. image:: ../../../benchmarks/atari_40m.png

 An optional parameter is ``test_episodes``, which is set to 100 by default.
 After running for the given number of frames, the agent will be evaluated for a number of episodes specified by ``test_episodes`` with training disabled.
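As context for the renamed sections, here is a minimal sketch of the control loop the updated page describes, built from the pieces visible in this diff. The random-action policy and the ``env.state.done`` / ``env.action_space`` accesses are illustrative assumptions about the library's State and Environment APIs, not code quoted from the page:

    from all.environments import GymEnvironment

    # constructor confirmed by the diff above
    env = GymEnvironment('CartPole-v0')
    env.reset()
    episode_return = 0
    while not env.state.done:                # assumption: the State exposes a `done` flag
        action = env.action_space.sample()   # assumption: random actions stand in for an Agent
        env.step(action)
        episode_return += env.reward         # `reward` property noted in the diff
    print('episode return:', episode_return)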

docs/source/guide/benchmark_performance.rst

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ our agents achieved very similar behavior to the agents tested by DeepMind.
 MuJoCo Benchmark
 ------------------

-`MuJoCo https://mujoco.org`_ is "a free and open source physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed."
+`MuJoCo <https://mujoco.org>`_ is "a free and open source physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed."
 The MuJoCo Gym environments are a common benchmark in RL research for evaluating agents with continuous action spaces.
 We ran each continuous preset for 5 million timesteps (in this case, timesteps are equal to frames).
 The learning rate was decayed over the course of training using cosine annealing.
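For readers unfamiliar with the schedule mentioned in the final context line, a short PyTorch sketch of cosine annealing; the model, optimizer, and hyperparameters here are placeholders, not the presets' actual settings:

    import torch

    model = torch.nn.Linear(4, 2)  # placeholder network
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
    # anneals the learning rate from its initial value toward ~0 over T_max steps
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5_000_000)

    for _ in range(10):      # stand-in training loop
        optimizer.step()     # would normally follow loss.backward()
        scheduler.step()     # decay the learning rate by one step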

docs/source/guide/getting_started.rst

Lines changed: 8 additions & 8 deletions
@@ -4,9 +4,9 @@ Getting Started
 Prerequisites
 -------------

-The Autonomous Learning Library requires a recent version of PyTorch (~=1.8.0 recommended).
+The Autonomous Learning Library requires a recent version of PyTorch (at least v2.2.0 is recommended).
 Additionally, Tensorboard is required in order to enable logging.
-We also strongly recommend using a machine with a fast GPU (at minimum a GTX 970 or better, a GTX 1080ti or better is preferred).
+We also strongly recommend using a machine with a fast GPU with at least 11 GB of VRAM (a GTX 1080ti or better is preferred).

 Installation
 ------------
@@ -35,7 +35,7 @@ An alternate approach, that may be useful when following this tutorial, is to in
     cd autonomous-learning-library
     pip install -e .[dev]

-``dev`` will install all of the optional dependencies for developers of the repo, such as unit test and documentation dependencies, as well as all environments.
+``dev`` will install all of the optional dependencies for developers of the repo, such as unit test dependencies, as well as all environments.
 If you chose to clone the repository, you can test your installation by running the unit test suite:

 .. code-block:: bash
@@ -50,20 +50,20 @@ Running a Preset Agent
 The goal of the Autonomous Learning Library is to provide components for building new agents.
 However, the library also includes a number of "preset" agent configurations for easy benchmarking and comparison,
 as well as some useful scripts.
-For example, a PPO agent can be run on Cart-Pole as follows:
+For example, an a2c agent can be run on CartPole as follows:

 .. code-block:: bash

     all-classic CartPole-v0 a2c

-The results will be written to ``runs/a2c_<COMMIT>_<DATETIME>``, where ``<COMMIT>`` and ``<DATATIME>`` are strings generated by the library.
+The results will be written to ``runs/a2c_CartPole-v0_<DATETIME>``, ``<DATATIME>`` is generated by the library.
 You can view these results and other information through `tensorboard`:

 .. code-block:: bash

     tensorboard --logdir runs

-By opening your browser to <http://localhost:6006>, you should see a dashboard that looks something like the following (you may need to adjust the "smoothing" parameter):
+By opening your browser to `http://localhost:6006`_, you should see a dashboard that looks something like the following (you may need to adjust the "smoothing" parameter):

 .. image:: tensorboard.png

@@ -84,9 +84,9 @@ Finally, to watch the trained model in action, we provide a `watch` scripts for

 .. code-block:: bash

-    all-watch-classic CartPole-v0 runs/a2c_<COMMIT>_<DATETIME>/preset.pt
+    all-watch-classic CartPole-v0 runs/a2c_CartPole-v0_<DATETIME>/preset.pt

 You need to find the <id> by checking the ``runs`` directory.

 Each of these scripts can be found the ``scripts`` directory of the main repository.
-Be sure to check out the ``atari`` and ``continuous`` scripts for more fun!
+Be sure to check out the ``all-atari`` and ``all-mujoco`` scripts for more fun!
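Putting the updated commands together, the end-to-end workflow this page now documents (all three commands appear in the diff; ``<DATETIME>`` is a placeholder generated at run time):

    all-classic CartPole-v0 a2c    # train the a2c preset on CartPole
    tensorboard --logdir runs      # view learning curves at http://localhost:6006
    # after training, watch the saved agent (fill in the generated <DATETIME>)
    all-watch-classic CartPole-v0 runs/a2c_CartPole-v0_<DATETIME>/preset.pt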

docs/source/index.rst

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ Enjoy!
     guide/benchmark_performance

 .. toctree::
-    :maxdepth: 4
+    :maxdepth: 1
     :caption: Modules:

     modules/agents

setup.py

Lines changed: 5 additions & 5 deletions
@@ -26,17 +26,17 @@
         "torch-testing==0.0.2",  # pytorch assertion library
     ],
     "docs": [
-        "sphinx~=3.2.1",  # documentation library
-        "sphinx-autobuild~=2020.9.1",  # documentation live reload
-        "sphinx-rtd-theme~=0.5.0",  # documentation theme
-        "sphinx-automodapi~=0.13.0",  # autogenerate docs for modules
+        "sphinx~=7.2.6",  # documentation library
+        "sphinx-autobuild~=2024.2.4",  # documentation live reload
+        "sphinx-rtd-theme~=2.0.0",  # documentation theme
+        "sphinx-automodapi~=0.17.0",  # autogenerate docs for modules
     ],
 }

 extras["all"] = (
     extras["atari"] + extras["mujoco"] + extras["pybullet"] + extras["ma-atari"]
 )
-extras["dev"] = extras["all"] + extras["test"] + extras["docs"]
+extras["dev"] = extras["all"] + extras["test"]

 setup(
     name="autonomous-learning-library",
