docs/source/guide/basic_concepts.rst (12 additions & 13 deletions)
@@ -160,11 +160,11 @@ A few other quick things to note: ``f.no_grad(x)`` runs a forward pass with ``to
 ``f.target(x)`` calls the *target network* (an advanced concept used in algorithms such as DQN; see, for example, David Silver's `course notes <http://www0.cs.ucl.ac.uk/staff/d.silver/web/Talks_files/deep_rl.pdf>`_) associated with the ``Approximation``, also with ``torch.no_grad()``.
 The ``autonomous-learning-library`` provides a few thin wrappers over ``Approximation`` for particular purposes, such as ``QNetwork``, ``VNetwork``, ``FeatureNetwork``, and several ``Policy`` implementations.

-Environments
-------------
+ALL Environments
+----------------

 The importance of the ``Environment`` in reinforcement learning nearly goes without saying.
-In the ``autonomous-learning-library``, the prepackaged environments are simply wrappers for `OpenAI Gym <http://gym.openai.com>`_, the de facto standard library for RL environments.
+In the ``autonomous-learning-library``, the prepackaged environments are simply wrappers for `Gymnasium <https://gymnasium.farama.org>`_ (formerly OpenAI Gym), the de facto standard library for RL environments.

 .. figure:: ./ale.png
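As a rough illustration of the ``Approximation`` methods described above, the sketch below shows how ``f.no_grad(x)`` and ``f.target(x)`` might be used when computing a DQN-style bootstrap target. The ``q`` object, tensor shapes, and helper names here are assumptions for illustration, not the library's exact API:

.. code-block:: python

    import torch

    def td_targets(q, rewards, next_states, discount=0.99):
        # q.target(x) evaluates the slow-moving target network; the
        # Approximation already wraps this call in torch.no_grad()
        next_values = q.target(next_states).max(dim=1).values
        # the bootstrap target is therefore not differentiated through
        return rewards + discount * next_values

    def greedy_action(q, state):
        # q.no_grad(x) runs a forward pass of the online network without
        # tracking gradients, e.g. for action selection
        return torch.argmax(q.no_grad(state))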
@@ -173,15 +173,15 @@ In the ``autonomous-learning-library``, the prepackaged environments are simply
 We add a few additional features:

-1) ``gym`` primarily uses ``numpy.array`` for representing states and actions. We automatically convert to and from ``torch.Tensor`` objects so that agent implementations need not consider the difference.
+1) ``gymnasium`` primarily uses ``numpy.array`` for representing states and actions. We automatically convert to and from ``torch.Tensor`` objects so that agent implementations need not consider the difference.
 2) We add properties to the environment for ``state``, ``reward``, etc. This simplifies the control loop and is generally useful.
 3) We apply common preprocessors, such as several standard Atari wrappers. However, where possible, we prefer to perform preprocessing using ``Body`` objects to maximize the flexibility of the agents.

 Below, we show how several different types of environments can be created:
@@ -190,7 +190,7 @@ Below, we show how several different types of environments can be created:
     env = GymEnvironment('CartPole-v0')

     # create a PyBullet environment on the cpu
-    env = PybulletEnvironment('cheetah')
+    env = MujocoEnvironment('HalfCheetah-v4')

 Now we can write our first control loop:
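The loop itself falls outside this hunk, but a minimal sketch using the environment properties described above might look like the following. The attribute names (``env.state.done``, ``env.action_space``) are assumptions for illustration; the key point is that ``state`` and ``reward`` are exposed as properties:

.. code-block:: python

    from all.environments import GymEnvironment

    env = GymEnvironment('CartPole-v0')
    env.reset()
    returns = 0
    while not env.state.done:                  # assumed done flag on the state
        action = env.action_space.sample()     # random policy, for illustration
        env.step(action)                       # advances the environment in place
        returns += env.reward                  # reward exposed as a property
    print('episode return:', returns)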
@@ -216,8 +216,8 @@ Of course, this control loop is not exactly feature-packed.
 Generally, it's better to use the ``Experiment`` module described later.

-Presets
--------
+ALL Presets
+-----------

 In the ``autonomous-learning-library``, agents are *compositional*, which means that the behavior of a given ``Agent`` depends on the behavior of several other objects.
 Users can compose agents with specific behavior by passing appropriate objects into the constructor of the high-level algorithms contained in ``all.agents``.
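As a rough sketch of this compositional style, an agent might be assembled along the following lines. The module paths and constructor arguments shown here are assumptions for illustration; consult ``all.agents``, ``all.approximation``, ``all.policies``, and ``all.memory`` for the actual signatures:

.. code-block:: python

    import torch
    from torch import nn
    from all.agents import DQN                      # high-level algorithm (assumed constructor)
    from all.approximation import QNetwork          # thin wrapper over Approximation
    from all.policies import GreedyPolicy           # assumed policy class
    from all.memory import ExperienceReplayBuffer   # assumed replay buffer class

    # a small network for CartPole-sized observations (4 inputs, 2 actions)
    model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    q = QNetwork(model, optimizer)
    policy = GreedyPolicy(q, num_actions=2, epsilon=0.1)
    replay_buffer = ExperienceReplayBuffer(10000)
    agent = DQN(q, policy, replay_buffer)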
@@ -274,8 +274,8 @@ If a ``Preset`` is loaded from disk, then we can instantiate a test ``Agent`` us

-Experiment
-----------
+ALL Experiments
+---------------

 Finally, we have all of the components necessary to introduce the ``run_experiment`` helper function.
 ``run_experiment`` is the built-in control loop for running reinforcement learning experiments.
@@ -284,7 +284,6 @@ Here is a quick example:
 .. code-block:: python

-    from gym import envs
     from all.experiments import run_experiment
     from all.presets import atari
     from all.environments import AtariEnvironment
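The rest of the example is outside this hunk; a call along the following lines would typically follow the imports. The preset constructors and the exact ``run_experiment`` signature shown here are assumptions for illustration:

.. code-block:: python

    # hypothetical continuation: exact preset and argument names are assumptions
    agents = [atari.dqn(), atari.a2c()]
    envs = [AtariEnvironment('Breakout')]
    run_experiment(agents, envs, 1_000_000)   # frames of training per agent/env pair
    # an optional test_episodes argument (default 100) controls how many
    # evaluation episodes are run after training completes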
@@ -313,7 +312,7 @@ You can view the results in ``tensorboard`` by running the following command:
     tensorboard --logdir runs

-In addition to the ``tensorboard`` logs, every 100 episodes, the mean and standard deviation of the previous 100 episode returns are written to ``runs/[agent]/[env]/returns100.csv``.
+In addition to the ``tensorboard`` logs, every 100 episodes, the mean, standard deviation, min, and max of the previous 100 episode returns are written to ``runs/[agent]/[env]/returns100.csv``.
 This is much faster to read and plot than Tensorboard's proprietary format.
 The library contains an automatic plotting utility that generates appropriate plots for an *entire* ``runs`` directory as follows:
@@ -324,7 +323,7 @@ The library contains an automatic plotting utility that generates appropriat
 This will generate a plot that looks like the following (after tweaking the whitespace through the ``matplotlib`` UI):

-.. image:: ../../../benchmarks/atari40.png
+.. image:: ../../../benchmarks/atari_40m.png

 An optional parameter is ``test_episodes``, which is set to 100 by default.
 After running for the given number of frames, the agent will be evaluated for a number of episodes specified by ``test_episodes`` with training disabled.
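Because the ``returns100.csv`` files described above are plain CSV, they are also easy to inspect and plot by hand. The sketch below is illustrative only: the path is a placeholder and the column names are assumptions, so check the file header first:

.. code-block:: python

    import pandas as pd
    import matplotlib.pyplot as plt

    # placeholder path: substitute your actual runs/[agent]/[env] directory
    df = pd.read_csv('runs/a2c/CartPole-v0/returns100.csv')
    print(df.columns)  # inspect the header; the column names below are assumptions

    plt.plot(df['mean'], label='mean return (previous 100 episodes)')
    plt.fill_between(df.index, df['min'], df['max'], alpha=0.2)
    plt.xlabel('logging step (one row per 100 episodes)')
    plt.ylabel('return')
    plt.legend()
    plt.show()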
docs/source/guide/benchmark_performance.rst (1 addition & 1 deletion)
@@ -43,7 +43,7 @@ our agents achieved very similar behavior to the agents tested by DeepMind.
 MuJoCo Benchmark
 ------------------

-`MuJoCo https://mujoco.org`_ is "a free and open source physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed."
+`MuJoCo <https://mujoco.org>`_ is "a free and open source physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed."
 The MuJoCo Gym environments are a common benchmark in RL research for evaluating agents with continuous action spaces.
 We ran each continuous preset for 5 million timesteps (in this case, timesteps are equal to frames).
 The learning rate was decayed over the course of training using cosine annealing.
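For reference, cosine annealing of a learning rate can be expressed with PyTorch's built-in scheduler. This is a generic sketch of the technique, not necessarily the exact configuration used by the presets:

.. code-block:: python

    import torch
    from torch import nn

    model = nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
    # anneal the learning rate from its initial value toward zero over the
    # full run (e.g. 5 million timesteps, matching the benchmark above)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5_000_000)

    for step in range(10):      # in practice, one scheduler step per update
        optimizer.step()
        scheduler.step()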
docs/source/guide/getting_started.rst (8 additions & 8 deletions)
@@ -4,9 +4,9 @@ Getting Started
 Prerequisites
 -------------

-The Autonomous Learning Library requires a recent version of PyTorch (~=1.8.0 recommended).
+The Autonomous Learning Library requires a recent version of PyTorch (at least v2.2.0 is recommended).
 Additionally, Tensorboard is required in order to enable logging.
-We also strongly recommend using a machine with a fast GPU (at minimum a GTX 970 or better, a GTX 1080ti or better is preferred).
+We also strongly recommend using a machine with a fast GPU with at least 11 GB of VRAM (a GTX 1080ti or better is preferred).

 Installation
 ------------
@@ -35,7 +35,7 @@ An alternate approach, that may be useful when following this tutorial, is to in
     cd autonomous-learning-library
     pip install -e .[dev]

-``dev`` will install all of the optional dependencies for developers of the repo, such as unit test and documentation dependencies, as well as all environments.
+``dev`` will install all of the optional dependencies for developers of the repo, such as unit test dependencies, as well as all environments.
 If you chose to clone the repository, you can test your installation by running the unit test suite:

 .. code-block:: bash
@@ -50,20 +50,20 @@ Running a Preset Agent
 The goal of the Autonomous Learning Library is to provide components for building new agents.
 However, the library also includes a number of "preset" agent configurations for easy benchmarking and comparison,
 as well as some useful scripts.
-For example, a PPO agent can be run on Cart-Pole as follows:
+For example, an ``a2c`` agent can be run on CartPole as follows:

 .. code-block:: bash

     all-classic CartPole-v0 a2c

-The results will be written to ``runs/a2c_<COMMIT>_<DATETIME>``, where ``<COMMIT>`` and ``<DATETIME>`` are strings generated by the library.
+The results will be written to ``runs/a2c_CartPole-v0_<DATETIME>``, where ``<DATETIME>`` is generated by the library.
 You can view these results and other information through `tensorboard`:

 .. code-block:: bash

     tensorboard --logdir runs

-By opening your browser to <http://localhost:6006>, you should see a dashboard that looks something like the following (you may need to adjust the "smoothing" parameter):
+By opening your browser to `http://localhost:6006`_, you should see a dashboard that looks something like the following (you may need to adjust the "smoothing" parameter):

 .. image:: tensorboard.png
@@ -84,9 +84,9 @@ Finally, to watch the trained model in action, we provide a `watch` scripts for