
Commit 8566ac5

Small updates for NCAS summer school improving background info (#24)
* some comments on notebooks to help students
* make worked examples consistent with the additional commentary added in the exercises
* typo fix for consistency
1 parent 82e0c86 commit 8566ac5

File tree

3 files changed: 17 additions, 7 deletions


exercises/01_penguin_classification.ipynb

Lines changed: 9 additions & 3 deletions
@@ -105,7 +105,9 @@
 " train=True,\n",
 ")\n",
 "\n",
+"\n",
 "for features, target in data_set:\n",
+" # print the features and targets here\n",
 " pass"
 ]
 },
@@ -124,7 +126,7 @@
 "source": [
 "### Task 4: Applying transforms to the data\n",
 "\n",
-"A common way of transforming inputs to neural networks is to apply a series of transforms using ``torchvision.transforms.Compose``. The ``Compose`` object takes a list of callable objects and applies them to the incoming data.\n",
+"A common way of transforming inputs to neural networks is to apply a series of transforms using ``torchvision.transforms.Compose``. The [``Compose``](https://pytorch.org/vision/stable/generated/torchvision.transforms.Compose.html) object takes a list of callable objects (i.e., functions) and applies them to the incoming data.\n",
 "\n",
 "These transforms can be very useful for mapping between file paths and tensors of images, etc.\n",
 "\n",
@@ -141,8 +143,12 @@
 "outputs": [],
 "source": [
 "from torchvision.transforms import Compose\n",
+"# import some useful functions here, see https://pytorch.org/docs/stable/torch.html\n",
+"# where `tensor` and `eye` are used for constructing tensors,\n",
+"# and using a lower-precision float32 is advised for performance\n",
+"from torch import tensor, eye, float32\n",
 "\n",
-"# Apply the transforms we need to the PenguinDataset to get out inputs\n",
+"# Apply the transforms we need to the PenguinDataset to get out input\n",
 "# targets as Tensors."
 ]
 },
@@ -154,7 +160,7 @@
 "\n",
 "- Once we have created a ``Dataset`` object, we wrap it in a ``DataLoader``.\n",
 " - The ``DataLoader`` object allows us to put our inputs and targets in mini-batches, which makes for more efficient training.\n",
-" - Note: rather than supplying one input-target pair to the model at a time, we supply \"mini-batches\" of these data at once.\n",
+" - Note: rather than supplying one input-target pair to the model at a time, we supply \"mini-batches\" of these data at once (typically a small power of 2, like 16 or 32).\n",
 " - The number of items we supply at once is called the batch size.\n",
 " - The ``DataLoader`` can also randomly shuffle the data each epoch (when training).\n",
 " - It allows us to load different mini-batches in parallel, which can be very useful for larger datasets and images that can't all fit in memory at once.\n",

src/ml_workshop/_penguins.py

Lines changed: 1 addition & 0 deletions
@@ -109,6 +109,7 @@ def _load_penguin_data() -> DataFrame:
 .sort_values(by=sorted(data.keys()))
 .reset_index(drop=True)
 )
+# Transform the sex field into a float, with male represented by 1.0, female by 0.0
 data.sex = (data.sex == "male").astype(float)
 return data
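The comment added here documents a boolean-to-float encoding trick. A tiny standalone sketch of its effect, using a made-up frame in place of the loaded penguin data:

```python
import pandas as pd

# Made-up rows standing in for the loaded penguin data.
data = pd.DataFrame({"sex": ["male", "female", "male"]})

# (data.sex == "male") is a boolean Series; .astype(float) maps
# True -> 1.0 and False -> 0.0, as the new comment describes.
data.sex = (data.sex == "male").astype(float)

print(data.sex.tolist())  # [1.0, 0.0, 1.0]
```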

worked-solutions/01_penguin_classification_solutions.ipynb

Lines changed: 7 additions & 4 deletions
@@ -214,7 +214,7 @@
 "source": [
 "### Task 4: Applying transforms to the data\n",
 "\n",
-"A common way of transforming inputs to neural networks is to apply a series of transforms using ``torchvision.transforms.Compose``. The ``Compose`` object takes a list of callable objects and applies them to the incoming data.\n",
+"A common way of transforming inputs to neural networks is to apply a series of transforms using ``torchvision.transforms.Compose``. The [``Compose``](https://pytorch.org/vision/stable/generated/torchvision.transforms.Compose.html) object takes a list of callable objects and applies them to the incoming data.\n",
 "\n",
 "These transforms can be very useful for mapping between file paths and tensors of images, etc.\n",
 "\n",
@@ -242,11 +242,14 @@
 }
 ],
 "source": [
-"from torch import tensor, float32, eye\n",
 "from torchvision.transforms import Compose\n",
+"# import some useful functions here, see https://pytorch.org/docs/stable/torch.html\n",
+"# where `tensor` and `eye` are used for constructing tensors,\n",
+"# and using a lower-precision float32 is advised for performance\n",
+"from torch import tensor, float32, eye\n",
 "\n",
 "\n",
-"# Apply the transforms we need to the PenguinDataset to get out inputs\n",
+"# Apply the transforms we need to the PenguinDataset to get out input\n",
 "# targets as Tensors.\n",
 "\n",
 "\n",
@@ -321,7 +324,7 @@
 "\n",
 "- Once we have created a ``Dataset`` object, we wrap it in a ``DataLoader``.\n",
 " - The ``DataLoader`` object allows us to put our inputs and targets in mini-batches, which makes for more efficient training.\n",
-" - Note: rather than supplying one input-target pair to the model at a time, we supply \"mini-batches\" of these data at once.\n",
+" - Note: rather than supplying one input-target pair to the model at a time, we supply \"mini-batches\" of these data at once (typically a small power of 2, like 16 or 32).\n",
 " - The number of items we supply at once is called the batch size.\n",
 " - The ``DataLoader`` can also randomly shuffle the data each epoch (when training).\n",
 " - It allows us to load different mini-batches in parallel, which can be very useful for larger datasets and images that can't all fit in memory at once.\n",
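The `DataLoader` bullet points above can be illustrated with a small sketch — toy tensors rather than the penguin data, with `TensorDataset` standing in for `PenguinDataset`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Eight toy input-target pairs, two features per input.
features = torch.arange(16, dtype=torch.float32).reshape(8, 2)
targets = torch.zeros(8)
data_set = TensorDataset(features, targets)

# batch_size=4 yields mini-batches of four pairs at a time;
# shuffle=True (used when training) would re-order samples each epoch.
loader = DataLoader(data_set, batch_size=4, shuffle=False)

batch_shapes = [batch_features.shape for batch_features, _ in loader]
```

Eight samples at a batch size of 4 give two mini-batches of shape `(4, 2)` each, which is the grouping behaviour the notes describe.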
