|
20 | 20 | "cell_type": "markdown", |
21 | 21 | "metadata": {}, |
22 | 22 | "source": [ |
23 | | - "### Setup\n", |
| 23 | + "### Colab Setup\n", |
24 | 24 | "Run the following cell to install the code and dependencies from GitHub." |
25 | 25 | ] |
26 | 26 | }, |
|
37 | 37 | "cell_type": "markdown", |
38 | 38 | "metadata": {}, |
39 | 39 | "source": [ |
40 | | - "### Task 1: look at the data\n", |
| 40 | + "### Task 1 -- Part (a): look at the data\n", |
41 | 41 | "In the following code block, we import the ``load_penguins`` function from the ``palmerpenguins`` package.\n", |
42 | 42 | "\n", |
43 | 43 | "- Call this function, which returns a single object, and assign it to the variable ``data``.\n", |
|
296 | 296 | " return feats, tgt" |
297 | 297 | ] |
298 | 298 | }, |
299 | | - { |
300 | | - "cell_type": "code", |
301 | | - "execution_count": null, |
302 | | - "metadata": {}, |
303 | | - "outputs": [], |
304 | | - "source": [ |
305 | | - "from typing import List, Tuple, Any\n", |
306 | | - "\n", |
307 | | - "# import some useful functions here, see https://pytorch.org/docs/stable/torch.html\n", |
308 | | - "# where `tensor` and `eye` are used for constructing tensors,\n", |
309 | | - "# and using a lower-precision float32 is advised for performance\n", |
310 | | - "# Task 4: add imports here\n", |
311 | | - "# from torch import tensor, eye, float32\n", |
312 | | - "\n", |
313 | | - "from torch.utils.data import Dataset\n", |
314 | | - "\n", |
315 | | - "from palmerpenguins import load_penguins\n", |
316 | | - "\n", |
317 | | - "\n", |
318 | | - "class PenguinDataset(Dataset):\n", |
319 | | - " \"\"\"Penguin dataset class.\n", |
320 | | - "\n", |
321 | | - " Parameters\n", |
322 | | - " ----------\n", |
323 | | - " input_keys : List[str]\n", |
324 | | - " The column titles to use in the input feature vectors.\n", |
325 | | - " target_keys : List[str]\n", |
326 | | - " The column titles to use in the target feature vectors.\n", |
327 | | - " train : bool\n", |
328 | | - " If ``True``, this object will serve as the training set, and if\n", |
329 | | - " ``False``, the validation set.\n", |
330 | | - "\n", |
331 | | - " Notes\n", |
332 | | - " -----\n", |
333 | | - " The validation split contains 10 male and 10 female penguins of each\n", |
334 | | - " species.\n", |
335 | | - "\n", |
336 | | - " \"\"\"\n", |
337 | | - "\n", |
338 | | - " def __init__(\n", |
339 | | - " self,\n", |
340 | | - " input_keys: List[str],\n", |
341 | | - " target_keys: List[str],\n", |
342 | | - " train: bool,\n", |
343 | | - " ):\n", |
344 | | - " \"\"\"Build ``PenguinDataset``.\"\"\"\n", |
345 | | - " self.input_keys = input_keys\n", |
346 | | - " self.target_keys = target_keys\n", |
347 | | - "\n", |
348 | | - " data = load_penguins()\n", |
349 | | - " data = (\n", |
350 | | - " data.loc[~data.isna().any(axis=1)]\n", |
351 | | - " .sort_values(by=sorted(data.keys()))\n", |
352 | | - " .reset_index(drop=True)\n", |
353 | | - " )\n", |
354 | | - " # Transform the sex field into a float, with male represented by 1.0, female by 0.0\n", |
355 | | - " data.sex = (data.sex == \"male\").astype(float)\n", |
356 | | - " self.full_df = data\n", |
357 | | - "\n", |
358 | | - " valid_df = self.full_df.groupby(by=[\"species\", \"sex\"]).sample(\n", |
359 | | - " n=10,\n", |
360 | | - " random_state=123,\n", |
361 | | - " )\n", |
362 | | - " # The training items are simply the items *not* in the valid split\n", |
363 | | - " train_df = self.full_df.loc[~self.full_df.index.isin(valid_df.index)]\n", |
364 | | - "\n", |
365 | | - " self.split = {\"train\": train_df, \"valid\": valid_df}[\n", |
366 | | - " \"train\" if train is True else \"valid\"\n", |
367 | | - " ]\n", |
368 | | - "\n", |
369 | | - " def __len__(self) -> int:\n", |
370 | | - " \"\"\"Return the length of requested split.\n", |
371 | | - "\n", |
372 | | - " Returns\n", |
373 | | - " -------\n", |
374 | | - " int\n", |
375 | | - " The number of items in the dataset.\n", |
376 | | - "\n", |
377 | | - " \"\"\"\n", |
378 | | - " return len(self.split)\n", |
379 | | - "\n", |
380 | | - " def __getitem__(self, idx: int) -> Tuple[Any, Any]:\n", |
381 | | - " \"\"\"Return an input-target pair.\n", |
382 | | - "\n", |
383 | | - " Parameters\n", |
384 | | - " ----------\n", |
385 | | - " idx : int\n", |
386 | | - " Index of the input-target pair to return.\n", |
387 | | - "\n", |
388 | | - " Returns\n", |
389 | | - " -------\n", |
390 | | - " in_feats : Any\n", |
391 | | - " Inputs.\n", |
392 | | - " target : Any\n", |
393 | | - " Targets.\n", |
394 | | - "\n", |
395 | | - " \"\"\"\n", |
396 | | - " # get the row index (idx) from the dataframe and\n", |
397 | | - " # select relevant column features (provided as input_keys)\n", |
398 | | - " feats = tuple(self.split.iloc[idx][self.input_keys])\n", |
399 | | - "\n", |
400 | | - " # this gives a 'species' i.e. one of ('Gentoo',), ('Chinstrap',), or ('Adelie',)\n", |
401 | | - " tgts = tuple(self.split.iloc[idx][self.target_keys])\n", |
402 | | - "\n", |
403 | | - " # Task 4 - Exercise #1: convert the features to PyTorch Tensors\n", |
404 | | - "\n", |
405 | | - " # Task 4 - Exercise #2: convert target to a 'one-hot' vector.\n", |
406 | | - "\n", |
407 | | - " return feats, tgts" |
408 | | - ] |
409 | | - }, |
410 | 299 | { |
411 | 300 | "cell_type": "markdown", |
412 | 301 | "metadata": {}, |
|
518 | 407 | "Instantiate the `torchvision.transforms.Compose` transformations and pass them to the `PenguinsDataset` in [src/ml_workshop/_penguins.py](../src/ml_workshop/_penguins.py), instead of hardcoding as above. "
519 | 408 | ] |
520 | 409 | }, |
521 | | - { |
522 | | - "cell_type": "code", |
523 | | - "execution_count": 1, |
524 | | - "metadata": {}, |
525 | | - "outputs": [], |
526 | | - "source": [ |
527 | | - "# Apply transforms to the data. See Task 4 exercise comments above.\n", |
528 | | - "\n", |
529 | | - "# Create train_set\n", |
530 | | - "\n", |
531 | | - "# Create valid_set\n" |
532 | | - ] |
533 | | - }, |
534 | | - { |
535 | | - "cell_type": "markdown", |
536 | | - "metadata": {}, |
537 | | - "source": [ |
538 | | - "### (Optional) Task 4b: \n", |
539 | | - "\n", |
540 | | - "Apply the `torchvision.transforms.Compose` transformations instead of hardcoding as above. " |
541 | | - ] |
542 | | - }, |
543 | 410 | { |
544 | 411 | "cell_type": "code", |
545 | 412 | "execution_count": null, |
|
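Review note: the cell removed in this diff left Task 4, Exercise #2 (convert the target to a one-hot vector) as a comment only. The following is a minimal sketch of that step in plain Python, not the workshop's reference solution. The names `SPECIES` and `one_hot` are hypothetical, and the class order is an assumption, since the notebook does not fix one. In the notebook the same vector would presumably be built as a tensor, for example by indexing a row of `torch.eye(3)`, as the removed import comment suggests.

```python
from typing import List, Sequence

# Assumed class order (illustrative only; the notebook does not specify one).
SPECIES: Sequence[str] = ("Adelie", "Chinstrap", "Gentoo")


def one_hot(label: str, classes: Sequence[str] = SPECIES) -> List[float]:
    """Return a vector with 1.0 at the position of `label` and 0.0 elsewhere."""
    if label not in classes:
        raise ValueError(f"unknown class: {label!r}")
    return [1.0 if name == label else 0.0 for name in classes]


# __getitem__ yields the target as a 1-tuple such as ('Gentoo',),
# so unpack the single element before encoding it.
tgts = ("Gentoo",)
encoded = one_hot(tgts[0])
```

Raising on an unknown label, rather than returning an all-zero vector, is a design choice here: it surfaces a misspelled or unexpected species name immediately instead of silently producing a target with no active class.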