
Commit cda4e57

zscore responses in tutorials
1 parent 012d954 commit cda4e57

5 files changed: +113 -79 lines changed

tutorials/notebooks/shortclips/05_fit_wordnet_model.ipynb

Lines changed: 52 additions & 33 deletions
@@ -104,6 +104,36 @@
 "print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Before fitting an encoding model, the fMRI responses are typically z-scored over time. This normalization step is performed for two reasons.\n",
+"First, the regularized regression methods used to estimate encoding models generally assume the data to be normalized {cite:t}`Hastie2009`.\n",
+"Second, the temporal mean and standard deviation of a voxel are typically considered uninformative in fMRI because they can vary due to factors unrelated to the task, such as differences in signal-to-noise ratio (SNR).\n",
+"\n",
+"To keep each run independent from the others, we z-score each run separately."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"from scipy.stats import zscore\n",
+"\n",
+"# index of first sample of each run\n",
+"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
+"print(run_onsets)\n",
+"\n",
+"# zscore each training run separately\n",
+"Y_train = np.split(Y_train, run_onsets[1:])\n",
+"Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n",
+"# zscore each test run separately\n",
+"Y_test = zscore(Y_test, axis=1)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -125,6 +155,9 @@
 "outputs": [],
 "source": [
 "Y_test = Y_test.mean(0)\n",
+"# We need to zscore the test data again, because we took the mean across repetitions.\n",
+"# This averaging step makes the standard deviation approximately equal to 1/sqrt(n_repeats).\n",
+"Y_test = zscore(Y_test, axis=0)\n",
 "\n",
 "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)"
 ]
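
Note: the claim in the added comment can be checked numerically. If the repeats were pure independent noise, averaging `n_repeats` z-scored time courses would shrink the standard deviation to exactly 1/sqrt(n_repeats); real repeats share a stimulus-driven component, so the shrinkage is only approximate. A quick sketch under the independence assumption:

    import numpy as np
    from scipy.stats import zscore

    rng = np.random.default_rng(0)
    n_repeats, n_samples = 10, 300

    # 10 repeats of pure independent noise, z-scored along time (axis=1)
    Y_test = zscore(rng.standard_normal((n_repeats, n_samples)), axis=1)

    Y_mean = Y_test.mean(0)
    print(Y_mean.std())            # close to 0.316
    print(1 / np.sqrt(n_repeats))  # 0.3162...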
@@ -192,7 +225,8 @@
 "following time sample in the validation set. Thus, we define here a\n",
 "leave-one-run-out cross-validation split that keeps each recording run\n",
 "intact.\n",
-"\n"
+"\n",
+"We define a cross-validation splitter, compatible with the ``scikit-learn`` API."
 ]
 },
 {
@@ -206,27 +240,6 @@
 "from sklearn.model_selection import check_cv\n",
 "from voxelwise_tutorials.utils import generate_leave_one_run_out\n",
 "\n",
-"# indice of first sample of each run\n",
-"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
-"print(run_onsets)"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"We define a cross-validation splitter, compatible with ``scikit-learn`` API.\n",
-"\n"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": false
-},
-"outputs": [],
-"source": [
 "n_samples_train = X_train.shape[0]\n",
 "cv = generate_leave_one_run_out(n_samples_train, run_onsets)\n",
 "cv = check_cv(cv) # copy the cross-validation splitter into a reusable list"
@@ -240,19 +253,24 @@
 "\n",
 "Now, let's define the model pipeline.\n",
 "\n",
+"With regularized linear regression models, it is generally recommended to normalize\n",
+"(z-score) both the responses and the features before fitting the model {cite:t}`Hastie2009`.\n",
+"Z-scoring corresponds to removing the temporal mean and dividing by the temporal standard deviation.\n",
+"We already z-scored the fMRI responses after loading them, so now we need to specify\n",
+"in the model how to deal with the features.\n",
+"\n",
 "We first center the features, since we will not use an intercept. The mean\n",
 "value in fMRI recording is non-informative, so each run is detrended and\n",
 "demeaned independently, and we do not need to predict an intercept value in\n",
 "the linear model.\n",
 "\n",
-"However, we prefer to avoid normalizing by the standard deviation of each\n",
-"feature. If the features are extracted in a consistent way from the stimulus,\n",
+"For this particular dataset and example, we do not normalize by the standard deviation\n",
+"of each feature. If the features are extracted in a consistent way from the stimulus,\n",
 "their relative scale is meaningful. Normalizing them independently from each\n",
 "other would remove this information. Moreover, the wordnet features are\n",
 "one-hot-encoded, which means that each feature is either present (1) or not\n",
 "present (0) in each sample. Normalizing one-hot-encoded features is not\n",
-"recommended, since it would scale disproportionately the infrequent features.\n",
-"\n"
+"recommended, since it would scale disproportionately the infrequent features."
 ]
 },
 {
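
Note: the warning about one-hot features can be made concrete. A binary feature active in a fraction p of samples has standard deviation sqrt(p(1-p)), so dividing by it blows up rare features relative to frequent ones. A small sketch with synthetic columns (the 0.5 and 0.01 frequencies are arbitrary):

    import numpy as np
    from scipy.stats import zscore

    rng = np.random.default_rng(0)
    n_samples = 1000
    frequent = (rng.random(n_samples) < 0.5).astype(float)  # active in ~50% of samples
    rare = (rng.random(n_samples) < 0.01).astype(float)     # active in ~1% of samples

    # std of a binary feature is sqrt(p * (1 - p)): ~0.50 vs ~0.10
    print(frequent.std(), rare.std())

    # after z-scoring, the "on" samples of the rare feature are ~10x larger
    print(np.abs(zscore(frequent)).max(), np.abs(zscore(rare)).max())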
@@ -778,7 +796,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Similarly to [1]_, we correct the coefficients of features linked by a\n",
+"Similarly to {cite:t}`huth2012`, we correct the coefficients of features linked by a\n",
 "semantic relationship. When building the wordnet features, if a frame was\n",
 "labeled with `wolf`, the authors automatically added the semantically linked\n",
 "categories `canine`, `carnivore`, `placental mammal`, `mamma`, `vertebrate`,\n",
@@ -954,10 +972,11 @@
 "voxel_colors = scale_to_rgb_cube(average_coef_transformed[1:4].T, clip=3).T\n",
 "print(\"(n_channels, n_voxels) =\", voxel_colors.shape)\n",
 "\n",
-"ax = plot_3d_flatmap_from_mapper(voxel_colors[0], voxel_colors[1],\n",
-"                                 voxel_colors[2], mapper_file=mapper_file,\n",
-"                                 vmin=0, vmax=1, vmin2=0, vmax2=1, vmin3=0,\n",
-"                                 vmax3=1)\n",
+"ax = plot_3d_flatmap_from_mapper(\n",
+"    voxel_colors[0], voxel_colors[1], voxel_colors[2],\n",
+"    mapper_file=mapper_file,\n",
+"    vmin=0, vmax=1, vmin2=0, vmax2=1, vmin3=0, vmax3=1\n",
+")\n",
 "plt.show()"
 ]
 },
@@ -984,7 +1003,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "voxelwise_tutorials",
 "language": "python",
 "name": "python3"
 },
@@ -998,7 +1017,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.12"
+"version": "3.10.13"
 }
 },
 "nbformat": 4,

tutorials/notebooks/shortclips/06_visualize_hemodynamic_response.ipynb

Lines changed: 18 additions & 29 deletions
@@ -69,8 +69,7 @@
 "source": [
 "## Load the data\n",
 "\n",
-"We first load the fMRI responses.\n",
-"\n"
+"We first load and normalize the fMRI responses."
 ]
 },
 {
@@ -83,23 +82,32 @@
 "source": [
 "import os\n",
 "import numpy as np\n",
+"from scipy.stats import zscore\n",
 "from voxelwise_tutorials.io import load_hdf5_array\n",
 "\n",
 "file_name = os.path.join(directory, \"responses\", f\"{subject}_responses.hdf\")\n",
 "Y_train = load_hdf5_array(file_name, key=\"Y_train\")\n",
 "Y_test = load_hdf5_array(file_name, key=\"Y_test\")\n",
 "\n",
 "print(\"(n_samples_train, n_voxels) =\", Y_train.shape)\n",
-"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)"
+"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)\n",
+"\n",
+"# index of first sample of each run\n",
+"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
+"\n",
+"# zscore each training run separately\n",
+"Y_train = np.split(Y_train, run_onsets[1:])\n",
+"Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n",
+"# zscore each test run separately\n",
+"Y_test = zscore(Y_test, axis=1)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "We average the test repeats, to remove the non-repeatable part of fMRI\n",
-"responses.\n",
-"\n"
+"responses, and normalize the average across repeats."
 ]
 },
 {
@@ -111,6 +119,7 @@
 "outputs": [],
 "source": [
 "Y_test = Y_test.mean(0)\n",
+"Y_test = zscore(Y_test, axis=0)\n",
 "\n",
 "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)"
 ]
@@ -169,7 +178,8 @@
 "\n",
 "We define the same leave-one-run-out cross-validation split as in the\n",
 "previous example.\n",
-"\n"
+"\n",
+"We define a cross-validation splitter, compatible with the ``scikit-learn`` API."
 ]
 },
 {
@@ -183,27 +193,6 @@
 "from sklearn.model_selection import check_cv\n",
 "from voxelwise_tutorials.utils import generate_leave_one_run_out\n",
 "\n",
-"# indice of first sample of each run\n",
-"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
-"print(run_onsets)"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"We define a cross-validation splitter, compatible with ``scikit-learn`` API.\n",
-"\n"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": false
-},
-"outputs": [],
-"source": [
 "n_samples_train = X_train.shape[0]\n",
 "cv = generate_leave_one_run_out(n_samples_train, run_onsets)\n",
 "cv = check_cv(cv) # copy the cross-validation splitter into a reusable list"
@@ -571,7 +560,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "voxelwise_tutorials",
 "language": "python",
 "name": "python3"
 },
@@ -585,7 +574,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.12"
+"version": "3.10.13"
 }
 },
 "nbformat": 4,

tutorials/notebooks/shortclips/08_fit_motion_energy_model.ipynb

Lines changed: 16 additions & 6 deletions
@@ -75,7 +75,7 @@
 "source": [
 "## Load the data\n",
 "\n",
-"We first load the fMRI responses.\n",
+"We first load and normalize the fMRI responses.\n",
 "\n"
 ]
 },
@@ -89,23 +89,32 @@
 "source": [
 "import os\n",
 "import numpy as np\n",
+"from scipy.stats import zscore\n",
 "from voxelwise_tutorials.io import load_hdf5_array\n",
 "\n",
 "file_name = os.path.join(directory, \"responses\", f\"{subject}_responses.hdf\")\n",
 "Y_train = load_hdf5_array(file_name, key=\"Y_train\")\n",
 "Y_test = load_hdf5_array(file_name, key=\"Y_test\")\n",
 "\n",
 "print(\"(n_samples_train, n_voxels) =\", Y_train.shape)\n",
-"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)"
+"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)\n",
+"\n",
+"# index of first sample of each run\n",
+"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
+"\n",
+"# zscore each training run separately\n",
+"Y_train = np.split(Y_train, run_onsets[1:])\n",
+"Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n",
+"# zscore each test run separately\n",
+"Y_test = zscore(Y_test, axis=1)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "We average the test repeats, to remove the non-repeatable part of fMRI\n",
-"responses.\n",
-"\n"
+"responses, and normalize the average across repeats."
 ]
 },
 {
@@ -117,6 +126,7 @@
 "outputs": [],
 "source": [
 "Y_test = Y_test.mean(0)\n",
+"Y_test = zscore(Y_test, axis=0)\n",
 "\n",
 "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)"
 ]
@@ -496,7 +506,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "voxelwise_tutorials",
 "language": "python",
 "name": "python3"
 },
@@ -510,7 +520,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.12"
+"version": "3.10.13"
 }
 },
 "nbformat": 4,
