
Commit cda4e57

zscore responses in tutorials
1 parent 012d954 commit cda4e57

5 files changed: +113 -79 lines changed

tutorials/notebooks/shortclips/05_fit_wordnet_model.ipynb

Lines changed: 52 additions & 33 deletions
@@ -104,6 +104,36 @@
 "print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Before fitting an encoding model, the fMRI responses are typically z-scored over time. This normalization step is performed for two reasons.\n",
+"First, the regularized regression methods used to estimate encoding models generally assume the data to be normalized {cite:t}`Hastie2009`.\n",
+"Second, the temporal mean and standard deviation of a voxel are typically considered uninformative in fMRI because they can vary due to factors unrelated to the task, such as differences in signal-to-noise ratio (SNR).\n",
+"\n",
+"To keep each run independent from the others, we z-score each run separately."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"from scipy.stats import zscore\n",
+"\n",
+"# index of first sample of each run\n",
+"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
+"print(run_onsets)\n",
+"\n",
+"# zscore each training run separately\n",
+"Y_train = np.split(Y_train, run_onsets[1:])\n",
+"Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n",
+"# zscore each test run separately\n",
+"Y_test = zscore(Y_test, axis=1)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -125,6 +155,9 @@
 "outputs": [],
 "source": [
 "Y_test = Y_test.mean(0)\n",
+"# We need to zscore the test data again, because we took the mean across repetitions.\n",
+"# This averaging step makes the standard deviation approximately equal to 1/sqrt(n_repeats).\n",
+"Y_test = zscore(Y_test, axis=0)\n",
 "\n",
 "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)"
 ]
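
Note: the claim in the added comment can be checked numerically. If the repeats were pure independent noise, averaging `n_repeats` z-scored time courses would shrink the standard deviation to exactly 1/sqrt(n_repeats); real repeats share a stimulus-driven component, so the shrinkage is only approximate. A quick sketch under the independence assumption:

    import numpy as np
    from scipy.stats import zscore

    rng = np.random.default_rng(0)
    n_repeats, n_samples = 10, 300

    # 10 repeats of pure independent noise, z-scored along time (axis=1)
    Y_test = zscore(rng.standard_normal((n_repeats, n_samples)), axis=1)

    Y_mean = Y_test.mean(0)
    print(Y_mean.std())            # close to 0.316
    print(1 / np.sqrt(n_repeats))  # 0.3162...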
@@ -192,7 +225,8 @@
 "following time sample in the validation set. Thus, we define here a\n",
 "leave-one-run-out cross-validation split that keeps each recording run\n",
 "intact.\n",
-"\n"
+"\n",
+"We define a cross-validation splitter, compatible with the ``scikit-learn`` API."
 ]
 },
 {
@@ -206,27 +240,6 @@
 "from sklearn.model_selection import check_cv\n",
 "from voxelwise_tutorials.utils import generate_leave_one_run_out\n",
 "\n",
-"# indice of first sample of each run\n",
-"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
-"print(run_onsets)"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"We define a cross-validation splitter, compatible with ``scikit-learn`` API.\n",
-"\n"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": false
-},
-"outputs": [],
-"source": [
 "n_samples_train = X_train.shape[0]\n",
 "cv = generate_leave_one_run_out(n_samples_train, run_onsets)\n",
 "cv = check_cv(cv) # copy the cross-validation splitter into a reusable list"
@@ -240,19 +253,24 @@
 "\n",
 "Now, let's define the model pipeline.\n",
 "\n",
+"With regularized linear regression models, it is generally recommended to normalize\n",
+"(z-score) both the responses and the features before fitting the model {cite:t}`Hastie2009`.\n",
+"Z-scoring corresponds to removing the temporal mean and dividing by the temporal standard deviation.\n",
+"We already z-scored the fMRI responses after loading them, so now we need to specify\n",
+"in the model how to deal with the features.\n",
+"\n",
 "We first center the features, since we will not use an intercept. The mean\n",
 "value in fMRI recording is non-informative, so each run is detrended and\n",
 "demeaned independently, and we do not need to predict an intercept value in\n",
 "the linear model.\n",
 "\n",
-"However, we prefer to avoid normalizing by the standard deviation of each\n",
-"feature. If the features are extracted in a consistent way from the stimulus,\n",
+"For this particular dataset and example, we do not normalize by the standard deviation\n",
+"of each feature. If the features are extracted in a consistent way from the stimulus,\n",
 "their relative scale is meaningful. Normalizing them independently from each\n",
 "other would remove this information. Moreover, the wordnet features are\n",
 "one-hot-encoded, which means that each feature is either present (1) or not\n",
 "present (0) in each sample. Normalizing one-hot-encoded features is not\n",
-"recommended, since it would scale disproportionately the infrequent features.\n",
-"\n"
+"recommended, since it would scale disproportionately the infrequent features."
 ]
 },
 {
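
Note: the warning about one-hot features can be made concrete. A binary feature active in a fraction p of samples has standard deviation sqrt(p(1-p)), so dividing by it blows up rare features relative to frequent ones. A small sketch with synthetic columns (the 0.5 and 0.01 frequencies are arbitrary):

    import numpy as np
    from scipy.stats import zscore

    rng = np.random.default_rng(0)
    n_samples = 1000
    frequent = (rng.random(n_samples) < 0.5).astype(float)  # active in ~50% of samples
    rare = (rng.random(n_samples) < 0.01).astype(float)     # active in ~1% of samples

    # std of a binary feature is sqrt(p * (1 - p)): ~0.50 vs ~0.10
    print(frequent.std(), rare.std())

    # after z-scoring, the "on" samples of the rare feature are ~10x larger
    print(np.abs(zscore(frequent)).max(), np.abs(zscore(rare)).max())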
@@ -778,7 +796,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Similarly to [1]_, we correct the coefficients of features linked by a\n",
+"Similarly to {cite:t}`huth2012`, we correct the coefficients of features linked by a\n",
 "semantic relationship. When building the wordnet features, if a frame was\n",
 "labeled with `wolf`, the authors automatically added the semantically linked\n",
 "categories `canine`, `carnivore`, `placental mammal`, `mamma`, `vertebrate`,\n",
@@ -954,10 +972,11 @@
 "voxel_colors = scale_to_rgb_cube(average_coef_transformed[1:4].T, clip=3).T\n",
 "print(\"(n_channels, n_voxels) =\", voxel_colors.shape)\n",
 "\n",
-"ax = plot_3d_flatmap_from_mapper(voxel_colors[0], voxel_colors[1],\n",
-"                                 voxel_colors[2], mapper_file=mapper_file,\n",
-"                                 vmin=0, vmax=1, vmin2=0, vmax2=1, vmin3=0,\n",
-"                                 vmax3=1)\n",
+"ax = plot_3d_flatmap_from_mapper(\n",
+"    voxel_colors[0], voxel_colors[1], voxel_colors[2],\n",
+"    mapper_file=mapper_file,\n",
+"    vmin=0, vmax=1, vmin2=0, vmax2=1, vmin3=0, vmax3=1\n",
+")\n",
 "plt.show()"
 ]
 },
@@ -984,7 +1003,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "voxelwise_tutorials",
 "language": "python",
 "name": "python3"
 },
@@ -998,7 +1017,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.12"
+"version": "3.10.13"
 }
 },
 "nbformat": 4,

tutorials/notebooks/shortclips/06_visualize_hemodynamic_response.ipynb

Lines changed: 18 additions & 29 deletions
@@ -69,8 +69,7 @@
 "source": [
 "## Load the data\n",
 "\n",
-"We first load the fMRI responses.\n",
-"\n"
+"We first load and normalize the fMRI responses."
 ]
 },
 {
@@ -83,23 +82,32 @@
 "source": [
 "import os\n",
 "import numpy as np\n",
+"from scipy.stats import zscore\n",
 "from voxelwise_tutorials.io import load_hdf5_array\n",
 "\n",
 "file_name = os.path.join(directory, \"responses\", f\"{subject}_responses.hdf\")\n",
 "Y_train = load_hdf5_array(file_name, key=\"Y_train\")\n",
 "Y_test = load_hdf5_array(file_name, key=\"Y_test\")\n",
 "\n",
 "print(\"(n_samples_train, n_voxels) =\", Y_train.shape)\n",
-"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)"
+"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)\n",
+"\n",
+"# index of first sample of each run\n",
+"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
+"\n",
+"# zscore each training run separately\n",
+"Y_train = np.split(Y_train, run_onsets[1:])\n",
+"Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n",
+"# zscore each test run separately\n",
+"Y_test = zscore(Y_test, axis=1)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "We average the test repeats, to remove the non-repeatable part of fMRI\n",
-"responses.\n",
-"\n"
+"responses, and normalize the average across repeats."
 ]
 },
 {
@@ -111,6 +119,7 @@
 "outputs": [],
 "source": [
 "Y_test = Y_test.mean(0)\n",
+"Y_test = zscore(Y_test, axis=0)\n",
 "\n",
 "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)"
 ]
@@ -169,7 +178,8 @@
 "\n",
 "We define the same leave-one-run-out cross-validation split as in the\n",
 "previous example.\n",
-"\n"
+"\n",
+"We define a cross-validation splitter, compatible with the ``scikit-learn`` API."
 ]
 },
 {
@@ -183,27 +193,6 @@
 "from sklearn.model_selection import check_cv\n",
 "from voxelwise_tutorials.utils import generate_leave_one_run_out\n",
 "\n",
-"# indice of first sample of each run\n",
-"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
-"print(run_onsets)"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"We define a cross-validation splitter, compatible with ``scikit-learn`` API.\n",
-"\n"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": false
-},
-"outputs": [],
-"source": [
 "n_samples_train = X_train.shape[0]\n",
 "cv = generate_leave_one_run_out(n_samples_train, run_onsets)\n",
 "cv = check_cv(cv) # copy the cross-validation splitter into a reusable list"
@@ -571,7 +560,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "voxelwise_tutorials",
 "language": "python",
 "name": "python3"
 },
@@ -585,7 +574,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.12"
+"version": "3.10.13"
 }
 },
 "nbformat": 4,

tutorials/notebooks/shortclips/08_fit_motion_energy_model.ipynb

Lines changed: 16 additions & 6 deletions
@@ -75,7 +75,7 @@
 "source": [
 "## Load the data\n",
 "\n",
-"We first load the fMRI responses.\n",
+"We first load and normalize the fMRI responses.\n",
 "\n"
 ]
 },
@@ -89,23 +89,32 @@
 "source": [
 "import os\n",
 "import numpy as np\n",
+"from scipy.stats import zscore\n",
 "from voxelwise_tutorials.io import load_hdf5_array\n",
 "\n",
 "file_name = os.path.join(directory, \"responses\", f\"{subject}_responses.hdf\")\n",
 "Y_train = load_hdf5_array(file_name, key=\"Y_train\")\n",
 "Y_test = load_hdf5_array(file_name, key=\"Y_test\")\n",
 "\n",
 "print(\"(n_samples_train, n_voxels) =\", Y_train.shape)\n",
-"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)"
+"print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)\n",
+"\n",
+"# index of first sample of each run\n",
+"run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n",
+"\n",
+"# zscore each training run separately\n",
+"Y_train = np.split(Y_train, run_onsets[1:])\n",
+"Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n",
+"# zscore each test run separately\n",
+"Y_test = zscore(Y_test, axis=1)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "We average the test repeats, to remove the non-repeatable part of fMRI\n",
-"responses.\n",
-"\n"
+"responses, and normalize the average across repeats."
 ]
 },
 {
@@ -117,6 +126,7 @@
 "outputs": [],
 "source": [
 "Y_test = Y_test.mean(0)\n",
+"Y_test = zscore(Y_test, axis=0)\n",
 "\n",
 "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)"
 ]
@@ -496,7 +506,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "voxelwise_tutorials",
 "language": "python",
 "name": "python3"
 },
@@ -510,7 +520,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.12"
+"version": "3.10.13"
 }
 },
 "nbformat": 4,
