|
104 | 104 | "print(\"(n_repeats, n_samples_test, n_voxels) =\", Y_test.shape)" |
105 | 105 | ] |
106 | 106 | }, |
| 107 | + { |
| 108 | + "cell_type": "markdown", |
| 109 | + "metadata": {}, |
| 110 | + "source": [ |
| 111 | + "Before fitting an encoding model, the fMRI responses are typically z-scored over time. This normalization step is performed for two reasons.\n", |
| 112 | + "First, the regularized regression methods used to estimate encoding models generally assume the data to be normalized {cite:t}`Hastie2009`. \n", |
| 113 | + "Second, the temporal mean and standard deviation of a voxel are typically considered uninformative in fMRI because they can vary due to factors unrelated to the task, such as differences in signal-to-noise ratio (SNR).\n", |
| 114 | + "\n", |
| 115 | + "To preserve each run independent from the others, we z-score each run separately." |
| 116 | + ] |
| 117 | + }, |
| 118 | + { |
| 119 | + "cell_type": "code", |
| 120 | + "execution_count": null, |
| 121 | + "metadata": {}, |
| 122 | + "outputs": [], |
| 123 | + "source": [ |
| 124 | + "from scipy.stats import zscore\n", |
| 125 | + "\n", |
| 126 | + "# indice of first sample of each run\n", |
| 127 | + "run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n", |
| 128 | + "print(run_onsets)\n", |
| 129 | + "\n", |
| 130 | + "# zscore each training run separately\n", |
| 131 | + "Y_train = np.split(Y_train, run_onsets[1:])\n", |
| 132 | + "Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n", |
| 133 | + "# zscore each test run separately\n", |
| 134 | + "Y_test = zscore(Y_test, axis=1)" |
| 135 | + ] |
| 136 | + }, |
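|  | + { |
|  | + "cell_type": "markdown", |
|  | + "metadata": {}, |
|  | + "source": [ |
|  | + "As an optional sanity check, we can verify that, after the per-run z-scoring, each\n", |
|  | + "training run has an approximately zero mean and unit variance in every voxel." |
|  | + ] |
|  | + }, |
|  | + { |
|  | + "cell_type": "code", |
|  | + "execution_count": null, |
|  | + "metadata": {}, |
|  | + "outputs": [], |
|  | + "source": [ |
|  | + "# optional sanity check: recompute the per-run mean and standard deviation of the\n", |
|  | + "# z-scored training data (they should be close to 0 and 1 respectively)\n", |
|  | + "for ii, run in enumerate(np.split(Y_train, run_onsets[1:])):\n", |
|  | + "    print(f\"run {ii}: max abs mean = {np.abs(run.mean(0)).max():.2e}, \"\n", |
|  | + "          f\"max abs (std - 1) = {np.abs(run.std(0) - 1).max():.2e}\")" |
|  | + ] |
|  | + }, |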
107 | 137 | { |
108 | 138 | "cell_type": "markdown", |
109 | 139 | "metadata": {}, |
|
125 | 155 | "outputs": [], |
126 | 156 | "source": [ |
127 | 157 | "Y_test = Y_test.mean(0)\n", |
| 158 | + "# We need to zscore the test data again, because we took the mean across repetitions.\n", |
| 159 | + "# This averaging step makes the standard deviation approximately equal to 1/sqrt(n_repeats)\n", |
| 160 | + "Y_test = zscore(Y_test, axis=0)\n", |
128 | 161 | "\n", |
129 | 162 | "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)" |
130 | 163 | ] |
|
192 | 225 | "following time sample in the validation set. Thus, we define here a\n", |
193 | 226 | "leave-one-run-out cross-validation split that keeps each recording run\n", |
194 | 227 | "intact.\n", |
195 | | - "\n" |
| 228 | + "\n", |
| 229 | + "We define a cross-validation splitter, compatible with ``scikit-learn`` API." |
196 | 230 | ] |
197 | 231 | }, |
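|  | + { |
|  | + "cell_type": "markdown", |
|  | + "metadata": {}, |
|  | + "source": [ |
|  | + "For illustration, here is a minimal sketch of what such a leave-one-run-out splitter\n", |
|  | + "could look like, assuming ``run_onsets`` holds the index of the first sample of each run.\n", |
|  | + "It yields ``(train, validation)`` index arrays, which is all that ``check_cv`` needs below.\n", |
|  | + "The actual ``generate_leave_one_run_out`` utility used in the next cell may differ in its details." |
|  | + ] |
|  | + }, |
|  | + { |
|  | + "cell_type": "code", |
|  | + "execution_count": null, |
|  | + "metadata": {}, |
|  | + "outputs": [], |
|  | + "source": [ |
|  | + "import numpy as np\n", |
|  | + "\n", |
|  | + "# Minimal sketch, for illustration only (the tutorial uses the\n", |
|  | + "# `generate_leave_one_run_out` utility imported in the next cell).\n", |
|  | + "def leave_one_run_out_sketch(n_samples, run_onsets):\n", |
|  | + "    # split [0, n_samples) into runs, using the run onsets as boundaries\n", |
|  | + "    limits = np.append(run_onsets, n_samples)\n", |
|  | + "    runs = [np.arange(start, stop)\n", |
|  | + "            for start, stop in zip(limits[:-1], limits[1:])]\n", |
|  | + "    # each run is used once as the validation set, the other runs form the training set\n", |
|  | + "    for ii, val in enumerate(runs):\n", |
|  | + "        train = np.concatenate([run for jj, run in enumerate(runs) if jj != ii])\n", |
|  | + "        yield train, val" |
|  | + ] |
|  | + }, |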
198 | 232 | { |
|
206 | 240 | "from sklearn.model_selection import check_cv\n", |
207 | 241 | "from voxelwise_tutorials.utils import generate_leave_one_run_out\n", |
208 | 242 | "\n", |
209 | | - "# indice of first sample of each run\n", |
210 | | - "run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n", |
211 | | - "print(run_onsets)" |
212 | | - ] |
213 | | - }, |
214 | | - { |
215 | | - "cell_type": "markdown", |
216 | | - "metadata": {}, |
217 | | - "source": [ |
218 | | - "We define a cross-validation splitter, compatible with ``scikit-learn`` API.\n", |
219 | | - "\n" |
220 | | - ] |
221 | | - }, |
222 | | - { |
223 | | - "cell_type": "code", |
224 | | - "execution_count": null, |
225 | | - "metadata": { |
226 | | - "collapsed": false |
227 | | - }, |
228 | | - "outputs": [], |
229 | | - "source": [ |
230 | 243 | "n_samples_train = X_train.shape[0]\n", |
231 | 244 | "cv = generate_leave_one_run_out(n_samples_train, run_onsets)\n", |
232 | 245 | "cv = check_cv(cv) # copy the cross-validation splitter into a reusable list" |
|
240 | 253 | "\n", |
241 | 254 | "Now, let's define the model pipeline.\n", |
242 | 255 | "\n", |
| 256 | + "With regularized linear regression models, it is generally recommended to normalize \n", |
| 257 | + "(z-score) both the responses and the features before fitting the model {cite:t}`Hastie2009`. \n", |
| 258 | + "Z-scoring corresponds to removing the temporal mean and dividing by the temporal standard deviation.\n", |
| 259 | + "We already z-scored the fMRI responses after loading them, so now we need to specify\n", |
| 260 | + "in the model how to deal with the features. \n", |
| 261 | + "\n", |
243 | 262 | "We first center the features, since we will not use an intercept. The mean\n", |
244 | 263 | "value in fMRI recording is non-informative, so each run is detrended and\n", |
245 | 264 | "demeaned independently, and we do not need to predict an intercept value in\n", |
246 | 265 | "the linear model.\n", |
247 | 266 | "\n", |
248 | | - "However, we prefer to avoid normalizing by the standard deviation of each\n", |
249 | | - "feature. If the features are extracted in a consistent way from the stimulus,\n", |
| 267 | + "For this particular dataset and example, we do not normalize by the standard deviation \n", |
| 268 | + "of each feature. If the features are extracted in a consistent way from the stimulus,\n", |
250 | 269 | "their relative scale is meaningful. Normalizing them independently from each\n", |
251 | 270 | "other would remove this information. Moreover, the wordnet features are\n", |
252 | 271 | "one-hot-encoded, which means that each feature is either present (1) or not\n", |
253 | 272 | "present (0) in each sample. Normalizing one-hot-encoded features is not\n", |
254 | | - "recommended, since it would scale disproportionately the infrequent features.\n", |
255 | | - "\n" |
| 273 | + "recommended, since it would scale disproportionately the infrequent features." |
256 | 274 | ] |
257 | 275 | }, |
258 | 276 | { |
|
778 | 796 | "cell_type": "markdown", |
779 | 797 | "metadata": {}, |
780 | 798 | "source": [ |
781 | | - "Similarly to [1]_, we correct the coefficients of features linked by a\n", |
| 799 | + "Similarly to {cite:t}`huth2012`, we correct the coefficients of features linked by a\n", |
782 | 800 | "semantic relationship. When building the wordnet features, if a frame was\n", |
783 | 801 | "labeled with `wolf`, the authors automatically added the semantically linked\n", |
784 | 802 | "categories `canine`, `carnivore`, `placental mammal`, `mamma`, `vertebrate`,\n", |
|
954 | 972 | "voxel_colors = scale_to_rgb_cube(average_coef_transformed[1:4].T, clip=3).T\n", |
955 | 973 | "print(\"(n_channels, n_voxels) =\", voxel_colors.shape)\n", |
956 | 974 | "\n", |
957 | | - "ax = plot_3d_flatmap_from_mapper(voxel_colors[0], voxel_colors[1],\n", |
958 | | - " voxel_colors[2], mapper_file=mapper_file,\n", |
959 | | - " vmin=0, vmax=1, vmin2=0, vmax2=1, vmin3=0,\n", |
960 | | - " vmax3=1)\n", |
| 975 | + "ax = plot_3d_flatmap_from_mapper(\n", |
| 976 | + " voxel_colors[0], voxel_colors[1], voxel_colors[2], \n", |
| 977 | + " mapper_file=mapper_file, \n", |
| 978 | + " vmin=0, vmax=1, vmin2=0, vmax2=1, vmin3=0, vmax3=1\n", |
| 979 | + ")\n", |
961 | 980 | "plt.show()" |
962 | 981 | ] |
963 | 982 | }, |
|
984 | 1003 | ], |
985 | 1004 | "metadata": { |
986 | 1005 | "kernelspec": { |
987 | | - "display_name": "Python 3", |
| 1006 | + "display_name": "voxelwise_tutorials", |
988 | 1007 | "language": "python", |
989 | 1008 | "name": "python3" |
990 | 1009 | }, |
|
998 | 1017 | "name": "python", |
999 | 1018 | "nbconvert_exporter": "python", |
1000 | 1019 | "pygments_lexer": "ipython3", |
1001 | | - "version": "3.7.12" |
| 1020 | + "version": "3.10.13" |
1002 | 1021 | } |
1003 | 1022 | }, |
1004 | 1023 | "nbformat": 4, |
|