Standardization of test data in Lab 6 should use training mean and standard deviation

## Observed behavior

Hi, there is a bug in [classification-and-pca-lab.ipynb](https://github.com/cs109/a-2017/blob/master/Labs/Lab6_Classification_PCA/classification-and-pca-lab.ipynb) for `Lab 6`, in both the `do_classify` and `classify_from_dataframe` functions. When the testing data are standardized, the testing data's own mean and standard deviation are used. This is incorrect for at least two reasons:
- No information from the testing data should be used in fitting the model or its preprocessing, since that is a form of *data snooping*. Standardizing with test-set statistics contaminates the evaluation on the testing dataset.
- The training and testing sets are transformed with different parameters, so the same raw value maps to different standardized values and the two sets no longer contain the same variable.
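To illustrate the second point, here is a minimal sketch using made-up numbers (not data from the lab) and a hypothetical `standardize` helper. The test values are the training values shifted by 10, yet standardizing each split with its own statistics makes the two splits look identical:

```python
from statistics import mean, stdev

# Made-up numbers for illustration only; not data from the lab.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [11.0, 12.0, 13.0, 14.0, 15.0]


def standardize(values, mu, sigma):
    """Center and scale values with the supplied mean and standard deviation."""
    return [(v - mu) / sigma for v in values]


# Current behavior: each split is standardized with its own statistics.
train_z = standardize(train, mean(train), stdev(train))
test_z_own = standardize(test, mean(test), stdev(test))

# Proposed behavior: the test split reuses the training statistics.
test_z_train = standardize(test, mean(train), stdev(train))

# With per-split statistics the shift of 10 between the splits disappears.
print(train_z)
print(test_z_own)
# With training statistics the test values stay far from the training values.
print(test_z_train)
```

With the training statistics, a test value of 11 is still recognizably far above every training value, which is what the model should see.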

## Expected behavior

The training data's mean and standard deviation should be used to standardize the testing data, like so:

```python
dftest=(subdf.iloc[itest] - subdf.iloc[itrain].mean())/subdf.iloc[itrain].std()
```

```python
Xte = (subdf.iloc[itest] - subdf.iloc[itrain].mean())/subdf.iloc[itrain].std()
```
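For reference, the same idea can be written as a small self-contained sketch. The helper name `standardize_with_train_stats` and the random example data are assumptions made for illustration; only `subdf`, `itrain`, and `itest` mirror the names used above, and this is not the lab's actual code:

```python
import numpy as np
import pandas as pd


def standardize_with_train_stats(subdf, itrain, itest):
    """Standardize both splits using statistics computed on the training rows only.

    Hypothetical helper for illustration; not part of the lab notebook.
    """
    train = subdf.iloc[itrain]
    mu = train.mean()      # per-column training mean
    sigma = train.std()    # per-column training standard deviation (ddof=1)
    dftrain = (train - mu) / sigma
    dftest = (subdf.iloc[itest] - mu) / sigma
    return dftrain, dftest


# Random example data, assumed purely for demonstration.
rng = np.random.default_rng(0)
subdf = pd.DataFrame(rng.normal(loc=5.0, scale=2.0, size=(10, 2)), columns=["a", "b"])
itrain, itest = np.arange(0, 7), np.arange(7, 10)

dftrain, dftest = standardize_with_train_stats(subdf, itrain, itest)
print(dftrain.mean(), dftrain.std())  # training columns are centered and scaled
print(dftest)                         # test columns generally are not exactly centered
```

Computing `mu` and `sigma` once and reusing them keeps the two transformations identical, and the test rows never influence the parameters.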

I think this was mentioned in one of the earlier lectures. Here are some more references:
- https://stats.stackexchange.com/questions/202287/why-standardization-of-the-testing-set-has-to-be-performed-with-the-mean-and-sd
- https://sebastianraschka.com/faq/docs/scale-training-test.html
- https://www.researchgate.net/post/If_I_used_data_normalization_x-meanx_stdx_for_training_data_would_I_use_train_Mean_and_Standard_Deviation_to_normalize_test_data
