perturbation_stats unclear on mismatched Subkeys #36

It is unclear how `perturbation_stats` should handle multiple Subkeys with the same origin, which therefore share a column name in the df.
Currently, attempting to group on such a duplicated column raises `ValueError: Grouper for 'subsample' not 1-dimensional`.
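
The error itself can be reproduced with plain pandas, independent of this library. The sketch below is only an assumption about what the df looks like at the point of failure: the column layout and values are made up for illustration, and only the duplicated `subsample` column name is taken from the report above.

```python
import pandas as pd

# Hypothetical stand-in for the df built inside perturbation_stats:
# two Subkeys with origin 'subsample' produce two columns with the same name.
df = pd.DataFrame(
    [[0, 0, "LR", 0.9], [0, 1, "LR", 0.7], [1, 0, "DT", 0.5]],
    columns=["subsample", "subsample", "model", "out"],
)

message = None
try:
    # df["subsample"] selects a 2-column DataFrame, so it cannot act as a grouper
    df.groupby("subsample")["out"].mean()
except ValueError as err:
    message = str(err)

print(message)
```

Selecting a duplicated label returns a DataFrame rather than a Series, which is why pandas refuses to use it as a single grouping key.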

To illustrate the issue, take the exact example pipeline from #35 and use a single subsample Vset with `output_matching=False` (so that the X_trains/X_tests match properly) instead of the two. If we then want to predict with uncertainty over subsamples, it is unclear what this should mean. I see two options:

- My initial thought is that we could implement a way to distinguish identical mismatched Subkeys (maybe by appending `-i` to the column name)
- Alternatively, or additionally, we could try to support multidimensional grouping in `perturbation_stats`
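
Both options can be sketched in plain pandas. This is not how `perturbation_stats` is implemented; `dedupe_columns` is a hypothetical helper, and the df layout and values are assumptions made up for illustration.

```python
from collections import Counter

import pandas as pd


def dedupe_columns(columns):
    """Append '-i' to every column name that occurs more than once (hypothetical helper)."""
    totals = Counter(columns)
    seen = Counter()
    renamed = []
    for name in columns:
        if totals[name] > 1:
            renamed.append(f"{name}-{seen[name]}")
            seen[name] += 1
        else:
            renamed.append(name)
    return renamed


# Assumed df layout: two 'subsample' columns (train-side and test-side Subkeys)
df = pd.DataFrame(
    [[0, 0, "LR", 0.9], [0, 1, "LR", 0.7], [1, 0, "LR", 0.5], [1, 1, "LR", 0.3]],
    columns=["subsample", "subsample", "model", "out"],
)

# Option 1: make the names unique, then group on one of them
renamed = df.copy()
renamed.columns = dedupe_columns(list(df.columns))
by_first = renamed.groupby("subsample-0")["out"].mean()

# Option 2: group jointly on every column named 'subsample', selected by position
positions = [i for i, name in enumerate(df.columns) if name == "subsample"]
keys = [df.iloc[:, i].to_numpy() for i in positions]
by_both = df.groupby(keys)["out"].mean()

print(by_first)
print(by_both)
```

Option 1 forces the caller to say which of the duplicated Subkeys to group on, while option 2 treats each combination of values as its own group. Note that the suffixing scheme could collide with an existing column that is already named, say, `subsample-0`.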

# Illustrative Example

```python
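from functools import partial

import numpy as np
import sklearn.datasets
import sklearn.utils
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# Vset, init_args and PREV_KEY come from this project's package (import path omitted here)

X, y = sklearn.datasets.make_classification(n_samples=100, n_features=5)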
X_train, X_test, y_train, y_test = init_args(train_test_split(X, y), names=['xtr', 'xte', 'ytr', 'yte'])

subsampling_funcs = [partial(sklearn.utils.resample, n_samples=80, random_state=i) for i in range(5)]
subsampling_set = Vset(name='subsample', modules=subsampling_funcs)
X_trains, y_trains = subsampling_set(X_train, y_train)
X_tests, y_tests = subsampling_set(X_test, y_test)

models = [LogisticRegression(max_iter=1000, tol=0.1), DecisionTreeClassifier()]
modeling_set = Vset(name='model', modules=models, module_keys=["LR", "DT"])
modeling_set.fit(X_trains, y_trains)

# round mean predictions over test-set subsamples to class labels
mean_dict, std_dict, pred_stats_df = modeling_set.predict(X_tests, with_uncertainty=True, group_by=['subsample'])
mean_dict = {k: np.round(v) if k != PREV_KEY else v for k, v in mean_dict.items()}
```
