iterative_train_test_split assumes dataframe when numpy array is allowable #77

@necrosource-bot

Mirrored from scikit-multilearn/scikit-multilearn#307

The issue here is that iterative_train_test_split is defined as

def iterative_train_test_split(
    X: np.ndarray,
    y: np.ndarray,
    ...
)

but its body accesses X.loc and y.loc. loc is a pandas indexer, so it exists on DataFrames but not on NumPy arrays. Passing arrays, as the signature allows, results in the following exception:

Traceback (most recent call last):
  File "mlc_bugreport.py", line 10, in <module>
    X_train, X_test, y_train, y_test = iterative_train_test_split(X, y, test_size = 0.1, random_state=42)
  File ".../iterative_stratification.py", line 110, in iterative_train_test_split
    X_train, y_train = X.loc[train_indexes], y.loc[train_indexes]
AttributeError: 'numpy.ndarray' object has no attribute 'loc'
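
For reference, the same error can be triggered with NumPy alone, which suggests the failure comes from the indexing expression rather than from the stratification logic. A minimal sketch (the array contents and index values are made up for illustration):

```python
import numpy as np

# Any ndarray will do; the shape and values are arbitrary.
X = np.zeros((3, 2))
train_indexes = [0, 1]

try:
    # ndarray has no pandas-style .loc indexer, so this raises.
    X.loc[train_indexes]
    message = None
except AttributeError as exc:
    message = str(exc)

print(message)
```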

As already noted in scikit-multilearn#286, dropping the .loc accessor and indexing X and y directly seems to be enough to fix the issue.
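
A rough sketch of what such a fix could look like, in case it helps. The helper name take_rows is hypothetical and not part of the library. It also assumes train_indexes holds row positions, which is why it uses .iloc rather than .loc for pandas inputs:

```python
import numpy as np


def take_rows(data, indexes):
    """Select rows by position from a DataFrame-like object or an array.

    Hypothetical helper for illustration only; not the library's code.
    """
    if hasattr(data, "iloc"):
        # pandas objects: positional row selection
        return data.iloc[indexes]
    # NumPy arrays accept an integer array as a row index directly
    return data[indexes]


# Made-up data standing in for X, y and the computed train indexes.
X = np.arange(12).reshape(6, 2)
y = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]])
train_indexes = np.array([0, 2, 3, 5])

X_train, y_train = take_rows(X, train_indexes), take_rows(y, train_indexes)
```

Checking for .iloc instead of importing pandas keeps the array path free of a pandas dependency. Whether the maintainers want to keep DataFrame support at all is a separate question.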

Reproduction:
scikit-multilearn-ng v0.0.6 installed

from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelBinarizer
from skmultilearn.model_selection import iterative_train_test_split
from skmultilearn.problem_transform import BinaryRelevance
from sklearn.svm import SVC

dataset = load_iris()
X = dataset.data
y = LabelBinarizer().fit(dataset.target).transform(dataset.target)
X_train, X_test, y_train, y_test = iterative_train_test_split(X, y, test_size = 0.1, random_state=42)
classifier = BinaryRelevance(
    classifier = SVC(),
    require_dense = [False, True]
)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)
