After needing to use scikit-multilearn and running into errors, I opened a PR and waited. On closer inspection, I saw that there hadn’t been any commits in 7 months (now 9 months) and that there had been no release since 2018. Digging in, I found that no one had access to the PyPI credentials, among other problems. So I opened a discussion about creating a fork, and many were eager for it.
After some development, I’m here to introduce scikit-multilearn-ng (GitHub: https://github.com/scikit-multilearn-ng/scikit-multilearn-ng), an advanced, open-source tool for multi-label classification in Python. It’s a direct successor to scikit-multilearn and brings a host of improvements and new features.
What Makes scikit-multilearn-ng Stand Out?
- Enhanced Integration with scikit-learn: This package not only integrates with the scikit-learn ecosystem but also extends its capabilities, making it a natural fit for those familiar with scikit-learn.
- Expanded Algorithm Collection: Among its new offerings are StructuredGridSearchCV and the SMiLE algorithm, specifically designed for more complex multi-label classification tasks, including handling missing labels and heterogeneous features.
- Open Source Philosophy: As a community-driven project, it’s free to use and open for contributions, perfect for collaborative development.
Why Should You Consider Upgrading?
- Ease of Transition: For those already using scikit-multilearn, upgrading is as simple as switching the dependency to scikit-multilearn-ng. Your existing code will work without any changes.
- Active Development and Support: scikit-multilearn-ng offers bug fixes and new features, ensuring your projects stay current and robust.
Whether you’re a seasoned Python developer or just starting out in machine learning, scikit-multilearn-ng is worth exploring.
Some Example Use Cases:
A simple use case is iteratively stratified splitting of multi-label data into train and test sets, maintaining the distribution of each label across both splits. This is particularly useful for datasets where certain label combinations are rare.
```python
from skmultilearn.model_selection import iterative_train_test_split
import numpy as np

# Assuming X is your feature matrix and y is your label matrix.
# X should be a numpy array or a sparse matrix.
# y should be a binary indicator matrix (each label is either 0 or 1).

# Define the size of your test set
test_size = 0.2

# Perform the split
# The function returns flattened arrays, so you need to reshape them
X_train, y_train, X_test, y_test = iterative_train_test_split(X, y, test_size=test_size)

# Reshape the outputs back to the original shapes
num_labels = y.shape[1]
y_train = y_train.reshape(-1, num_labels)
y_test = y_test.reshape(-1, num_labels)
```
But it also supports advanced problem transformations to single label problems:
```python
from skmultilearn.problem_transform import BinaryRelevance
from sklearn.svm import SVC

# Initialize and train
classifier = BinaryRelevance(classifier=SVC(), require_dense=[False, True])
classifier.fit(X_train, y_train)

# Predict
predictions = classifier.predict(X_test)
```
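To see what binary relevance actually does — train one independent binary classifier per label — here is a self-contained sketch of the same decomposition using only scikit-learn’s `MultiOutputClassifier` on toy data (the random data and its shapes are illustrative assumptions, not part of the library):

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import SVC

# Toy multi-label data: 100 samples, 4 features, 3 binary labels
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = (rng.random(size=(100, 3)) > 0.5).astype(int)

# One SVC per label, trained independently -- the binary relevance idea
clf = MultiOutputClassifier(SVC())
clf.fit(X, y)

predictions = clf.predict(X)
print(predictions.shape)  # one binary column per label: (100, 3)
```

The trade-off is the same as with `BinaryRelevance`: it’s simple and parallelizable, but it ignores correlations between labels, which the problem-transformation classifiers in scikit-multilearn-ng (e.g. classifier chains, label powerset) can exploit.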
Please contribute and star the project!
I’m looking forward to your feedback, questions, and how you might use it in your projects!