After needing to use scikit-multilearn and running into errors, I opened a PR and waited. On closer inspection, I saw that there hadn’t been any commits in 7 months (now 9 months) and that there had been no release since 2018. Digging in, I found that no one had access to the PyPI credentials, among other problems. So I opened a discussion about creating a fork, and many were eager for it.
After some development, I’m here to introduce scikit-multilearn-ng (GitHub: https://github.com/scikit-multilearn-ng/scikit-multilearn-ng), an advanced, open-source tool for multi-label classification in Python. It’s a direct successor to scikit-multilearn and brings a host of improvements and new features.
What Makes scikit-multilearn-ng Stand Out?
- Enhanced Integration with scikit-learn: This package not only integrates with the scikit-learn ecosystem but also extends its capabilities, making it a natural fit for those familiar with scikit-learn.
- Expanded Algorithm Collection: Among its new offerings are StructuredGridSearchCV and the SMiLE algorithm, specifically designed for more complex multi-label classification tasks, including handling missing labels and heterogeneous features.
- Open Source Philosophy: As a community-driven project, it’s free to use and open for contributions, perfect for collaborative development.
Why Should You Consider Upgrading?
- Ease of Transition: For those already using scikit-multilearn, upgrading is as simple as switching the dependency to scikit-multilearn-ng. Your existing code will work without any changes.
- Active Development and Support: scikit-multilearn-ng offers bug fixes and new features, ensuring your projects stay current and robust.
Whether you’re a seasoned Python developer or just starting out in machine learning, scikit-multilearn-ng is worth exploring.
Some Example Use Cases:
A simple use case is iteratively stratified splitting of multi-label data into train and test sets, maintaining the distribution of each label across both splits. This is particularly useful for datasets where certain label combinations are rare.
```python
from skmultilearn.model_selection import iterative_train_test_split
import numpy as np

# Assuming X is your feature matrix and y is your label matrix.
# X should be a numpy array or a sparse matrix.
# y should be a binary indicator matrix (each label is either 0 or 1).

# Define the size of your test set
test_size = 0.2

# Perform the split
# The function returns flattened arrays, so you need to reshape them
X_train, y_train, X_test, y_test = iterative_train_test_split(X, y, test_size=test_size)

# Reshape the outputs back to the original shapes
num_labels = y.shape[1]
y_train = y_train.reshape(-1, num_labels)
y_test = y_test.reshape(-1, num_labels)
```
But it also supports advanced problem transformations to single label problems:
```python
from skmultilearn.problem_transform import BinaryRelevance
from sklearn.svm import SVC

# Initialize and train
classifier = BinaryRelevance(classifier=SVC(), require_dense=[False, True])
classifier.fit(X_train, y_train)

# Predict
predictions = classifier.predict(X_test)
```
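To see what binary relevance actually does — train one independent binary classifier per label — here is a self-contained sketch of the same decomposition using only scikit-learn’s `MultiOutputClassifier` on toy data (the random data and its shapes are illustrative assumptions, not part of the library):

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import SVC

# Toy multi-label data: 100 samples, 4 features, 3 binary labels
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = (rng.random(size=(100, 3)) > 0.5).astype(int)

# One SVC per label, trained independently -- the binary relevance idea
clf = MultiOutputClassifier(SVC())
clf.fit(X, y)

predictions = clf.predict(X)
print(predictions.shape)  # one binary column per label: (100, 3)
```

The trade-off is the same as with `BinaryRelevance`: it’s simple and parallelizable, but it ignores correlations between labels, which the problem-transformation classifiers in scikit-multilearn-ng (e.g. classifier chains, label powerset) can exploit.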
Please contribute and star the project!
I’m looking forward to your feedback, questions, and how you might use it in your projects!