Python ML – Similarity with spaCy

Intro

I recently started learning ML and data science, and I want to share a powerful Python library for semantic text analysis with you: spaCy.

Description

Similarity is determined by comparing word vectors or “word embeddings”, multi-dimensional meaning representations of a word.
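To make the idea of comparing word vectors concrete, here is a minimal sketch of cosine similarity, which is the default measure spaCy's documentation describes for similarity (applied to averaged word vectors). The three-dimensional vectors below are invented for illustration; real embeddings have hundreds of dimensions and come from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen in this sketch for all-zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", made up for this example only.
dog = [0.8, 0.3, 0.1]
cat = [0.7, 0.4, 0.1]
car = [0.1, 0.2, 0.9]

print(cosine_similarity(dog, cat))  # vectors pointing the same way score close to 1
print(cosine_similarity(dog, car))  # vectors pointing different ways score lower
```

Vectors that point in nearly the same direction score close to 1, which is why words used in similar contexts end up with high similarity.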

spaCy is designed to help you do real work — to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It’s easy to install, and its API is simple and productive. We like to think of spaCy as the Ruby on Rails of Natural Language Processing.

spaCy is able to compare two objects, and make a prediction of how similar they are. Predicting similarity is useful for building recommendation systems or flagging duplicates. For example, you can suggest a user content that’s similar to what they’re currently looking at, or label a support ticket as a duplicate if it’s very similar to an already existing one.
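The duplicate-ticket idea can be sketched as a simple threshold check. The function name `find_duplicate`, the threshold of 0.8, and the sample tickets are all assumptions made for this illustration. So that the sketch runs without a model installed, it uses a bag-of-words scorer as a stand-in; it is not spaCy's similarity.

```python
import math
from collections import Counter


def bag_of_words_similarity(a, b):
    """Stand-in scorer: cosine similarity of word-count vectors (NOT spaCy)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def find_duplicate(new_ticket, existing_tickets, similarity, threshold=0.8):
    """Return the most similar existing ticket if its score reaches the threshold, else None."""
    best_ticket, best_score = None, 0.0
    for ticket in existing_tickets:
        score = similarity(new_ticket, ticket)
        if score > best_score:
            best_ticket, best_score = ticket, score
    return best_ticket if best_score >= threshold else None


# Hypothetical support tickets, invented for this example.
existing = ["cannot log in to my account", "invoice shows the wrong amount"]

print(find_duplicate("cannot log in to my account today", existing, bag_of_words_similarity))
print(find_duplicate("how do I change my avatar", existing, bag_of_words_similarity))
```

With a spaCy pipeline loaded as `nlp`, the scorer could be swapped for `lambda a, b: nlp(a).similarity(nlp(b))`. The threshold would then need tuning on real data, since 0.8 is only a placeholder.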

Installation instructions

pip

Using pip, spaCy releases are available as source packages and binary wheels (as of v2.0.13).

# pip3 install spacy

Models

spaCy’s models can be installed as Python packages. This means that they’re a component of your application, just like any other module. They’re versioned and can be defined as a dependency in your requirements.txt. Models can be installed from a download URL or a local directory, manually or via pip. Their data can be located anywhere on your file system.

python3 -m spacy download en_core_web_sm

spaCy provides models for many languages besides English; the complete list is in the spaCy documentation.
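Because models are installed as ordinary Python packages, one way to check for a model before loading it is to ask Python's import machinery. This is a small sketch using only the standard library; the helper name `is_installed` is made up here, and `en_core_web_sm` is the model package used in this post's example.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level Python package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


# Whether the model is present depends on your environment.
if is_installed("en_core_web_sm"):
    print("model package found")
else:
    print("model package missing; run: python3 -m spacy download en_core_web_sm")
```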

Example

Here is a short Python script that compares two strings.

import spacy

# Load the small English pipeline (it must be downloaded first)
nlp = spacy.load("en_core_web_sm")

# Turn both user inputs into spaCy Doc objects
first_text = nlp(input("insert first text: "))
second_text = nlp(input("insert second text: "))

# A higher score means the two texts are more similar
print(f"similarity: {first_text.similarity(second_text)}")

Result

insert first text: i'm a software developer
insert second text: i'm a software web developer
similarity: 0.9302790237853475

Enjoy!!
