Web scraping with PyQuery

How do you scrape the web these days? You may be using Beautiful Soup, or automating a web browser with Selenium.

If your scraping needs are very basic, consider PyQuery.

PyQuery lets you run jQuery-style queries on XML documents. That's great, because it works on HTML too.

First install PyQuery with pip (pip install pyquery). Then you can use it like this:


#!/usr/bin/python3
from pyquery import PyQuery as pq

# Fetch the page and parse it
doc = pq(url="https://pythonbasics.org")
# Select the <title> element and print its text
print(doc('title').text())

That will grab the title from the web page.
Want to get all links from a web page?


#!/usr/bin/python3
from pyquery import PyQuery as pq

doc = pq(url="https://dev.to")

# Select only anchors that have an href, so the lookup cannot fail
for link in doc('a[href]'):
    print(link.attrib['href'])

Easy, right?

Would you rather get the images?


#!/usr/bin/python3
from pyquery import PyQuery as pq

doc = pq(url="https://dev.to")

# Select only images that have a src, so the lookup cannot fail
for img in doc('img[src]'):
    print(img.attrib['src'])
