Web scraping with PyQuery

How do you do web scraping these days? You may be working with BeautifulSoup or automating a web browser with Selenium.

If you have very basic scraping needs, you could consider PyQuery.

PyQuery lets you run jQuery-style queries on XML documents. That's great, because it means you can use the same selectors on HTML.

First install PyQuery with pip (pip install pyquery). Then you can use it like this:

#!/usr/bin/python3
from pyquery import PyQuery as pq

# Fetch the page and parse it
doc = pq(url="https://pythonbasics.org")
print(doc('title').text())

That will grab the title from the web page.
Want to get all links from a web page?

#!/usr/bin/python3
from pyquery import PyQuery as pq

doc = pq(url="https://dev.to")

# Print the target of every link; anchors without an href are skipped
for link in doc('a'):
    href = link.attrib.get('href')
    if href:
        print(href)

Easy, right?

Would you rather get the images?

#!/usr/bin/python3
from pyquery import PyQuery as pq

doc = pq(url="https://dev.to")

# Print the source of every image; <img> tags without a src are skipped
for img in doc('img'):
    src = img.attrib.get('src')
    if src:
        print(src)
