How do you do web scraping these days? You may be working with BeautifulSoup, or automating a web browser with Selenium.
If you have very basic scraping needs, you could consider pyquery.
pyquery lets you make jQuery-style queries on XML documents. That’s great, because you can use it on HTML too.
First install pyquery with pip (`pip install pyquery`). Then you can use it like this:
```python
#!/usr/bin/python3
from pyquery import PyQuery as pq

# Download the page and parse it.
doc = pq(url="https://pythonbasics.org")

# Print the text inside the page's <title> element.
print(doc('title').text())
```
That will grab the title from the web page.
Want to get all links from a web page?
```python
#!/usr/bin/python3
from pyquery import PyQuery as pq

doc = pq(url="https://dev.to")

# a[href] skips anchors without an href, which would
# otherwise raise a KeyError on link.attrib['href'].
for link in doc('a[href]'):
    print(link.attrib['href'])
```
Easy, right?
Would you rather grab the images?
```python
#!/usr/bin/python3
from pyquery import PyQuery as pq

doc = pq(url="https://dev.to")

# img[src] skips <img> tags without a src attribute,
# which would otherwise raise a KeyError.
for img in doc('img[src]'):
    print(img.attrib['src'])
```