Hello 😀
I love memes and I want to keep them on my phone. Is the only solution to browse through the memes and download each one manually?
Nope! Let’s steal/download some memes automatically with Python.
Which website will we steal memes from?
Our target is https://imgflip.com/
First, let’s look at the HTML page:
```html
<div class="name"> BLA BLAB BLAB LBA BL BLA <img src="MEME URL">
```
All the meme links are inside <div class="base-img-wrap">.......</div>, so we need to parse the div tags with the base-img-wrap class name and get the <img> tag inside each one:
```html
<div class='base-img-wrap'> BLA BLA LBA <img src="MEME LINK"> </div>
```
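If you want to check the “find the div, then the <img> inside it” idea without touching the network, here is a minimal sketch. It is a stand-in, not the approach this post uses: it swaps BeautifulSoup for the standard library’s `html.parser.HTMLParser` so it runs with no extra installs, and it parses a made-up sample modelled on the snippet above. The class name `MemeImgParser` and the extra sample divs are hypothetical.

```python
from html.parser import HTMLParser


class MemeImgParser(HTMLParser):
    """Hypothetical stdlib stand-in for BeautifulSoup (illustration only).

    Collects the src of every <img> nested inside <div class="base-img-wrap">.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while we are inside a base-img-wrap div
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self.depth:
                self.depth += 1        # a div nested inside the wrapper
            elif "base-img-wrap" in classes:
                self.depth = 1         # entering a wrapper div
        elif tag == "img" and self.depth and attrs.get("src"):
            self.srcs.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1            # leaving a div inside (or the) wrapper


# Made-up sample: two wrapper divs plus one unrelated div that must be ignored
sample = """
<div class='base-img-wrap'> BLA BLA LBA <img src="MEME LINK"> </div>
<div class='other'><img src="NOT A MEME"></div>
<div class='base-img-wrap'> BLA BLA LBA <img src="MEME LINK 2"> </div>
"""

parser = MemeImgParser()
parser.feed(sample)
print(parser.srcs)  # ['MEME LINK', 'MEME LINK 2']
```

With BeautifulSoup the same lookup is a `find_all` on the div followed by `find('img')` inside each result, which is what the rest of this post does.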
Modules we need:
- requests (for HTTP/HTTPS requests)
- bs4 (BeautifulSoup, for HTML parsing)
Let’s start by sending an HTTP request to the site and parsing the posts. Each post is a <div class="base-unit clearfix"> that contains the base-img-wrap div, so we search for those:
```python
import requests
from bs4 import BeautifulSoup

req = requests.get('https://imgflip.com/?page=1').content
soup = BeautifulSoup(req, "html.parser")

# every meme post sits in a <div class="base-unit clearfix">
ancher = soup.find_all('div', {'class': "base-unit clearfix"})

"""
<div class="base-unit clearfix"><h2 class="base-unit-title"><a href="/i/5aq7jq">Why is my sister's name Rose</a></h2><div class="base-img-wrap-wrap"><div class="base-img-wrap" style="width:440px"><a class="base-img-link" href="/i/5aq7jq" style="padding-bottom:105.90909090909%">
......
"""
```
We have now fetched every post block, including its <div class='base-img-wrap'>.
Let’s get the <img> tags:
```python
import requests
from bs4 import BeautifulSoup

req = requests.get('https://imgflip.com/?page=1').content
soup = BeautifulSoup(req, "html.parser")
ancher = soup.find_all('div', {'class': "base-unit clearfix"})

for pt in ancher:
    # the meme itself is the <img class="base-img"> inside each post
    img = pt.find('img', {'class': 'base-img'})
    if img:  # skip posts without a base-img image
        print(img)
```
```html
<img alt="Why is my sister's name Rose | people that upvote good memes instead of just scrolling past them | image tagged in why is my sister's name rose | made w/ Imgflip meme maker" class="base-img" src="//i.imgflip.com/5aq7jq.jpg"/>
<img alt="Petition: upvote if you want a rule against upvote begging. I will then post the results in the Imgflip suggestion stream | Upvote begging will keep happening as long as they make it to the front page; UPVOTE BEGGING TO DESTROY UPVOTE BEGGING | image tagged in memes,the scroll of truth,no no hes got a point,you have become the very thing you swore to destroy,memes | made w/ Imgflip meme maker" class="base-img" src="//i.imgflip.com/5aqvx4.jpg"/>
```
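Cool, now we have all the <img> tags. Next we need the src value of each one. The src starts with // (it has no https: in front), so we replace those first two characters with https:// to get a full link: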
```python
import requests
from bs4 import BeautifulSoup

req = requests.get('https://imgflip.com/?page=1').content
soup = BeautifulSoup(req, "html.parser")
ancher = soup.find_all('div', {'class': "base-unit clearfix"})

for pt in ancher:
    img = pt.find('img', {'class': 'base-img'})
    if img:
        # src looks like //i.imgflip.com/5aq7jq.jpg, so swap the leading // for https://
        link = img['src'].replace(img['src'][0:2], 'https://')
        print(link)

"""
https://i.imgflip.com/5aq7jq.jpg
https://i.imgflip.com/5aqvx4.jpg
https://i.imgflip.com/5aq5jg.jpg
https://i.imgflip.com/5aor2n.jpg
https://i.imgflip.com/5amt83.jpg
https://i.imgflip.com/5ayodd.jpg
https://i.imgflip.com/5awhgz.jpg
https://i.imgflip.com/5allij.jpg
https://i.imgflip.com/5aosh7.jpg
https://i.imgflip.com/5amxbo.jpg
https://i.imgflip.com/5auvpo.jpg
"""
```
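The `replace(img['src'][0:2], 'https://')` trick works for these links because every src starts with // and contains no other //. `str.replace` swaps every occurrence, though, so a sketch using `urllib.parse.urljoin` from the standard library may be sturdier: it resolves a src that starts with // (or with a single /) against the page URL and leaves full URLs alone. The `to_absolute` helper and the example src with a second // are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://imgflip.com/"  # the page the <img> tags were scraped from


def to_absolute(src: str) -> str:
    """Hypothetical helper: resolve an <img src> value against the page URL."""
    return urljoin(BASE_URL, src)


print(to_absolute("//i.imgflip.com/5aq7jq.jpg"))        # https://i.imgflip.com/5aq7jq.jpg
print(to_absolute("https://i.imgflip.com/5aqvx4.jpg"))  # https://i.imgflip.com/5aqvx4.jpg
print(to_absolute("/i/5aq7jq"))                         # https://imgflip.com/i/5aq7jq

# Hypothetical src with a second "//": replace() rewrites both, urljoin does not
src = "//i.imgflip.com/a//b.jpg"
print(src.replace(src[0:2], "https://"))  # https://i.imgflip.com/ahttps://b.jpg
print(to_absolute(src))                   # https://i.imgflip.com/a//b.jpg
```

In the loop above, `link = to_absolute(img['src'])` would take the place of the `replace` line.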
After getting all the image links, we download each one with the requests module and save it to a file:
```python
import requests
from bs4 import BeautifulSoup

req = requests.get('https://imgflip.com/?page=1').content
soup = BeautifulSoup(req, "html.parser")
ancher = soup.find_all('div', {'class': "base-unit clearfix"})

for pt in ancher:
    img = pt.find('img', {'class': 'base-img'})
    if img:
        link = img['src'].replace(img['src'][0:2], 'https://')
        r = requests.get(link)
        # '//i.imgflip.com/5aq7jq.jpg'.split('/')[3] -> '5aq7jq.jpg'
        with open(img['src'].split('/')[3], 'wb') as f:  # write binary
            f.write(r.content)
```
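Two details of the download step depend on the exact link shape: `split('/')[3]` assumes the file name is always the fourth piece of src, and every run writes into the current directory. Here is a minimal sketch of a more defensive alternative, using only the standard library. It assumes you want the files in a `memes` folder and want to skip memes that are already saved; the folder name and the helpers `filename_from_url` and `save_meme` are hypothetical.

```python
import os
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url: str) -> str:
    """Hypothetical helper: last path segment of a URL, ignoring ?query and #fragment."""
    return posixpath.basename(urlsplit(url).path)


def save_meme(data: bytes, url: str, folder: str = "memes") -> str:
    """Hypothetical helper: write image bytes to folder/<filename>.

    Creates the folder if needed and leaves an existing file untouched,
    so re-running the script does not rewrite memes you already have.
    Returns the path of the file.
    """
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename_from_url(url))
    if not os.path.exists(path):
        with open(path, "wb") as f:  # binary mode: images are bytes, not text
            f.write(data)
    return path
```

Inside the download loop you would call `save_meme(r.content, link)` instead of opening the file yourself.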
Great, we got all the memes from page 1.
Now let’s add a parameter for the page number in the URL:
```python
import requests
from bs4 import BeautifulSoup


def meme_stealer(page):
    req = requests.get(f'https://imgflip.com/?page={page}').content
    soup = BeautifulSoup(req, "html.parser")
    ancher = soup.find_all('div', {'class': "base-unit clearfix"})
    for pt in ancher:
        img = pt.find('img', {'class': 'base-img'})
        if img:
            link = img['src'].replace(img['src'][0:2], 'https://')
            r = requests.get(link)
            with open(img['src'].split('/')[3], 'wb') as f:
                f.write(r.content)


for i in range(1, 6):
    meme_stealer(i)  # Page 1, Page 2, Page 3, Page 4, Page 5
```
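When looping over several pages, it is common courtesy to pause between requests instead of firing them back to back. The sketch below builds the same `?page=` URL with `urllib.parse.urlencode` and calls a per-page function with a delay in between. The names `page_url` and `crawl` and the one-second default delay are assumptions of mine, not anything imgflip documents.

```python
import time
from urllib.parse import urlencode

BASE_URL = "https://imgflip.com/"


def page_url(page: int) -> str:
    """Hypothetical helper: listing URL for one page, e.g. https://imgflip.com/?page=2."""
    return f"{BASE_URL}?{urlencode({'page': page})}"


def crawl(first: int, last: int, handle_page, delay: float = 1.0) -> None:
    """Hypothetical helper: call handle_page(page) for first..last inclusive,
    sleeping `delay` seconds between pages (assumed politeness delay)."""
    for page in range(first, last + 1):
        if page != first:  # no pause before the very first request
            time.sleep(delay)
        handle_page(page)
```

`crawl(1, 5, meme_stealer)` would visit the same five pages as the `for` loop above, just with a pause between them.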
Thanks for reading!
Bye 😀