If you want to delete your old tweets within a given date range, this post walks you through the steps using python-twitter and the data archive you can download from your Twitter profile.
GitHub link to the Jupyter Notebook
Prerequisites:-
1. Twitter data archive:
We need the Twitter data archive to get two attributes of each tweet:
I) Tweet ID
II) Tweet date
If you already have these two items in a CSV file or any other form, you can skip this step.
To get your Twitter data, follow this link using the desktop site
OR
alternatively, you can reach it from Settings & Privacy -> Account -> Download an archive of your Twitter data. After you request the archive, it can take up to 48 hours for your data to become available for download.
2. Twitter API Credentials
Go to https://developer.twitter.com/en/apps and create an app if you do not already have one.
Once your app has been created, open its “Details” page and make sure the permissions are set to Read, Write and Direct Messages. If they are not, edit them.
After that, generate the Consumer API keys and Access Tokens.
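Hard-coding the keys in a notebook makes them easy to leak, for example when the notebook is pushed to GitHub. One alternative is to export the four values as environment variables and read them at startup. The sketch below assumes the hypothetical variable names shown and a hypothetical helper `load_credentials()`. It only reads and validates the values.

```python
import os

# Hypothetical environment variable names; rename them to match your setup.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token_key": "TWITTER_ACCESS_TOKEN_KEY",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Return twitter.Api() keyword arguments read from environment variables.

    Raises KeyError naming every variable that is missing or empty.
    """
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {arg: environ[var] for arg, var in CREDENTIAL_VARS.items()}
```

The dictionary keys match the keyword arguments passed to `twitter.Api()` later in this post, so the result could be unpacked directly: `twitter.Api(**load_credentials())`.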
Handling the Downloaded Twitter Data:-
After extracting the Twitter data you will find two folders, one named data and the other named assets. The HTML file in the archive can be used to browse your tweets offline.
Before running the code, we need to make some changes to the tweet.js file located inside the assets folder.
The current Twitter data download service lists all your tweets inside the tweet.js file as a JavaScript array of objects. Since we want to read this file in Python, we need to turn it into valid JSON. To do that, remove the first line
```
window.YTD.tweet.part0 = [ {
```
and replace it with
```
{"data": [ {
```
For the last line, add an extra “}” to close the JSON object, so your file’s last line will look like this:
```
} ]}
```
Save it as a JSON file, for example “editedTweet.json” (the file name the code reads in step 3).
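If you prefer not to edit tweet.js by hand, you can do the same transformation in Python. This is a minimal sketch. It assumes the file starts with a JavaScript assignment such as `window.YTD.tweet.part0 = [ ...` and that everything after the `=` is a JSON array. The function names `convert_archive_js()` and `convert_archive_file()` are hypothetical and do not come from the original notebook.

```python
import json


def convert_archive_js(js_text):
    """Convert the text of tweet.js into the {"data": [...]} object used in this post.

    Drops everything up to the first "=" (the window.YTD.tweet.part0 assignment)
    and parses the remaining JavaScript array as JSON.
    """
    _, sep, array_text = js_text.partition("=")
    if not sep:
        raise ValueError("expected an assignment such as 'window.YTD.tweet.part0 = ['")
    # Strip surrounding whitespace and a trailing semicolon, in case one is present
    array_text = array_text.strip().rstrip(";")
    return {"data": json.loads(array_text)}


def convert_archive_file(src_path, dst_path):
    """Read tweet.js from src_path, write the JSON object to dst_path, return the tweet count."""
    with open(src_path, encoding="utf-8") as src:
        converted = convert_archive_js(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(converted, dst)
    return len(converted["data"])
```

For example, `convert_archive_file("tweet.js", "editedTweet.json")` would produce a file with the same structure as the manual edit. Adjust both paths to wherever your extracted archive lives.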
Code:-
The Python Jupyter notebook is available on GitHub at this link.
1) First, uninstall the standalone twitter package and install python-twitter instead. Both packages install a module named twitter, and only python-twitter includes the twitter.Api() class used below. You can do this directly from the Jupyter Notebook; restart the notebook kernel after installation.
```
!pip uninstall -y twitter
!pip install python-twitter
```
OR, from a terminal:
```
pip uninstall twitter
pip install python-twitter
```
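To confirm that the restarted kernel picked up the right package, you can check whether `import twitter` exposes an `Api` attribute. The post notes that only python-twitter provides it. This small sketch uses only the standard `importlib` module. The function name is hypothetical.

```python
import importlib


def twitter_module_is_python_twitter():
    """Return True if "import twitter" resolves to a module that defines Api.

    Both the standalone twitter package and python-twitter install a module
    named twitter; only python-twitter provides the twitter.Api class.
    """
    try:
        module = importlib.import_module("twitter")
    except ImportError:
        return False  # neither package is installed
    return hasattr(module, "Api")


print(twitter_module_is_python_twitter())
```

A `False` here after installing python-twitter usually means the old module is still loaded, so restart the kernel and run the check again.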
2) Set the CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN_KEY and ACCESS_TOKEN_SECRET variables to your own Twitter API credentials. Passing a tweet ID string to the deleteTweet(tweetId) function deletes that tweet. Each tweet’s ID is stored in the JSON file as “id_str”.
```python
# ==================================================================
# Import statements
# ==================================================================
import json
from datetime import datetime

import twitter  # provided by the python-twitter package

# ==================================================================
# API Credentials
# ==================================================================
CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_TOKEN_KEY = ""
ACCESS_TOKEN_SECRET = ""

# ==================================================================
# Initialize
# ==================================================================
api = twitter.Api(consumer_key=CONSUMER_KEY,
                  consumer_secret=CONSUMER_SECRET,
                  access_token_key=ACCESS_TOKEN_KEY,
                  access_token_secret=ACCESS_TOKEN_SECRET)

# ==================================================================
# Function to delete tweet by ID
# ==================================================================
def deleteTweet(tweetId):
    try:
        print("Deleting tweet #{0}".format(tweetId))
        api.DestroyStatus(tweetId)
        print("Deleted")
    except Exception as err:
        # Report the error and carry on with the next tweet
        print("Exception: %s\n" % err)
```
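Because deleteTweet() only prints an exception, a failed deletion is easy to miss in a long run. The variant below is a hedged sketch that assumes you want to retry transient failures a few times and get a success flag back. It takes the raw delete call as a parameter, for example `api.DestroyStatus`, since deleteTweet() itself never raises. The name `delete_with_retry` and its `attempts`/`delay` parameters are hypothetical.

```python
import time


def delete_with_retry(destroy, tweet_id, attempts=3, delay=5.0):
    """Call destroy(tweet_id), making at most `attempts` calls in total.

    `destroy` is any callable that deletes a single tweet, e.g. api.DestroyStatus.
    Returns True as soon as one call succeeds, False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            destroy(tweet_id)
            return True
        except Exception as err:  # broad catch, as in deleteTweet()
            print("Attempt {0} for tweet #{1} failed: {2}".format(attempt, tweet_id, err))
            if attempt < attempts:
                time.sleep(delay)  # wait before trying again
    return False
```

Permanent errors, such as a tweet that was already deleted, simply use up all attempts and return `False`. You can collect those IDs and inspect them afterwards.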
3) Load the JSON file into a Python variable:
```python
# Load the edited archive; UTF-8 because tweets may contain non-ASCII text
with open('editedTweet.json', encoding='utf-8') as json_file:
    myData = json.load(json_file)
```
You can now browse the attributes of each tweet in the array. For reference, here are the contents of the element at index 0 (myData["data"][0]). We are only interested in the “created_at” and “id_str” attributes:
```
{'tweet': {'created_at': 'Thu Sep 11 12:26:39 +0000 2014',
           'display_text_range': ['0', '137'],
           'entities': {'hashtags': [],
                        'symbols': [],
                        'urls': [{'display_url': 'fb.me/3oL0wLoge',
                                  'expanded_url': 'http://fb.me/3oL0wLoge',
                                  'indices': ['115', '137'],
                                  'url': 'http://t.co/spMVNltxDk'}],
                        'user_mentions': []},
           'favorite_count': '0',
           'favorited': False,
           'full_text': 'for galaxy y , galaxy pocket, galaxy ace, galaxy music, galaxy y dous lite and any other low end android device... http://t.co/spMVNltxDk',
           'id': '510041733099712513',
           'id_str': '510041733099712513',
           'lang': 'en',
           'possibly_sensitive': False,
           'retweet_count': '0',
           'retweeted': False,
           'source': '<a href="http://www.facebook.com/twitter" rel="nofollow">Facebook</a>',
           'truncated': False}}
```
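Since only `created_at` and `id_str` matter, it can help to reduce each element to those two fields and parse the timestamp once. This sketch assumes `myData` has the `{"data": [{"tweet": {...}}]}` shape shown above. `extract_id_and_date()` is a hypothetical helper, and the format string is the one the post uses for `created_at`.

```python
from datetime import datetime

# Format of the created_at field, e.g. "Thu Sep 11 12:26:39 +0000 2014"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def extract_id_and_date(data):
    """Return (id_str, timezone-aware datetime) pairs for every tweet in data["data"]."""
    pairs = []
    for element in data["data"]:
        tweet = element["tweet"]
        posted = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        pairs.append((tweet["id_str"], posted))
    return pairs
```

Note that `%a` and `%b` match English day and month abbreviations only when Python runs under an English or C locale. Parsing can fail if your locale is set to another language.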
4) Next, build a list with the IDs of the tweets that should be deleted. Set the date range with “range_start” and “range_end”, and remember to take the UTC offset into account.
```python
# ==================================================================
# Range (in UTC offset) within which tweets will be deleted
# ==================================================================
range_start = datetime.strptime('Sep 10 00:00:00 +0000 2012', '%b %d %H:%M:%S %z %Y')
range_end = datetime.strptime('Sep 10 00:00:00 +0000 2017', '%b %d %H:%M:%S %z %Y')

# ==================================================================
# I am creating a list of tweet IDs for consideration, where
# tweetsToBeDeleted will be used for deleting tweets
# ==================================================================
tweetsToBeDeleted = []
tweetsToBeIgnored = []
for element in myData["data"]:
    tweet_post_time = datetime.strptime(element["tweet"]["created_at"], '%a %b %d %H:%M:%S %z %Y')
    if range_start <= tweet_post_time <= range_end:
        tweetsToBeDeleted.append(element["tweet"]["id_str"])
    else:
        tweetsToBeIgnored.append(element["tweet"]["id_str"])

print(len(tweetsToBeDeleted), len(tweetsToBeIgnored))
```
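The range comparison works because `created_at` and both range bounds are timezone-aware datetimes, so Python compares actual instants. If you think of the range in your local time, one option is to build the bounds with an explicit offset. This is an assumption, not the notebook's approach. The +05:30 offset, the names `local_start`/`local_end` and the helper `in_range()` are only examples.

```python
from datetime import datetime, timedelta, timezone

# Example offset only (+05:30); replace it with your own UTC offset.
LOCAL_TZ = timezone(timedelta(hours=5, minutes=30))

# Range bounds written in local time. They stay comparable with the UTC
# created_at values because both sides are timezone-aware datetimes.
local_start = datetime(2012, 9, 10, 0, 0, 0, tzinfo=LOCAL_TZ)
local_end = datetime(2017, 9, 10, 0, 0, 0, tzinfo=LOCAL_TZ)


def in_range(created_at, start=local_start, end=local_end):
    """Return True if a created_at string falls within [start, end], inclusive."""
    posted = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    return start <= posted <= end


print(in_range("Thu Sep 11 12:26:39 +0000 2014"))  # True
# 18:45 UTC on Sep 9 is 00:15 on Sep 10 at +05:30, just after local_end,
# although it would fall inside the UTC range used in step 4.
print(in_range("Sat Sep 09 18:45:00 +0000 2017"))  # False
```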
5) Finally, iterate over the “tweetsToBeDeleted” list and pass each ID to the delete function to remove those tweets.
```python
for tweet_id in tweetsToBeDeleted:
    deleteTweet(tweet_id)
```
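Deleted tweets cannot be restored, so a dry run before the real loop may be worth it. Here is a minimal sketch, assuming you pass in deleteTweet (or `api.DestroyStatus`) as `delete_fn`. The function `delete_tweets()` and its `dry_run` and `pause` parameters are hypothetical. The pause between calls is a precaution against hitting API rate limits, not a documented requirement.

```python
import time


def delete_tweets(tweet_ids, delete_fn, dry_run=True, pause=1.0):
    """Delete each tweet ID with delete_fn, or only print the IDs when dry_run is True.

    Returns the list of IDs that were (or, in a dry run, would be) processed.
    """
    processed = []
    for tweet_id in tweet_ids:
        if dry_run:
            print("Would delete tweet #{0}".format(tweet_id))
        else:
            delete_fn(tweet_id)
            time.sleep(pause)  # space out consecutive API calls
        processed.append(tweet_id)
    return processed
```

Calling `delete_tweets(tweetsToBeDeleted, deleteTweet)` only prints the IDs. Once the list looks right, `delete_tweets(tweetsToBeDeleted, deleteTweet, dry_run=False)` performs the deletions. With `api.DestroyStatus` as `delete_fn`, the first exception stops the loop, because it is not caught here.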
Original article: Deleting old tweets using Python & Twitter API for a date range