Yet another AI script, is it useful?
If you’re like me you probably have, or have had, an RSS feed reader to at least try and keep up with news and blogs on the latest in tech among others. This project started as a way for me to get a bit more comprehensive summary and have them sent to my Slack chat as they happened. After trying several on-device text-to-speech(TTS) engines I was frustrated, they all had either incomplete sentences or missing punctuation or both, not at all usable. Since OpenAI released CustomGPT and Assistant AI API I decided to try that. The prompt for this is simple “Please summarize the tech articles to give a complete, and brief, summary”. That’s it. Here’s the script I put together and have been tweaking over the last couple of weeks, broken down into chunks.
Main Script Chunks
Imports
import json
import openai
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError
import feedparser # type: ignore from newspaper import Article # type: ignore import sqlite3
from datetime import datetime
import time
from typing import Dict, Any, NamedTuple, cast
Enter fullscreen mode Exit fullscreen mode
Config load function
def load_config() -> Dict[str, Any]:
with open('config.json', 'r') as file:
return json.load(file)
Enter fullscreen mode Exit fullscreen mode
Sqlite3 database creation
You could use any sql database, for simplicity sake we are using sqlite3 here.
def create_database() -> None:
conn = sqlite3.connect('articles.db')
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS articles (link TEXT PRIMARY KEY, title TEXT, summary TEXT)''')
conn.commit()
conn.close()
Enter fullscreen mode Exit fullscreen mode
Checking to see if the article has already been summarized
This beehaves similar to caching, in thatt we don’t want to pay to summarize the same article over and over, so we check the URL to see if it’s already been summarized.
def is_article_summarized(link: str) -> bool:
conn = sqlite3.connect('articles.db')
c = conn.cursor()
c.execute("SELECT * FROM articles WHERE link = ?", (link,))
result = c.fetchone()
conn.close()
return result
Enter fullscreen mode Exit fullscreen mode
Save summary, link, and title to db
def is_article_summarized(link: str) -> bool:
conn = sqlite3.connect('articles.db')
c = conn.cursor()
c.execute("SELECT * FROM articles WHERE link = ?", (link,))
result = c.fetchone()
conn.close()
return result
Enter fullscreen mode Exit fullscreen mode
Create an OpenAI Thread for each summary
Following along with OpenAI’s documentation, we are creating a new thread for each article.
def create_thread(ass_id: str, prompt: str) -> tuple[str, str]:
thread = openai.beta.threads.create()
my_thread_id = thread.id
openai.beta.threads.messages.create(
thread_id=my_thread_id,
role="user",
content=prompt
)
run = openai.beta.threads.runs.create(
thread_id=my_thread_id,
assistant_id=ass_id,
)
return run.id, my_thread_id
Enter fullscreen mode Exit fullscreen mode
Check thread status
Here we are periodically checking in to see if the summary is finished, using a 2 second delay to avoid spamming our assistant.
def check_status(run_id: str, thread_id: str) -> str:
run = openai.beta.threads.runs.retrieve(
thread_id=thread_id,
run_id=run_id,
)
return run.status
Enter fullscreen mode Exit fullscreen mode
Send to Slack
def send_message_to_slack(title: str, link: str, summary: str) -> None:
try:
message = f"New Article: *<{link}|{title}>*\nSummary: {summary}"
client.chat_postMessage(channel='#news', text=message)
except SlackApiError as e:
print(f"Error sending message: {e.response['error']}")
Enter fullscreen mode Exit fullscreen mode
Heavy lifting function to coordinate most of the rest
This is really where most of the processing takes place, rather then the main function, as most of the processing is dependent on previous steps. This may change in the future.
def fetch_articles_from_rss(rss_url: str) -> None:
feed = feedparser.parse(rss_url)
for entry in feed.entries:
if not is_article_summarized(entry.link):
article = Article(entry.link)
article.download()
article.parse()
# Truncate the article text if it exceeds the limit max_length = 32768 - len(entry.title) - len("Please summarize this article:\n\nTitle: \n\n")
article_text = article.text[:max_length] if len(article.text) > max_length else article.text
prompt = f"Please summarize this article:\n\nTitle: {entry.title}\n\n{article_text}"
run_id, thread_id = create_thread(assistant_id, prompt)
status = check_status(run_id, thread_id)
while status != "completed":
status = check_status(run_id, thread_id)
time.sleep(2)
response = openai.beta.threads.messages.list(thread_id=thread_id)
if response.data:
content = cast(Any, response.data[0].content[0])
summary = content.text.value
# summary = response.data[0].content[0].text.value # Send the article details to Slack send_message_to_slack(entry.title, entry.link, summary)
save_summary(entry.link, entry.title, summary)
time.sleep(20)
Enter fullscreen mode Exit fullscreen mode
Main Function
Realy all we wanted to happen here is setup/check for a database and setup the initial loop with some debugging print statements, which will be changed to logging in the future.
def main() -> None:
create_database()
while True:
now = datetime.now()
print(f'Punch in at {now}')
for rss_url in config['rss_urls']:
fetch_articles_from_rss(rss_url)
now = datetime.now()
print(f'Punch out at {now}')
time.sleep(900)
Enter fullscreen mode Exit fullscreen mode
Script body launcher
if __name__ == "__main__":
config = load_config()
# Set the API keys from the configuration openai.api_key = config['openai_key']
assistant_id = config['assistant_id']
client = WebClient(token=config['slack_token'])
main()
Enter fullscreen mode Exit fullscreen mode
config.json
You’ll need 3 things for the config to make this work, the assistant ID (found on the assistant page), the OpenAI API Key, and a Slack bot/app Token.
{ "openai_key": "sk-open-ai-key-here", "slack_token": "xoxb-slack-app-token", "assistant_id": "asst_assistatn-id", "rss_urls": [ "https://www.bleepingcomputer.com/feed/", "https://feeds.arstechnica.com/arstechnica/index", "https://www.wired.com/feed/tag/ai/latest/rss", "https://www.wired.com/feed/category/ideas/latest/rss", "https://www.wired.com/feed/category/science/latest/rss", "https://www.wired.com/feed/category/security/latest/rss", "https://www.wired.com/feed/category/backchannel/latest/rss", "https://www.wired.com/feed/tag/wired-guide/latest/rss", "https://www.cisa.gov/news.xml", "https://www.cisa.gov/cisa/blog.xml", "https://www.cisa.gov/cybersecurity-advisories/all.xml", "https://googleonlinesecurity.blogspot.com/atom.xml" ] }
Enter fullscreen mode Exit fullscreen mode
Trying it out
If you’d like to try this out follow the commands below(Linux and Mac), be sure to edit the config.json file.
git clone https://github.com/Blacknight318/openai_rss_summarizer
cd openai_rss_summarizer
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements
cp sample_config.json config.json
nano config.json # Edit and press ztrl+x to save
nohup python main.py&
Enter fullscreen mode Exit fullscreen mode
Todo
- Create link transformer for things link Cloudflares blog
- Create Streamlit webui for recall and search of old articles(separate file)
- Cdd functionality to search with @botname command
- Independent backend db scheme
- Python file to create Openai assistant from scratch
Closing the loop
This is still an ongoing project, if you’d like to keep up with the latest check out the Github repo. Till next time fair winds and following seas.
暂无评论内容