In this guide, I will walk you through converting PDF content into real-time audio using a combination of Python libraries: a local language model summarizes the document chunk by chunk, and each summary is read aloud with text-to-speech. This approach is particularly useful if you prefer to take in information by listening, or for accessibility purposes. The code also lets you stop playback gracefully at any time.
Part 1: Importing the Necessary Libraries
To begin, we need to import several Python libraries that will assist in loading PDFs, processing text, generating audio, and managing user interactions.
```python
from gtts import gTTS
from io import BytesIO
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.chat_models import ChatOllama
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydub import AudioSegment
from pydub.playback import play
import signal
import sys
import threading
```
Overview of the Libraries:
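- gTTS: a Python interface to Google Translate's text-to-speech service; it converts text to MP3 audio and needs an internet connection.
- BytesIO: In-memory binary stream for handling audio data.
- LangChain: tools for splitting text and running it through a language model (here, a model served locally by Ollama via ChatOllama).
- PyPDFLoader: Specialized loader for extracting text from PDFs.
- pydub: decodes the MP3 produced by gTTS and plays it back (MP3 decoding requires ffmpeg).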
- signal and threading: To manage user interruptions during playback.
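Some of these libraries depend on things outside Python: pydub needs ffmpeg to decode MP3, and PyPDFLoader needs the pypdf package. The following hypothetical helper, not part of the original script, reports which Python packages are missing and whether ffmpeg is on the PATH. The module list is an assumption based on the imports above.

```python
import importlib.util
import shutil

# Top-level modules the script relies on (assumed list, based on Part 1's imports).
REQUIRED_MODULES = ["gtts", "langchain", "langchain_community",
                    "langchain_core", "pydub", "pypdf"]


def check_prerequisites(modules=REQUIRED_MODULES):
    """Return (missing_modules, ffmpeg_found) without importing the packages."""
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    ffmpeg_found = shutil.which("ffmpeg") is not None
    return missing, ffmpeg_found


if __name__ == "__main__":
    missing, ffmpeg_found = check_prerequisites()
    print("Missing packages:", ", ".join(missing) or "none")
    print("ffmpeg on PATH:", ffmpeg_found)
```

Running this once before the main script gives a quicker error message than a failed import or a pydub decoding error halfway through a document.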
Part 2: Handling User Interruption
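We want the user to be able to stop playback gracefully. To do this, we register a handler for SIGINT, the signal sent when you press Ctrl+C. Instead of letting Python raise KeyboardInterrupt, the handler simply sets a flag that the playback loop checks.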
```python
# Flag to control the loop
stop_playback = False

def signal_handler(sig, frame):
    global stop_playback
    print("\nGracefully stopping playback...")
    stop_playback = True

# Assign the signal handler to SIGINT (Ctrl+C)
signal.signal(signal.SIGINT, signal_handler)
```
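A module-level boolean works here because both the handler and the Enter-key thread only ever set it to True. If you prefer an explicit thread-safe primitive, the same idea can be sketched with threading.Event. This is an alternative, not the author's code, and the names stop_event, handle_sigint and install_sigint_handler are hypothetical.

```python
import signal
import threading

# Shared stop flag (hypothetical name); Event.set() and is_set() are thread-safe.
stop_event = threading.Event()


def handle_sigint(sig, frame):
    """Request a stop instead of raising KeyboardInterrupt."""
    print("\nGracefully stopping playback...")
    stop_event.set()


def install_sigint_handler():
    # signal.signal() may only be called from the main thread.
    signal.signal(signal.SIGINT, handle_sigint)
```

You would call install_sigint_handler() once at start-up and test stop_event.is_set() wherever the original code checks stop_playback.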
In addition to handling Ctrl+C, we also start a separate thread that waits for the Enter key, providing an alternative way to stop playback. The thread is marked as a daemon so that it does not keep the program running once playback has finished.
```python
# Function to listen for an Enter key press in a separate thread
def listen_for_stop():
    global stop_playback
    input("Press Enter to stop playback...\n")
    print("\nStopping playback...")
    stop_playback = True

# Start the listener thread
listener_thread = threading.Thread(target=listen_for_stop)
listener_thread.daemon = True
listener_thread.start()
```
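Because input() reads straight from standard input, the listener is awkward to test. One option, sketched here with hypothetical names, is to pass the input stream and a stop event in as parameters. readline() blocks until a line (or end of input) arrives, just as input() does.

```python
import sys
import threading


def listen_for_stop(stop_event, stream=sys.stdin):
    """Block until a line arrives on `stream`, then request a stop."""
    print("Press Enter to stop playback...")
    stream.readline()  # returns "" at end of input, which also triggers a stop
    print("\nStopping playback...")
    stop_event.set()


def start_listener(stop_event, stream=sys.stdin):
    # Daemon thread: it will not keep the program alive after playback ends.
    thread = threading.Thread(target=listen_for_stop,
                              args=(stop_event, stream), daemon=True)
    thread.start()
    return thread
```

In the real script you would call start_listener(stop_event) with the default stream; in a test you can hand it an io.StringIO instead.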
Part 3: Loading and Splitting the PDF Document
Next, we load the PDF document using PyPDFLoader and split it into manageable chunks using RecursiveCharacterTextSplitter.
```python
# Load the PDF; load() returns one Document per page.
# (load_and_split() would already apply a default splitter before ours.)
loader = PyPDFLoader("/path/to/your/document.pdf")
pages = loader.load()

# Split the pages into chunks; both sizes are measured in characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=100)
all_splits = text_splitter.split_documents(pages)
```
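This lets us process the document piece by piece, so audio can be generated and played incrementally. Each chunk holds at most 2,048 characters, and neighbouring chunks share up to 100 characters so that text cut at a boundary keeps some context.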
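To get a feel for what chunk_size and chunk_overlap mean, here is a deliberately simplified splitter that cuts fixed-size character windows. It is not how RecursiveCharacterTextSplitter works internally (that class prefers to break at paragraph, line and word boundaries), and fixed_window_chunks is a hypothetical name used only for illustration.

```python
def fixed_window_chunks(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of `chunk_size` characters that overlap
    by `chunk_overlap` characters (simplified illustration only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print(fixed_window_chunks("abcdefghij", 4, 1))  # ['abcd', 'defg', 'ghij']
```

With the guide's settings each new window advances by 2048 - 100 = 1948 characters, so a 10,000-character text yields six windows in this simplified model.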
Part 4: Generating Text Summaries
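We use LangChain's ChatOllama wrapper to summarize each chunk with the llama3:instruct model running locally in Ollama (pull it first with `ollama pull llama3:instruct`). A temperature of 0.6 allows some variation in wording, and a prompt template inserts each chunk's text into the request. Piping the template, the model and StrOutputParser together with `|` produces a chain whose output is a plain string.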
```python
# Initialize the ChatOllama model
llm = ChatOllama(model="llama3:instruct", temperature=0.6)

# Create a prompt template
prompt = ChatPromptTemplate.from_template("Summarize the findings of: {page_content}")

# Define the chain
chain = prompt | llm | StrOutputParser()
```
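The `|` operator builds a LangChain Expression Language (LCEL) sequence: invoke() passes the input dict to the prompt template, the resulting prompt to the model, and the model's reply to the output parser. The dependency-free sketch below only mimics that data flow with plain functions. stub_llm is a hypothetical placeholder that does not call Ollama, and the real LangChain objects do considerably more (streaming, batching, async).

```python
from typing import Any, Callable


def pipe(*stages: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Return a function that feeds its input through each stage in order."""
    def invoke(value: Any) -> Any:
        for stage in stages:
            value = stage(value)
        return value
    return invoke


TEMPLATE = "Summarize the findings of: {page_content}"


def format_prompt(inputs: dict) -> str:
    # Stand-in for ChatPromptTemplate: fill the placeholder from the input dict.
    return TEMPLATE.format(**inputs)


def stub_llm(prompt: str) -> dict:
    # Hypothetical stand-in for ChatOllama; no model is called here.
    return {"content": f"[summary of {len(prompt)} characters of prompt]"}


def to_str(message: dict) -> str:
    # Stand-in for StrOutputParser: extract the text of the reply.
    return message["content"]


toy_chain = pipe(format_prompt, stub_llm, to_str)
print(toy_chain({"page_content": "abc"}))
```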
Text Generation Function
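This function summarizes one chunk and yields the summary one sentence at a time, so each sentence can be spoken as soon as the previous one has finished. Note that chain.invoke() waits for the complete summary before the first sentence is yielded.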
```python
import re

def generate_text_chunks(page_content):
    try:
        text = chain.invoke({"page_content": page_content})
        # Split after ., ! or ? followed by whitespace. The punctuation stays
        # attached to its sentence, so no period has to be re-appended.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                yield sentence
    except Exception as e:
        print(f"Error generating text: {e}")
```
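Splitting with `text.split('. ')` and appending a period to every piece, as a naive version would, doubles the period on the final sentence. The comparison below illustrates the difference. Both helper names are hypothetical, and the regular expression is still only a heuristic: abbreviations such as "e.g. " will be split as well.

```python
import re


def naive_sentences(text):
    # Split on ". " and re-append a period to every piece.
    return [s + "." for s in text.split(". ")]


def regex_sentences(text):
    # Split after ., ! or ? followed by whitespace, keeping the punctuation.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


summary = "The study has two parts. Both are short."
print(naive_sentences(summary))  # ['The study has two parts.', 'Both are short..']
print(regex_sentences(summary))  # ['The study has two parts.', 'Both are short.']
```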
Part 5: Converting Text to Speech and Playing Audio
Once we have the text chunks, the next step is converting these chunks into speech and playing them.
```python
# Function to play audio from a text chunk
def play_audio_chunk(text_chunk):
    if not text_chunk.strip():
        return  # skip blank chunks; gTTS refuses empty text
    try:
        tts = gTTS(text=text_chunk, lang='en')
        with BytesIO() as audio_fp:
            tts.write_to_fp(audio_fp)
            audio_fp.seek(0)
            audio_segment = AudioSegment.from_file(audio_fp, format="mp3")
            play(audio_segment)
    except Exception as e:
        print(f"Error generating or playing audio: {e}")
```
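This function uses Google Text-to-Speech (gTTS) to generate MP3 audio from the text and pydub to play it immediately. The MP3 is kept in an in-memory BytesIO buffer rather than a temporary file, and seek(0) rewinds the buffer so pydub reads it from the start. gTTS needs an internet connection, and pydub needs ffmpeg to decode MP3 plus a playback backend (simpleaudio, pyaudio, or ffplay).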
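The seek(0) call matters because after writing, the buffer's position sits at the end of the data, so a reader starting there sees nothing. The sketch below shows this offline with a stand-in writer: fake_write_to_fp and render_to_buffer are hypothetical names, and the bytes written are placeholders rather than real MP3 data from gTTS.

```python
from io import BytesIO


def fake_write_to_fp(fp):
    # Hypothetical stand-in for gTTS.write_to_fp(); writes placeholder bytes.
    fp.write(b"ID3-fake-mp3-data")


def render_to_buffer(write_to_fp, rewind=True):
    """Write audio bytes into an in-memory buffer, optionally rewinding it."""
    buf = BytesIO()
    write_to_fp(buf)
    if rewind:
        buf.seek(0)  # move back to the first byte so a decoder sees the data
    return buf


print(render_to_buffer(fake_write_to_fp, rewind=False).read())  # b''
print(render_to_buffer(fake_write_to_fp).read())  # b'ID3-fake-mp3-data'
```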
Part 6: Real-Time Text Generation and Audio Playback
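Finally, we combine everything into a single function that summarizes each chunk and speaks the summary sentence by sentence. Before each sentence it checks the stop flag, so after you press Ctrl+C or Enter, playback ends at the latest once the current sentence has finished.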
```python
# Function to generate and play text in real time
def generate_and_play():
    global stop_playback
    for split in all_splits:
        for chunk in generate_text_chunks(split.page_content):
            if stop_playback:
                print("Playback stopped by user.")
                return
            print(".", end="", flush=True)  # Visual feedback
            play_audio_chunk(chunk)
    print("\nPlayback finished.")
```
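In this loop each sentence is synthesized only after the previous one has finished playing, so there is a short pause while gTTS contacts Google. One possible refinement, not part of the original script, is to synthesize the next sentence in a background thread while the current one plays. The sketch is generic: synthesize and play_audio are hypothetical callables you would back with gTTS and pydub, and should_stop stands in for checking the stop flag.

```python
from concurrent.futures import ThreadPoolExecutor


def play_with_prefetch(sentences, synthesize, play_audio, should_stop=lambda: False):
    """Play each sentence while the next one is being synthesized.

    synthesize(text) -> audio and play_audio(audio) are hypothetical callables.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        iterator = iter(sentences)
        first = next(iterator, None)
        if first is None:
            return
        pending = pool.submit(synthesize, first)
        for sentence in iterator:
            if should_stop():
                return
            audio = pending.result()                     # clip for the current sentence
            pending = pool.submit(synthesize, sentence)  # start on the next one
            play_audio(audio)                            # plays while the next is prepared
        if not should_stop():
            play_audio(pending.result())
```

With a single worker thread at most one clip is prepared ahead, which keeps the extra requests to the TTS service small if the user stops early.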
Starting the Process
To start the generation and playback process, simply call the generate_and_play() function.
```python
generate_and_play()
```
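The complete script hard-codes the PDF path. If you would rather pass it on the command line, a small argparse front end could look like this. The option names are hypothetical choices; the defaults are copied from the values used in this guide.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options (hypothetical interface for this script)."""
    parser = argparse.ArgumentParser(
        description="Summarize a PDF and read the summary aloud.")
    parser.add_argument("pdf_path", help="path to the PDF document")
    parser.add_argument("--model", default="llama3:instruct",
                        help="Ollama model name")
    parser.add_argument("--chunk-size", type=int, default=2048,
                        help="maximum characters per chunk")
    parser.add_argument("--chunk-overlap", type=int, default=100,
                        help="characters shared by neighbouring chunks")
    return parser.parse_args(argv)
```

You would then use `args = parse_args()` and pass `args.pdf_path`, `args.model`, `args.chunk_size` and `args.chunk_overlap` to PyPDFLoader, ChatOllama and RecursiveCharacterTextSplitter.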
Conclusion
With this approach, you can turn lengthy PDF documents into spoken summaries that are played back in real time. This is particularly useful if you prefer auditory learning or need an accessible format for consuming information. Because playback can be stopped at any point with Ctrl+C or Enter, you are never forced to sit through a long document to the end.
By following the steps outlined in this guide, you can develop a custom tool that turns text into audio, providing an alternative way to engage with content.
Final Code:
```python
import re
import signal
import threading
from io import BytesIO

from gtts import gTTS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.chat_models import ChatOllama
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydub import AudioSegment
from pydub.playback import play

# Flag to control the loop
stop_playback = False


def signal_handler(sig, frame):
    global stop_playback
    print("\nGracefully stopping playback...")
    stop_playback = True


# Assign the signal handler to SIGINT (Ctrl+C)
signal.signal(signal.SIGINT, signal_handler)


# Function to listen for an Enter key press in a separate thread
def listen_for_stop():
    global stop_playback
    input("Press Enter to stop playback...\n")
    print("\nStopping playback...")
    stop_playback = True


# Start the listener thread (daemon, so it does not keep the program alive)
listener_thread = threading.Thread(target=listen_for_stop)
listener_thread.daemon = True
listener_thread.start()

# Load the PDF (one Document per page) and split it into chunks
loader = PyPDFLoader("/home/roomal/Downloads/Stephen Mulhall/1 - Mulhall, Stephen - Heidegger and Being and Time - Scepticism, Cognition And Agency.pdf")
pages = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=100)
all_splits = text_splitter.split_documents(pages)

# Initialize the ChatOllama model
llm = ChatOllama(model="llama3:instruct", temperature=0.6)

# Create a prompt template
prompt = ChatPromptTemplate.from_template("Summarize the findings of: {page_content}")

# Define the chain
chain = prompt | llm | StrOutputParser()


# Summarize one chunk and yield the summary sentence by sentence
def generate_text_chunks(page_content):
    try:
        text = chain.invoke({"page_content": page_content})
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                yield sentence
    except Exception as e:
        print(f"Error generating text: {e}")


# Function to play audio from a text chunk
def play_audio_chunk(text_chunk):
    if not text_chunk.strip():
        return
    try:
        tts = gTTS(text=text_chunk, lang='en')
        with BytesIO() as audio_fp:
            tts.write_to_fp(audio_fp)
            audio_fp.seek(0)
            audio_segment = AudioSegment.from_file(audio_fp, format="mp3")
            play(audio_segment)
    except Exception as e:
        print(f"Error generating or playing audio: {e}")


# Function to generate and play text in real time
def generate_and_play():
    global stop_playback
    for split in all_splits:
        for chunk in generate_text_chunks(split.page_content):
            if stop_playback:
                print("Playback stopped by user.")
                return
            print(".", end="", flush=True)  # Visual feedback
            play_audio_chunk(chunk)
    print("\nPlayback finished.")


# Start the generation and playback process
generate_and_play()
```
langchain_core.output_parsers import StrOutputParser from langchain_core.prompts import ChatPromptTemplate from pydub import AudioSegment from pydub.playback import play import signal import sys import threading # Flag to control the loop stop_playback = False def signal_handler(sig, frame): global stop_playback print("\nGracefully stopping playback...") stop_playback = True # Assign the signal handler to SIGINT (Ctrl+C) signal.signal(signal.SIGINT, signal_handler) # Function to listen for an Enter key press in a separate thread def listen_for_stop(): global stop_playback input("Press Enter to stop playback...\n") print("\nStopping playback...") stop_playback = True # Start the listener thread listener_thread = threading.Thread(target=listen_for_stop) listener_thread.daemon = True listener_thread.start() # Load and split the PDF document loader = PyPDFLoader("/home/roomal/Downloads/Stephen Mulhall/1 - Mulhall, Stephen - Heidegger and Being and Time - Scepticism, Cognition And Agency.pdf") pages = loader.load_and_split() # Split text into chunks text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=100) all_splits = text_splitter.split_documents(pages) # Initialize the ChatOllama model llm = ChatOllama(model="llama3:instruct", temperature=0.6) # Create a prompt template prompt = ChatPromptTemplate.from_template("Summarize the findings of: {page_content}") # Define the chain chain = prompt | llm | StrOutputParser() # Function to generate text in chunks def generate_text_chunks(page_content): try: text = chain.invoke({"page_content": page_content}) sentences = text.split('. ') for sentence in sentences: yield sentence + '.' 
except Exception as e: print(f"Error generating text: {e}") # Function to play audio from a text chunk def play_audio_chunk(text_chunk): if not text_chunk.strip(): return try: tts = gTTS(text=text_chunk, lang='en') with BytesIO() as audio_fp: tts.write_to_fp(audio_fp) audio_fp.seek(0) audio_segment = AudioSegment.from_file(audio_fp, format="mp3") play(audio_segment) except Exception as e: print(f"Error generating or playing audio: {e}") # Function to generate and play text in real-time def generate_and_play(): global stop_playback for split in all_splits: for chunk in generate_text_chunks(split.page_content): if stop_playback: print("Playback stopped by user.") return print(".", end="", flush=True) # Visual feedback play_audio_chunk(chunk) print("\nPlayback finished.") # Start the generation and playback process generate_and_play()
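Splitting each summary on `". "` is a quick heuristic: a sentence that ends in `?` or `!` is not separated from the one that follows it, so gTTS receives a longer chunk than intended. A slightly more robust option is a regular-expression split, sketched below under the assumption that a sentence ends with `.`, `!` or `?` followed by whitespace. `split_sentences` is a hypothetical helper, not part of gTTS or LangChain, and abbreviations such as "e.g." will still be split in the wrong place.

```python
import re
from typing import Iterator

# Assumed sentence boundary: ".", "!" or "?" followed by whitespace.
# This is a heuristic; abbreviations like "e.g. " are still split.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> Iterator[str]:
    """Yield non-empty sentences, adding a final period only when punctuation is missing."""
    for piece in _BOUNDARY.split(text.strip()):
        piece = piece.strip()
        if not piece:
            continue
        yield piece if piece[-1] in ".!?" else piece + "."


if __name__ == "__main__":
    summary = "Heidegger rejects scepticism. Why? Because Dasein is being-in-the-world"
    for sentence in split_sentences(summary):
        print(sentence)
```

Inside `generate_text_chunks`, the loop over `text.split(". ")` could be replaced with `yield from split_sentences(text)`.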
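The Ctrl+C handler, the Enter-key thread and the playback loop communicate through a module-level boolean. That works for this script, but the standard library's `threading.Event` is the conventional primitive for a one-way stop signal shared between threads, and it makes the loop easy to test in isolation. The sketch below swaps the global flag for an `Event`. `play_until_stopped` and `fake_speak` are hypothetical names, and `fake_speak` stands in for `play_audio_chunk` so the control flow runs without gTTS, pydub or an audio device.

```python
import threading
import time
from typing import Callable, Iterable


def play_until_stopped(
    chunks: Iterable[str],
    speak: Callable[[str], None],
    stop_event: threading.Event,
) -> int:
    """Speak chunks in order until stop_event is set; return how many were spoken."""
    spoken = 0
    for chunk in chunks:
        # Checked between chunks, so a stop request takes effect
        # only after the current chunk has finished playing.
        if stop_event.is_set():
            print("Playback stopped by user.")
            break
        speak(chunk)
        spoken += 1
    return spoken


if __name__ == "__main__":
    stop = threading.Event()

    def fake_speak(text: str) -> None:
        # Hypothetical stand-in for play_audio_chunk: "plays" for 0.1 s.
        print(text)
        time.sleep(0.1)

    # Simulate the Enter-key listener by setting the event after 0.25 s.
    threading.Timer(0.25, stop.set).start()
    count = play_until_stopped((f"Sentence {i}." for i in range(10)), fake_speak, stop)
    print(f"Spoke {count} chunks")
```

To wire this into the script, `listen_for_stop` and `signal_handler` would call `stop.set()` instead of assigning the flag, and the playback loop would check `stop.is_set()`. The latency is unchanged: because the event is only checked between chunks, a stop request still waits for the current sentence to finish.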
Until next time.
Best,
Roomal