Transforming PDFs into Audio

In this guide, I will walk you through converting the content of a PDF into audio that plays in real time, using a combination of Python libraries. Rather than reading the document verbatim, the script asks a local language model to summarize each section and then speaks the summary aloud. This approach is useful for those who prefer to consume information by listening, and for accessibility purposes. The code also handles user interruptions gracefully, so playback can be stopped at any point.

Part 1: Importing the Necessary Libraries

To begin, we need to import several Python libraries that will assist in loading PDFs, processing text, generating audio, and managing user interactions.

```python
from gtts import gTTS
from io import BytesIO
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.chat_models import ChatOllama
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydub import AudioSegment
from pydub.playback import play
import signal
import threading
```

Overview of the Libraries:

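  • gTTS: Google Text-to-Speech, a library that sends text to Google Translate's text-to-speech API and returns MP3 audio (an internet connection is required).
  • BytesIO: An in-memory binary stream that holds the MP3 data, so nothing has to be written to disk.
  • LangChain: RecursiveCharacterTextSplitter splits the text into chunks, ChatOllama talks to a language model served locally by Ollama, and ChatPromptTemplate and StrOutputParser build the prompt and turn the model's reply into a plain string.
  • PyPDFLoader: A LangChain document loader that extracts text from PDFs (it relies on the pypdf package).
  • pydub: Decodes and plays the audio. Decoding MP3 requires ffmpeg, and playback uses simpleaudio, pyaudio, or ffplay, whichever is installed.
  • signal and threading: Let the user interrupt playback with Ctrl+C or the Enter key.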
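Because several of these pieces depend on things outside Python, it can help to check them before running the script. The helper below is a hypothetical addition, not part of the original code; the module list (including pypdf) and the decision to check only for the ffmpeg executable are assumptions, and it does not check whether the Ollama server is running.

```python
import importlib.util
import shutil

# Import names (not pip package names) the script is assumed to need.
REQUIRED_MODULES = ["gtts", "pydub", "langchain", "langchain_community", "langchain_core", "pypdf"]
# pydub shells out to ffmpeg to decode MP3 data.
REQUIRED_PROGRAMS = ["ffmpeg"]

def check_dependencies() -> dict[str, bool]:
    """Map each dependency name to whether it can be found on this machine."""
    status = {}
    for module in REQUIRED_MODULES:
        # find_spec returns None for a missing top-level module instead of raising.
        status[module] = importlib.util.find_spec(module) is not None
    for program in REQUIRED_PROGRAMS:
        # shutil.which returns None when the executable is not on PATH.
        status[program] = shutil.which(program) is not None
    return status

if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:22} {'ok' if found else 'MISSING'}")
```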

Part 2: Handling User Interruption

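We want the user to be able to stop playback cleanly. Pressing Ctrl+C sends SIGINT to the process, which Python normally turns into a KeyboardInterrupt that ends the program abruptly. Instead, we install a signal handler that only sets a stop_playback flag; the playback loop checks this flag before each sentence and exits on its own.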

```python
# Flag to control the playback loop
stop_playback = False

def signal_handler(sig, frame):
    global stop_playback
    print("\nGracefully stopping playback...")
    stop_playback = True

# Assign the signal handler to SIGINT (Ctrl+C)
signal.signal(signal.SIGINT, signal_handler)
```
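A plain global boolean works here, but since the flag is also written from a second thread, a threading.Event is a common thread-safe alternative. The sketch below swaps in an Event as an assumption (the article itself uses the global stop_playback); the names stop_event and handle_sigint are hypothetical.

```python
import signal
import threading

# Event used as the stop flag instead of a global bool (assumption, not the article's code).
stop_event = threading.Event()

def handle_sigint(sig, frame):
    """Request a stop instead of letting Python raise KeyboardInterrupt."""
    print("\nGracefully stopping playback...")
    stop_event.set()

# signal.signal() must be called from the main thread.
signal.signal(signal.SIGINT, handle_sigint)
```

The playback loop would then test stop_event.is_set() where the article tests stop_playback.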

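In addition to handling Ctrl+C, we start a separate thread that waits for the Enter key, providing a second way to stop playback. Because input() blocks until a line is entered, it has to run off the main thread, which is busy generating and playing audio. Marking the thread as a daemon means it will not keep the program alive once playback has finished.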

```python
# Function to listen for an Enter key press in a separate thread
def listen_for_stop():
    global stop_playback
    input("Press Enter to stop playback...\n")
    print("\nStopping playback...")
    stop_playback = True

# Start the listener thread
listener_thread = threading.Thread(target=listen_for_stop)
listener_thread.daemon = True
listener_thread.start()
```
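One caveat: if the script runs without an interactive stdin (for example with input redirected), input() raises EOFError inside the thread. The sketch below is a hypothetical variant of the listener that handles that case and takes the input function as a parameter so it can be exercised without a keyboard; start_stop_listener and read_line are illustrative names, and it assumes a threading.Event as the stop flag.

```python
import threading

def start_stop_listener(stop_event, read_line=input):
    """Start a daemon thread that sets stop_event once read_line() returns.

    read_line defaults to the built-in input(); it is a parameter only so the
    listener can be driven without a real keyboard (hypothetical hook).
    """
    def listen():
        try:
            read_line("Press Enter to stop playback...\n")
        except EOFError:
            # No interactive stdin: leave Ctrl+C as the only way to stop.
            return
        print("\nStopping playback...")
        stop_event.set()

    # daemon=True so the thread never keeps the program alive on its own.
    thread = threading.Thread(target=listen, daemon=True)
    thread.start()
    return thread
```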

Part 3: Loading and Splitting the PDF Document

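Next, we load the PDF with PyPDFLoader, which returns one Document per page, and split those pages into chunks of at most 2,048 characters with RecursiveCharacterTextSplitter. Consecutive chunks overlap by 100 characters so that text near a chunk boundary keeps some of its surrounding context.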

```python
# Load the PDF document (one Document per page).
# load() is used rather than load_and_split(), which would already split the
# pages with default settings before the splitter below runs.
loader = PyPDFLoader("/path/to/your/document.pdf")
pages = loader.load()

# Split text into chunks of up to 2,048 characters, overlapping by 100
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=100)
all_splits = text_splitter.split_documents(pages)
```

Processing the document piece by piece means the first summary can be spoken as soon as it is ready, instead of waiting for the whole document to be summarized.
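To make chunk_size and chunk_overlap concrete, here is a simplified fixed-window chunker. It is only an illustration of what the two parameters mean, not RecursiveCharacterTextSplitter's algorithm, which additionally prefers to cut at paragraph, line, and word boundaries; window_chunks is a hypothetical name.

```python
def window_chunks(text: str, chunk_size: int = 2048, chunk_overlap: int = 100) -> list[str]:
    """Cut text into windows of at most chunk_size characters, each starting
    chunk_overlap characters before the previous window ended."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

print(window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With a window of 4 and an overlap of 1, each chunk repeats the last character of the previous one, so the call above prints ['abcd', 'defg', 'ghij'].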

Part 4: Generating Text Summaries

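We summarize each chunk with a Llama 3 model running locally under Ollama, accessed through LangChain's ChatOllama wrapper. This assumes the Ollama server is running and the model has been downloaded (for example with: ollama pull llama3:instruct). A temperature of 0.6 allows some variation in wording. The prompt template inserts each chunk's text into a fixed instruction, and the | operator pipes the prompt into the model and the model's reply into StrOutputParser, so invoking the chain returns a plain string.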

```python
# Initialize the ChatOllama model (requires a running Ollama server)
llm = ChatOllama(model="llama3:instruct", temperature=0.6)

# Create a prompt template
prompt = ChatPromptTemplate.from_template("Summarize the findings of: {page_content}")

# Define the chain: prompt -> model -> plain string
chain = prompt | llm | StrOutputParser()
```
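The | composition can be hard to picture, so here is a toy stand-in that runs without LangChain or Ollama. Step, fake_llm, and to_str are hypothetical; LangChain's real Runnable interface is much richer (batching, streaming, async), but a | b likewise feeds the output of a into b.

```python
class Step:
    """Toy stand-in for a LangChain runnable: wraps a function and supports |."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose: run self first, then pass its result to other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

# Stand-ins for the three stages of prompt | llm | StrOutputParser().
prompt = Step(lambda inputs: "Summarize the findings of: " + inputs["page_content"])
fake_llm = Step(lambda text: {"content": f"Summary of {len(text)} characters."})  # no real model
to_str = Step(lambda message: message["content"])

chain = prompt | fake_llm | to_str
print(chain.invoke({"page_content": "Some page text."}))
```

The dictionary passed to invoke() plays the same role as the {"page_content": ...} argument in the article's chain.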

Text Generation Function

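The generate_text_chunks function summarizes one chunk and then yields the summary one sentence at a time, so each sentence can be handed to the text-to-speech step on its own. Splitting on '. ' is a simple heuristic: it misses sentences that end in '!' or '?' and also breaks after abbreviations such as 'e.g.'.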

```python
def generate_text_chunks(page_content):
    try:
        text = chain.invoke({"page_content": page_content})
        # Naive sentence split; split() removes the '. ' separator
        for sentence in text.split('. '):
            sentence = sentence.strip()
            if not sentence:
                continue
            # Restore the removed period without doubling the final punctuation
            yield sentence if sentence.endswith(('.', '!', '?')) else sentence + '.'
    except Exception as e:
        print(f"Error generating text: {e}")
```
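If the '. ' heuristic proves too crude, a regular expression can split after any of '.', '!' or '?'. The helper below is a hypothetical replacement for the split inside generate_text_chunks, offered as a sketch; it still does not recognize abbreviations.

```python
import re

# Split after ., ! or ? when followed by whitespace (lookbehind keeps the punctuation).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text: str) -> list[str]:
    """Split text into sentences, adding a final period only where one is missing."""
    sentences = []
    for part in _SENTENCE_BOUNDARY.split(text.strip()):
        part = part.strip()
        if not part:
            continue
        if not part.endswith((".", "!", "?")):
            part += "."
        sentences.append(part)
    return sentences

print(split_sentences("One. Two! Three?"))
```

Inside generate_text_chunks, the loop would become: for sentence in split_sentences(text): yield sentence.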

Part 5: Converting Text to Speech and Playing Audio

Once we have the text chunks, the next step is converting these chunks into speech and playing them.

```python
# Function to play audio from a text chunk
def play_audio_chunk(text_chunk):
    if not text_chunk.strip():
        return
    try:
        tts = gTTS(text=text_chunk, lang='en')
        with BytesIO() as audio_fp:
            tts.write_to_fp(audio_fp)
            audio_fp.seek(0)
            audio_segment = AudioSegment.from_file(audio_fp, format="mp3")
            play(audio_segment)
    except Exception as e:
        print(f"Error generating or playing audio: {e}")
```

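This function sends the text to Google Text-to-Speech (gTTS), which returns MP3 data. The MP3 is written to an in-memory BytesIO buffer, rewound with seek(0), decoded by pydub (which uses ffmpeg for MP3), and played with play(), which blocks until the clip finishes. Empty or whitespace-only chunks are skipped, and errors are printed rather than stopping the whole run.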
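The seek(0) call is easy to overlook. Writing to a BytesIO leaves its position at the end of the data, so a reader that starts from the current position sees nothing. This small sketch uses placeholder bytes in place of real gTTS output.

```python
from io import BytesIO

# Stand-in for tts.write_to_fp(buffer): writing leaves the position at the end.
buffer = BytesIO()
buffer.write(b"ID3 fake mp3 bytes")  # placeholder bytes, not real audio

print(buffer.read())  # b'' because nothing is left after the current position
buffer.seek(0)        # rewind, as play_audio_chunk does before decoding
print(buffer.read())  # b'ID3 fake mp3 bytes'
```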

Part 6: Real-Time Text Generation and Audio Playback

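Finally, we combine everything into a single function that summarizes each chunk and speaks the summary sentence by sentence. Before each sentence it checks stop_playback, so a stop request from Ctrl+C or Enter takes effect no later than the end of the sentence currently playing.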

```python
# Function to generate and play text in real-time
def generate_and_play():
    global stop_playback
    for split in all_splits:
        for chunk in generate_text_chunks(split.page_content):
            if stop_playback:
                print("Playback stopped by user.")
                return
            print(".", end="", flush=True)  # Visual feedback
            play_audio_chunk(chunk)
    print("\nPlayback finished.")
```
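Because generate_and_play depends on the model, the network, and a sound device, its control flow is hard to try out in isolation. The sketch below restates the same loop with the sentence source, the speech function, and the stop check passed in as arguments; run_playback and its parameters are hypothetical names, and speak() is assumed to block like pydub's play().

```python
from typing import Callable, Iterable

def run_playback(sentences: Iterable[str],
                 speak: Callable[[str], None],
                 should_stop: Callable[[], bool]) -> int:
    """Speak sentences one at a time, checking for a stop request before each.

    Returns the number of sentences spoken.
    """
    spoken = 0
    for sentence in sentences:
        if should_stop():
            print("Playback stopped by user.")
            return spoken
        print(".", end="", flush=True)  # visual feedback, as in the article
        speak(sentence)  # assumed to block until the sentence has been played
        spoken += 1
    print("\nPlayback finished.")
    return spoken

# Usage with fakes: "speaking" appends to a list; stop once two sentences were heard.
heard = []
count = run_playback(["One.", "Two.", "Three."], heard.append, lambda: len(heard) >= 2)
```

In the article's script the same loop could be driven with run_playback((s for split in all_splits for s in generate_text_chunks(split.page_content)), play_audio_chunk, lambda: stop_playback). The generator expression is lazy, so chunks after a stop request are never sent to the model.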

Starting the Process

To start the generation and playback process, simply call the generate_and_play() function.

```python
generate_and_play()
```

Conclusion

With this approach, you can turn lengthy PDF documents into spoken summaries that play in real time. This is particularly useful for those who prefer auditory learning or need accessible formats for consuming information. Combining text-to-speech with two ways to interrupt playback (Ctrl+C and the Enter key) keeps the tool convenient to use.

By following the steps outlined in this guide, you can develop a custom tool that turns text into audio, providing an alternative way to engage with content.

Final Code:

<span>from</span> <span>gtts</span> <span>import</span> <span>gTTS</span>
<span>from</span> <span>io</span> <span>import</span> <span>BytesIO</span>
<span>from</span> <span>langchain.text_splitter</span> <span>import</span> <span>RecursiveCharacterTextSplitter</span>
<span>from</span> <span>langchain_community.chat_models</span> <span>import</span> <span>ChatOllama</span>
<span>from</span> <span>langchain_community.document_loaders</span> <span>import</span> <span>PyPDFLoader</span>
<span>from</span> <span>langchain_core.output_parsers</span> <span>import</span> <span>StrOutputParser</span>
<span>from</span> <span>langchain_core.prompts</span> <span>import</span> <span>ChatPromptTemplate</span>
<span>from</span> <span>pydub</span> <span>import</span> <span>AudioSegment</span>
<span>from</span> <span>pydub.playback</span> <span>import</span> <span>play</span>
<span>import</span> <span>signal</span>
<span>import</span> <span>sys</span>
<span>import</span> <span>threading</span>
<span># Flag to control the loop </span><span>stop_playback</span> <span>=</span> <span>False</span>
<span>def</span> <span>signal_handler</span><span>(</span><span>sig</span><span>,</span> <span>frame</span><span>):</span>
<span>global</span> <span>stop_playback</span>
<span>print</span><span>(</span><span>"</span><span>\n</span><span>Gracefully stopping playback...</span><span>"</span><span>)</span>
<span>stop_playback</span> <span>=</span> <span>True</span>
<span># Assign the signal handler to SIGINT (Ctrl+C) </span><span>signal</span><span>.</span><span>signal</span><span>(</span><span>signal</span><span>.</span><span>SIGINT</span><span>,</span> <span>signal_handler</span><span>)</span>
<span># Function to listen for an Enter key press in a separate thread </span><span>def</span> <span>listen_for_stop</span><span>():</span>
<span>global</span> <span>stop_playback</span>
<span>input</span><span>(</span><span>"</span><span>Press Enter to stop playback...</span><span>\n</span><span>"</span><span>)</span>
<span>print</span><span>(</span><span>"</span><span>\n</span><span>Stopping playback...</span><span>"</span><span>)</span>
<span>stop_playback</span> <span>=</span> <span>True</span>
<span># Start the listener thread </span><span>listener_thread</span> <span>=</span> <span>threading</span><span>.</span><span>Thread</span><span>(</span><span>target</span><span>=</span><span>listen_for_stop</span><span>)</span>
<span>listener_thread</span><span>.</span><span>daemon</span> <span>=</span> <span>True</span>
<span>listener_thread</span><span>.</span><span>start</span><span>()</span>
<span># Load and split the PDF document </span><span>loader</span> <span>=</span> <span>PyPDFLoader</span><span>(</span><span>"</span><span>/home/roomal/Downloads/Stephen Mulhall/1 - Mulhall, Stephen - Heidegger and Being and Time - Scepticism, Cognition And Agency.pdf</span><span>"</span><span>)</span>
<span>pages</span> <span>=</span> <span>loader</span><span>.</span><span>load_and_split</span><span>()</span>
<span># Split text into chunks </span><span>text_splitter</span> <span>=</span> <span>RecursiveCharacterTextSplitter</span><span>(</span><span>chunk_size</span><span>=</span><span>2048</span><span>,</span> <span>chunk_overlap</span><span>=</span><span>100</span><span>)</span>
<span>all_splits</span> <span>=</span> <span>text_splitter</span><span>.</span><span>split_documents</span><span>(</span><span>pages</span><span>)</span>
<span># Initialize the ChatOllama model </span><span>llm</span> <span>=</span> <span>ChatOllama</span><span>(</span><span>model</span><span>=</span><span>"</span><span>llama3:instruct</span><span>"</span><span>,</span> <span>temperature</span><span>=</span><span>0.6</span><span>)</span>
<span># Create a prompt template </span><span>prompt</span> <span>=</span> <span>ChatPromptTemplate</span><span>.</span><span>from_template</span><span>(</span><span>"</span><span>Summarize the findings of: {page_content}</span><span>"</span><span>)</span>
<span># Define the chain </span><span>chain</span> <span>=</span> <span>prompt</span> <span>|</span> <span>llm</span> <span>|</span> <span>StrOutputParser</span><span>()</span>
<span># Function to generate text in chunks </span><span>def</span> <span>generate_text_chunks</span><span>(</span><span>page_content</span><span>):</span>
<span>try</span><span>:</span>
<span>text</span> <span>=</span> <span>chain</span><span>.</span><span>invoke</span><span>({</span><span>"</span><span>page_content</span><span>"</span><span>:</span> <span>page_content</span><span>})</span>
<span>sentences</span> <span>=</span> <span>text</span><span>.</span><span>split</span><span>(</span><span>'</span><span>. </span><span>'</span><span>)</span>
<span>for</span> <span>sentence</span> <span>in</span> <span>sentences</span><span>:</span>
<span>yield</span> <span>sentence</span> <span>+</span> <span>'</span><span>.</span><span>'</span>
<span>except</span> <span>Exception</span> <span>as</span> <span>e</span><span>:</span>
<span>print</span><span>(</span><span>f</span><span>"</span><span>Error generating text: </span><span>{</span><span>e</span><span>}</span><span>"</span><span>)</span>
<span># Function to play audio from a text chunk </span><span>def</span> <span>play_audio_chunk</span><span>(</span><span>text_chunk</span><span>):</span>
<span>if</span> <span>not</span> <span>text_chunk</span><span>.</span><span>strip</span><span>():</span>
<span>return</span>
<span>try</span><span>:</span>
<span>tts</span> <span>=</span> <span>gTTS</span><span>(</span><span>text</span><span>=</span><span>text_chunk</span><span>,</span> <span>lang</span><span>=</span><span>'</span><span>en</span><span>'</span><span>)</span>
<span>with</span> <span>BytesIO</span><span>()</span> <span>as</span> <span>audio_fp</span><span>:</span>
<span>tts</span><span>.</span><span>write_to_fp</span><span>(</span><span>audio_fp</span><span>)</span>
<span>audio_fp</span><span>.</span><span>seek</span><span>(</span><span>0</span><span>)</span>
<span>audio_segment</span> <span>=</span> <span>AudioSegment</span><span>.</span><span>from_file</span><span>(</span><span>audio_fp</span><span>,</span> <span>format</span><span>=</span><span>"</span><span>mp3</span><span>"</span><span>)</span>
<span>play</span><span>(</span><span>audio_segment</span><span>)</span>
<span>except</span> <span>Exception</span> <span>as</span> <span>e</span><span>:</span>
<span>print</span><span>(</span><span>f</span><span>"</span><span>Error generating or playing audio: </span><span>{</span><span>e</span><span>}</span><span>"</span><span>)</span>
<span># Function to generate and play text in real-time </span><span>def</span> <span>generate_and_play</span><span>():</span>
<span>global</span> <span>stop_playback</span>
<span>for</span> <span>split</span> <span>in</span> <span>all_splits</span><span>:</span>
<span>for</span> <span>chunk</span> <span>in</span> <span>generate_text_chunks</span><span>(</span><span>split</span><span>.</span><span>page_content</span><span>):</span>
<span>if</span> <span>stop_playback</span><span>:</span>
<span>print</span><span>(</span><span>"</span><span>Playback stopped by user.</span><span>"</span><span>)</span>
<span>return</span>
<span>print</span><span>(</span><span>"</span><span>.</span><span>"</span><span>,</span> <span>end</span><span>=</span><span>""</span><span>,</span> <span>flush</span><span>=</span><span>True</span><span>)</span> <span># Visual feedback </span> <span>play_audio_chunk</span><span>(</span><span>chunk</span><span>)</span>
<span>print</span><span>(</span><span>"</span><span>\n</span><span>Playback finished.</span><span>"</span><span>)</span>
<span># Start the generation and playback process </span><span>generate_and_play</span><span>()</span>
<span>from</span> <span>gtts</span> <span>import</span> <span>gTTS</span>
<span>from</span> <span>io</span> <span>import</span> <span>BytesIO</span>
<span>from</span> <span>langchain.text_splitter</span> <span>import</span> <span>RecursiveCharacterTextSplitter</span>
<span>from</span> <span>langchain_community.chat_models</span> <span>import</span> <span>ChatOllama</span>
<span>from</span> <span>langchain_community.document_loaders</span> <span>import</span> <span>PyPDFLoader</span>
<span>from</span> <span>langchain_core.output_parsers</span> <span>import</span> <span>StrOutputParser</span>
<span>from</span> <span>langchain_core.prompts</span> <span>import</span> <span>ChatPromptTemplate</span>
<span>from</span> <span>pydub</span> <span>import</span> <span>AudioSegment</span>
<span>from</span> <span>pydub.playback</span> <span>import</span> <span>play</span>
<span>import</span> <span>signal</span>
<span>import</span> <span>sys</span>
<span>import</span> <span>threading</span>

<span># Flag to control the loop </span><span>stop_playback</span> <span>=</span> <span>False</span>

<span>def</span> <span>signal_handler</span><span>(</span><span>sig</span><span>,</span> <span>frame</span><span>):</span>
    <span>global</span> <span>stop_playback</span>
    <span>print</span><span>(</span><span>"</span><span>\n</span><span>Gracefully stopping playback...</span><span>"</span><span>)</span>
    <span>stop_playback</span> <span>=</span> <span>True</span>

<span># Assign the signal handler to SIGINT (Ctrl+C) </span><span>signal</span><span>.</span><span>signal</span><span>(</span><span>signal</span><span>.</span><span>SIGINT</span><span>,</span> <span>signal_handler</span><span>)</span>

<span># Function to listen for an Enter key press in a separate thread </span><span>def</span> <span>listen_for_stop</span><span>():</span>
    <span>global</span> <span>stop_playback</span>
    <span>input</span><span>(</span><span>"</span><span>Press Enter to stop playback...</span><span>\n</span><span>"</span><span>)</span>
    <span>print</span><span>(</span><span>"</span><span>\n</span><span>Stopping playback...</span><span>"</span><span>)</span>
    <span>stop_playback</span> <span>=</span> <span>True</span>

<span># Start the listener thread </span><span>listener_thread</span> <span>=</span> <span>threading</span><span>.</span><span>Thread</span><span>(</span><span>target</span><span>=</span><span>listen_for_stop</span><span>)</span>
<span>listener_thread</span><span>.</span><span>daemon</span> <span>=</span> <span>True</span>
<span>listener_thread</span><span>.</span><span>start</span><span>()</span>

<span># Load the PDF document (PyPDFLoader returns one Document per page; the splitter below does the chunking)</span>
<span>loader</span> <span>=</span> <span>PyPDFLoader</span><span>(</span><span>"</span><span>/home/roomal/Downloads/Stephen Mulhall/1 - Mulhall, Stephen - Heidegger and Being and Time - Scepticism, Cognition And Agency.pdf</span><span>"</span><span>)</span>
<span>pages</span> <span>=</span> <span>loader</span><span>.</span><span>load</span><span>()</span>

<span># Split text into chunks</span>
<span>text_splitter</span> <span>=</span> <span>RecursiveCharacterTextSplitter</span><span>(</span><span>chunk_size</span><span>=</span><span>2048</span><span>,</span> <span>chunk_overlap</span><span>=</span><span>100</span><span>)</span>
<span>all_splits</span> <span>=</span> <span>text_splitter</span><span>.</span><span>split_documents</span><span>(</span><span>pages</span><span>)</span>

<span># Initialize the ChatOllama model</span>
<span>llm</span> <span>=</span> <span>ChatOllama</span><span>(</span><span>model</span><span>=</span><span>"</span><span>llama3:instruct</span><span>"</span><span>,</span> <span>temperature</span><span>=</span><span>0.6</span><span>)</span>

<span># Create a prompt template</span>
<span>prompt</span> <span>=</span> <span>ChatPromptTemplate</span><span>.</span><span>from_template</span><span>(</span><span>"</span><span>Summarize the findings of: {page_content}</span><span>"</span><span>)</span>

<span># Define the chain</span>
<span>chain</span> <span>=</span> <span>prompt</span> <span>|</span> <span>llm</span> <span>|</span> <span>StrOutputParser</span><span>()</span>

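<span># Function to summarize a chunk with the LLM and yield the summary sentence by sentence</span>
<span>def</span> <span>generate_text_chunks</span><span>(</span><span>page_content</span><span>):</span>
    <span>try</span><span>:</span>
        <span>text</span> <span>=</span> <span>chain</span><span>.</span><span>invoke</span><span>({</span><span>"</span><span>page_content</span><span>"</span><span>:</span> <span>page_content</span><span>})</span>
        <span>sentences</span> <span>=</span> <span>text</span><span>.</span><span>split</span><span>(</span><span>'</span><span>. </span><span>'</span><span>)</span>
        <span>for</span> <span>sentence</span> <span>in</span> <span>sentences</span><span>:</span>
            <span># Restore the period removed by split(), without doubling punctuation on the last sentence</span>
            <span>yield</span> <span>sentence</span> <span>if</span> <span>sentence</span><span>.</span><span>endswith</span><span>((</span><span>'</span><span>.</span><span>'</span><span>,</span> <span>'</span><span>!</span><span>'</span><span>,</span> <span>'</span><span>?</span><span>'</span><span>))</span> <span>else</span> <span>sentence</span> <span>+</span> <span>'</span><span>.</span><span>'</span>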
    <span>except</span> <span>Exception</span> <span>as</span> <span>e</span><span>:</span>
        <span>print</span><span>(</span><span>f</span><span>"</span><span>Error generating text: </span><span>{</span><span>e</span><span>}</span><span>"</span><span>)</span>

<span># Function to play audio from a text chunk</span>
<span>def</span> <span>play_audio_chunk</span><span>(</span><span>text_chunk</span><span>):</span>
    <span>if</span> <span>not</span> <span>text_chunk</span><span>.</span><span>strip</span><span>():</span>
        <span>return</span>
    <span>try</span><span>:</span>
        <span>tts</span> <span>=</span> <span>gTTS</span><span>(</span><span>text</span><span>=</span><span>text_chunk</span><span>,</span> <span>lang</span><span>=</span><span>'</span><span>en</span><span>'</span><span>)</span>
        <span>with</span> <span>BytesIO</span><span>()</span> <span>as</span> <span>audio_fp</span><span>:</span>
            <span>tts</span><span>.</span><span>write_to_fp</span><span>(</span><span>audio_fp</span><span>)</span>
            <span>audio_fp</span><span>.</span><span>seek</span><span>(</span><span>0</span><span>)</span>
            <span>audio_segment</span> <span>=</span> <span>AudioSegment</span><span>.</span><span>from_file</span><span>(</span><span>audio_fp</span><span>,</span> <span>format</span><span>=</span><span>"</span><span>mp3</span><span>"</span><span>)</span>
            <span>play</span><span>(</span><span>audio_segment</span><span>)</span>
    <span>except</span> <span>Exception</span> <span>as</span> <span>e</span><span>:</span>
        <span>print</span><span>(</span><span>f</span><span>"</span><span>Error generating or playing audio: </span><span>{</span><span>e</span><span>}</span><span>"</span><span>)</span>

<span># Function to generate and play text in real-time</span>
<span>def</span> <span>generate_and_play</span><span>():</span>
    <span>global</span> <span>stop_playback</span>
    <span>for</span> <span>split</span> <span>in</span> <span>all_splits</span><span>:</span>
        <span>for</span> <span>chunk</span> <span>in</span> <span>generate_text_chunks</span><span>(</span><span>split</span><span>.</span><span>page_content</span><span>):</span>
            <span>if</span> <span>stop_playback</span><span>:</span>
                <span>print</span><span>(</span><span>"</span><span>Playback stopped by user.</span><span>"</span><span>)</span>
                <span>return</span>
            <span>print</span><span>(</span><span>"</span><span>.</span><span>"</span><span>,</span> <span>end</span><span>=</span><span>""</span><span>,</span> <span>flush</span><span>=</span><span>True</span><span>)</span>  <span># Visual feedback</span>
            <span>play_audio_chunk</span><span>(</span><span>chunk</span><span>)</span>
    <span>print</span><span>(</span><span>"</span><span>\n</span><span>Playback finished.</span><span>"</span><span>)</span>

<span># Start the generation and playback process</span>
<span>generate_and_play</span><span>()</span>
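To make `chunk_size=2048` and `chunk_overlap=100` more concrete, here is a simplified sketch of fixed-size chunking with overlap. It is an illustration under stated assumptions, not LangChain's algorithm: `RecursiveCharacterTextSplitter` prefers to break on separators such as blank lines, newlines and spaces, while the hypothetical `chunk_text` helper below cuts purely by character count.

```python
def chunk_text(text: str, chunk_size: int = 2048, chunk_overlap: int = 100) -> list[str]:
    """Hypothetical helper: cut text into fixed-size character windows.

    Each window holds at most chunk_size characters and starts chunk_overlap
    characters before the previous window ended. Unlike
    RecursiveCharacterTextSplitter, it ignores paragraph, line and word breaks.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap repeats the end of one chunk at the start of the next, so the model still sees some surrounding context when a sentence is cut at a chunk boundary.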

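The stop mechanism in the script is cooperative: the flag is only checked between sentences, so the sentence that is currently playing finishes before playback ends. The sketch below restates that loop without the LLM and TTS dependencies, under two assumptions that are not in the script above: it uses `threading.Event` instead of the global `stop_playback` flag, and it splits sentences with a regular expression instead of `text.split('. ')`. `split_sentences`, `playback_loop` and the `speak` callback are hypothetical names.

```python
import re
import threading
from typing import Callable, Iterable

# Split on whitespace that follows ".", "!" or "?", keeping the punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Return the non-empty sentences in text, terminal punctuation included."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def playback_loop(
    summaries: Iterable[str],
    speak: Callable[[str], None],
    stop_event: threading.Event,
) -> bool:
    """Speak each sentence of each summary until stop_event is set.

    Returns True if every sentence was spoken, False if playback was stopped.
    """
    for summary in summaries:
        for sentence in split_sentences(summary):
            # Checked between sentences only: a sentence already playing is not cut off.
            if stop_event.is_set():
                return False
            speak(sentence)
    return True
```

In the script, `stop_event.set()` would take the place of `stop_playback = True`, and `play_audio_chunk` would be passed as `speak`. A plain global boolean also works here; `threading.Event` mainly makes the cross-thread signal explicit and adds `wait(timeout)`. Like the original `split('. ')`, the regex is naive and will also split after abbreviations such as "e.g.".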

Until next time.

Best,

Roomal

Original article: Transforming PDFs into Audio
