Deploying Serverless Embedding App with AWS CDK, Lambda and Amazon Aurora PostgreSQL

From Notebook to Serverless: Creating a Multimodal Search Engine with Amazon Bedrock and PostgreSQL (3 Part Series)

1 From Notebook to Serverless: Creating a Multimodal Search Engine with Amazon Bedrock and PostgreSQL
2 Building a Multimodal Search Engine with Amazon Titan Embeddings, Aurora Serveless PostgreSQL and LangChain
3 Deploying Serverless Embedding App with AWS CDK, Lambda and Amazon Aurora PostgreSQL

Dev.to Linkedin GitHub Twitter Instagram Youtube
Linktr

Elizabeth Fuentes

AWS Developer Advocate

Welcome to Part 2 of our two-part blog series! In this post, we’ll take the concepts explored in Part 1 and turn them into a scalable, production-ready solution. Using AWS Lambda functions and AWS CDK, you’ll transform the notebook-based prototype into a robust, serverless architecture. Together, we’ll develop AWS Lambda functions for embedding generation and retrieval, use AWS CDK for infrastructure-as-code deployment, and integrate with Amazon S3 and Amazon Aurora PostgreSQL for efficient data storage and retrieval. By the end of this tutorial, you’ll have a fully functional, serverless multimodal search engine capable of understanding and retrieving both textual and visual content.

AWS Level: Advanced – 300

Prerequisites:

Foundational knowledge of Python
AWS Account
Enable model access for the following models:
Amazon Titan Embeddings V2
Anthropic Claude 3 models (Haiku or Sonnet).
Set up the AWS Command Line Interface (CLI)
Optional: Bootstrap your account/region if this is your first CDK Project
Read about AWS CDK “Get started with Python”

Cost to complete:

In the second part, you’ll construct a Serverless Embedding App utilizing the AWS Cloud Development Kit (CDK) to create four Lambda Functions.

Learn how to test Lambda functions in the console with test events.

AWS Lambda Functions for Generating Embeddings for Text and Image Files:

To handle the embedding process, there is a dedicated Lambda Function for each file type:

To generate embeddings for the text content of PDF files with FAISS.

Event to trigger:


{
    "location": "REPLACE-YOU-KEY",
    "vectorStoreLocation": "REPLACE-NAME.vdb",
    "bucketName": "REPLACE-YOU-BUCKET",
    "vectorStoreType": "faiss",
    "splitStrategy": "semantic",
    "fileType": "application/pdf",
    "embeddingModel": "amazon.titan-embed-text-v1"
}
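
If you prefer to send this test event from Python instead of the console, a minimal sketch could look like the one below. The helper names (`build_pdf_event`, `invoke_with_event`) and the function name in the comment are hypothetical; only the event keys come from the event above. The invoke helper takes the Lambda client as an argument, so it assumes you create it yourself with boto3 and valid AWS credentials.

```python
import json


def build_pdf_event(bucket_name, key, vector_store_name,
                    embedding_model="amazon.titan-embed-text-v1"):
    """Return the test event for the PDF embedding function as a dict."""
    return {
        "location": key,
        "vectorStoreLocation": vector_store_name,
        "bucketName": bucket_name,
        "vectorStoreType": "faiss",
        "splitStrategy": "semantic",
        "fileType": "application/pdf",
        "embeddingModel": embedding_model,
    }


def invoke_with_event(lambda_client, function_name, event):
    """Invoke the function synchronously and return the decoded JSON payload."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(event).encode("utf-8"),
    )
    return json.loads(response["Payload"].read())


if __name__ == "__main__":
    event = build_pdf_event("REPLACE-YOU-BUCKET", "REPLACE-YOU-KEY", "REPLACE-NAME.vdb")
    print(json.dumps(event, indent=2))
    # To actually invoke (assumes boto3 is installed and credentials are set):
    #   import boto3
    #   result = invoke_with_event(boto3.client("lambda"),
    #                              "HYPOTHETICAL-FUNCTION-NAME", event)
```

Keeping the event builder separate from the invocation makes it easy to reuse the same payload in the console, in a script, or in a unit test.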

To generate embeddings for images with FAISS.

Event to trigger:


{
    "location": "REPLACE-YOU-KEY-FOLDER",
    "vectorStoreLocation": "REPLACE-NAME.vdb",
    "bucketName": "REPLACE-YOU-BUCKET",
    "vectorStoreType": "faiss",
    "splitStrategy": "semantic",
    "embeddingModel": "amazon.titan-embed-image-v1"
}

To generate embeddings for image/pdf with pgvector and Amazon Aurora.

Before testing this Lambda function, keep in mind that it must be in the same VPC as the Amazon Aurora PostgreSQL database and be able to reach it. For that, check Automatically connecting a Lambda function and an Aurora DB cluster, Using Amazon RDS Proxy for Aurora, and Use interface VPC endpoints (AWS PrivateLink) to create the Amazon Bedrock VPC endpoint.

Event to trigger:


{
  "location": "YOU-KEY",
  "bucketName": "YOU-BUCKET-NAME",
  "fileType": "pdf or image",
  "embeddingModel": "amazon.titan-embed-text-v1", 
  "PGVECTOR_USER":"YOU-RDS-USER",
  "PGVECTOR_PASSWORD":"YOU-RDS-PASSWORD",
  "PGVECTOR_HOST":"YOU-RDS-ENDPOINT-PROXY",
  "PGVECTOR_DATABASE":"YOU-RDS-DATABASE",
  "PGVECTOR_PORT":"5432",
  "collectioName": "YOU-collectioName",
  "bedrock_endpoint": "https://vpce-...-.....bedrock-runtime.YOU-REGION.vpce.amazonaws.com"
}
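
To make the role of the PGVECTOR_* fields concrete, here is a small sketch of how a handler might turn them into a SQLAlchemy-style PostgreSQL connection string, the format LangChain's pgvector integration accepts. This is an assumption about how the values are consumed, not the code deployed by the stack; the function name `pgvector_connection_string` and the default `psycopg2` driver are hypothetical.

```python
from urllib.parse import quote_plus


def pgvector_connection_string(event, driver="psycopg2"):
    """Build a SQLAlchemy-style URL from the PGVECTOR_* fields of the event.

    The user and password are URL-encoded so that characters such as "@"
    or "/" do not break the URL.
    """
    user = quote_plus(event["PGVECTOR_USER"])
    password = quote_plus(event["PGVECTOR_PASSWORD"])
    host = event["PGVECTOR_HOST"]
    port = int(event.get("PGVECTOR_PORT", 5432))
    database = event["PGVECTOR_DATABASE"]
    return f"postgresql+{driver}://{user}:{password}@{host}:{port}/{database}"


if __name__ == "__main__":
    sample_event = {
        "PGVECTOR_USER": "YOU-RDS-USER",
        "PGVECTOR_PASSWORD": "YOU-RDS-PASSWORD",
        "PGVECTOR_HOST": "YOU-RDS-ENDPOINT-PROXY",
        "PGVECTOR_DATABASE": "YOU-RDS-DATABASE",
        "PGVECTOR_PORT": "5432",
    }
    print(pgvector_connection_string(sample_event))
```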

AWS Lambda Functions to Query for Text and Image Files in a Vector DB:

To handle the retrieval process, there is a dedicated Lambda Function for each file type:

To retrieve text content from a vector DB.

Event to trigger:


{
  "vectorStoreLocation": "REPLACE-NAME.vdb",
  "bucketName": "REPLACE-YOU-BUCKET",
  "vectorStoreType": "faiss",
  "query": "YOU-QUERY",
  "numDocs": 5,
  "embeddingModel": "amazon.titan-embed-text-v1"
}
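
To illustrate what numDocs controls, the sketch below ranks a few stored vectors against a query vector and keeps the best matches. It uses cosine similarity in plain Python rather than FAISS itself, so treat it as a conceptual stand-in: the real index may use a different distance metric, and the function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, num_docs=5):
    """Return the num_docs highest-scoring (score, text) pairs.

    documents is a list of (text, vector) tuples.
    """
    scored = [(cosine_similarity(query_vector, vector), text)
              for text, vector in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:num_docs]


if __name__ == "__main__":
    docs = [("about cats", [1.0, 0.0]),
            ("about dogs", [0.0, 1.0]),
            ("about pets", [1.0, 1.0])]
    for score, text in top_k([1.0, 0.0], docs, num_docs=2):
        print(f"{score:.3f}  {text}")
```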

To retrieve image locations from a vector DB.

You can search by text or by image.

Text event to trigger:


{
  "vectorStoreLocation": "REPLACE-NAME.vdb",
  "bucketName": "REPLACE-YOU-BUCKET",
  "vectorStoreType": "faiss",
  "InputType": "text",
  "query":"TEXT-QUERY",
  "embeddingModel": "amazon.titan-embed-text-v1"
}

Image event to trigger

The next step is to take the image_path value from the response and download the file from the Amazon S3 bucket with the boto3 download_file method.
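
A minimal sketch of that download step is shown below. It assumes image_path is the object key inside the same bucket you passed as bucketName; the helper name `download_image` and the `/tmp` destination are hypothetical choices. The S3 client is passed in, so you would create it with `boto3.client("s3")`.

```python
import os


def download_image(s3_client, bucket_name, image_path, dest_dir="/tmp"):
    """Download the object at image_path and return the local file path."""
    local_path = os.path.join(dest_dir, os.path.basename(image_path))
    # download_file(Bucket, Key, Filename) writes the object to local_path.
    s3_client.download_file(bucket_name, image_path, local_path)
    return local_path


# Example usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   path = download_image(boto3.client("s3"), "YOU-BUCKET-NAME", image_path)
```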

To retrieve image/pdf content with pgvector and Amazon Aurora.


{
  "location": "YOU-KEY",
  "bucketName": "YOU-BUCKET-NAME",
  "fileType": "pdf or image",
  "embeddingModel": "amazon.titan-embed-text-v1", 
  "PGVECTOR_USER":"YOU-RDS-USER",
  "PGVECTOR_PASSWORD":"YOU-RDS-PASSWORD",
  "PGVECTOR_HOST":"YOU-RDS-ENDPOINT-PROXY",
  "PGVECTOR_DATABASE":"YOU-RDS-DATABASE",
  "PGVECTOR_PORT":"5432",
  "collectioName": "YOU-collectioName",
  "bedrock_endpoint": "https://vpce-...-.....bedrock-runtime.YOU-REGION.vpce.amazonaws.com",
  "QUERY": "YOU-TEXT-QUESTION"
  }

To query with an image, use location and bucketName to pass the image's location in Amazon S3.

Let’s build!

The AWS Lambda functions in this deployment are built from container images, so you must have Docker Desktop installed and running on your computer.
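
Before you start, you might want a quick check that the command-line tools used in this walkthrough are available. The sketch below only looks for the executables on your PATH; it does not confirm that the Docker daemon is actually running. The tool list and the function name `missing_tools` are assumptions for illustration.

```python
import shutil

# Command-line tools used in the steps of this post.
REQUIRED_TOOLS = ("docker", "cdk", "aws", "git")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required command-line tools were found.")
```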

Step 1: APP Set Up

Clone the repo


git clone https://github.com/build-on-aws/langchain-embeddings

Go to:


cd serveless-embeddings

Configure the AWS Command Line Interface, then follow the steps below to deploy the architecture with CDK.

Step 2: Deploy architecture with CDK.

Create the virtual environment by following the steps in the README:


python3 -m venv .venv
source .venv/bin/activate

For Windows:


.venv\Scripts\activate.bat

Install The Requirements:


pip install -r requirements.txt

Synthesize The CloudFormation Template With The Following Command:


cdk synth

The Deployment:


cdk deploy

🧹 Clean the house!:

When you finish testing and want to clean up the application, follow these two steps:

Delete the files from the Amazon S3 bucket created in the deployment.
Run this command in your terminal:


cdk destroy
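
The first cleanup step, emptying the bucket, can also be scripted before you run the destroy command. The sketch below assumes a non-versioned bucket and takes the S3 client as an argument (for example `boto3.client("s3")`); the function name `empty_bucket` is hypothetical.

```python
def empty_bucket(s3_client, bucket_name):
    """Delete every object in the bucket and return how many were deleted.

    Assumes versioning is disabled; old versions and delete markers
    in a versioned bucket are not handled here.
    """
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        # Each page holds at most 1000 keys, which matches the
        # per-request limit of delete_objects.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted


# Example usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   print(empty_bucket(boto3.client("s3"), "REPLACE-YOU-BUCKET"))
```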

Conclusion

In this post, you’ve seen how to transform a notebook-based multimodal search solution into a scalable, serverless architecture using AWS services. You’ve walked through developing Lambda functions for embedding tasks, using AWS CDK for infrastructure deployment, and integrating with Amazon S3 and Aurora PostgreSQL for efficient data management.

By leveraging these serverless technologies, you can now deploy a robust, production-ready multimodal search engine capable of handling both textual and visual content. This approach not only enhances scalability but also reduces operational overhead, allowing you to focus on improving your search capabilities and user experience.

I encourage you to build upon this foundation, experiment with different embedding models, and explore additional AWS services to further enhance your multimodal search engine. Don’t hesitate to share your experiences or ask questions in the comments below. Happy building!

Thanks,
Eli
