How to Build a Internal Data App with Google Sheets Backend Using Streamlit

In this article, we’ll walk you through the process of building an internal-use data app using Streamlit with a spreadsheet as the backend. We’ll also explore scenarios where Streamlit is a particularly good fit.

In our previous article, we introduced several ways to build data apps using spreadsheets as the backend.

This time, we’ll dive deeper into one of the code-based frameworks we mentioned — Streamlit — and explain how to build a data app that’s suitable for internal operations.

You can check out the previous article here:https://dev.to/morph-data/how-to-build-an-internal-data-application-using-google-sheets-as-a-data-source-ddo

Step-by-Step Guide

Google Cloud / Google Sheets Configuration

1. Enable the Google Sheets API
Log in to the Google Developers Console, create a new project, and enable the Google Sheets API.

2. Generate a Service Account and Download the JSON Key
Go to APIs & Services > Credentials and create a new service account.Under the Keys tab of the service account, create a new key and download it in JSON format.

3. Grant Access to the Service Account on the Target Spreadsheet
Click the Share button on the Google Spreadsheet and add the service account. This allows the service account to access the spreadsheet data.

Setting Up and Building with Streamlit

In this section, we’ll build a Streamlit app using the directory structure shown below.

project-name/
├── app.py
└── service_key.json
project-name/
│
├── app.py               
└── service_key.json 
project-name/ │ ├── app.py └── service_key.json

Enter fullscreen mode Exit fullscreen mode

1.Move to your project directory and install the Streamlit framework.

<span>cd </span>project-name
pip <span>install </span>streamlit
<span>cd </span>project-name
pip <span>install </span>streamlit
cd project-name pip install streamlit

Enter fullscreen mode Exit fullscreen mode

2.Place the JSON file you obtained in Step-by-Step Guide 2 as service_key.json.

3.Write your code in app.py to build the Streamlit application.

<span>import</span> <span>streamlit</span> <span>as</span> <span>st</span>
<span>from</span> <span>google.oauth2</span> <span>import</span> <span>service_account</span>
<span>import</span> <span>gspread</span>
<span>import</span> <span>pandas</span> <span>as</span> <span>pd</span>
<span>import</span> <span>matplotlib.pyplot</span> <span>as</span> <span>plt</span>
<span>import</span> <span>seaborn</span> <span>as</span> <span>sns</span>
<span>from</span> <span>datetime</span> <span>import</span> <span>datetime</span>
<span># Add a title to the app </span><span>st</span><span>.</span><span>title</span><span>(</span><span>'</span><span>Dashboard</span><span>'</span><span>)</span>
<span># Google Sheets Authentication </span><span>scopes</span> <span>=</span> <span>[</span> <span>'</span><span>https://www.googleapis.com/auth/spreadsheets</span><span>'</span><span>,</span> <span>'</span><span>https://www.googleapis.com/auth/drive</span><span>'</span>
<span>]</span>
<span>credentials</span> <span>=</span> <span>service_account</span><span>.</span><span>Credentials</span><span>.</span><span>from_service_account_file</span><span>(</span> <span>"</span><span>./service_key.json</span><span>"</span><span>,</span> <span>scopes</span><span>=</span><span>scopes</span>
<span>)</span>
<span>gc</span> <span>=</span> <span>gspread</span><span>.</span><span>authorize</span><span>(</span><span>credentials</span><span>)</span>
<span># Get data from Google Sheets </span><span>SP_SHEET_KEY</span> <span>=</span> <span>'</span><span>{Sheet ID}</span><span>'</span> <span># sheet id (which can be found in the URL of the Google Sheet, after https://docs.google.com/spreadsheets/d/) </span><span>sh</span> <span>=</span> <span>gc</span><span>.</span><span>open_by_key</span><span>(</span><span>SP_SHEET_KEY</span><span>)</span>
<span>SP_SHEET</span> <span>=</span> <span>'</span><span>test</span><span>'</span> <span># Sheet name </span><span>worksheet</span> <span>=</span> <span>sh</span><span>.</span><span>worksheet</span><span>(</span><span>SP_SHEET</span><span>)</span>
<span>data</span> <span>=</span> <span>worksheet</span><span>.</span><span>get_all_values</span><span>()</span> <span># Get all data from the sheet </span><span>df</span> <span>=</span> <span>pd</span><span>.</span><span>DataFrame</span><span>(</span><span>data</span><span>[</span><span>1</span><span>:],</span> <span>columns</span><span>=</span><span>data</span><span>[</span><span>0</span><span>])</span> <span># Convert data to DataFrame </span>
<span># Display original DataFrame (limited to 100 rows with scrolling) </span><span>st</span><span>.</span><span>subheader</span><span>(</span><span>'</span><span>Original Data</span><span>'</span><span>)</span>
<span>st</span><span>.</span><span>dataframe</span><span>(</span><span>df</span><span>.</span><span>head</span><span>(</span><span>100</span><span>),</span> <span>height</span><span>=</span><span>400</span><span>)</span> <span># height parameter adds scrolling capability </span>
<span># Check if 'Created Date' column exists </span><span>if</span> <span>'</span><span>Created Date</span><span>'</span> <span>in</span> <span>df</span><span>.</span><span>columns</span><span>:</span>
<span># Convert to datetime format (adjust format as needed) </span> <span>df</span><span>[</span><span>'</span><span>Created Date</span><span>'</span><span>]</span> <span>=</span> <span>pd</span><span>.</span><span>to_datetime</span><span>(</span><span>df</span><span>[</span><span>'</span><span>Created Date</span><span>'</span><span>])</span>
<span># Group by date and count records </span> <span>count_by_date</span> <span>=</span> <span>df</span><span>.</span><span>groupby</span><span>(</span><span>df</span><span>[</span><span>'</span><span>Created Date</span><span>'</span><span>].</span><span>dt</span><span>.</span><span>date</span><span>).</span><span>size</span><span>().</span><span>reset_index</span><span>()</span>
<span>count_by_date</span><span>.</span><span>columns</span> <span>=</span> <span>[</span><span>'</span><span>Created Date</span><span>'</span><span>,</span> <span>'</span><span>Count</span><span>'</span><span>]</span>
<span># Sort by Created Date in descending order </span> <span>count_by_date</span> <span>=</span> <span>count_by_date</span><span>.</span><span>sort_values</span><span>(</span><span>'</span><span>Created Date</span><span>'</span><span>,</span> <span>ascending</span><span>=</span><span>False</span><span>)</span>
<span># Create two columns for side-by-side display </span> <span>st</span><span>.</span><span>subheader</span><span>(</span><span>'</span><span>Record Count by Date</span><span>'</span><span>)</span>
<span>col1</span><span>,</span> <span>col2</span> <span>=</span> <span>st</span><span>.</span><span>columns</span><span>([</span><span>3</span><span>,</span> <span>2</span><span>])</span> <span># Adjust the ratio to make the first column wider </span>
<span># Display grouped DataFrame in the first column </span> <span>with</span> <span>col1</span><span>:</span>
<span>st</span><span>.</span><span>dataframe</span><span>(</span><span>count_by_date</span><span>.</span><span>head</span><span>(</span><span>100</span><span>),</span> <span>height</span><span>=</span><span>400</span><span>)</span>
<span># Create line chart in the second column </span> <span>with</span> <span>col2</span><span>:</span>
<span># For the chart, we need to sort by date in ascending order </span> <span>chart_data</span> <span>=</span> <span>count_by_date</span><span>.</span><span>sort_values</span><span>(</span><span>'</span><span>Created Date</span><span>'</span><span>,</span> <span>ascending</span><span>=</span><span>True</span><span>)</span>
<span>st</span><span>.</span><span>line_chart</span><span>(</span><span>chart_data</span><span>.</span><span>set_index</span><span>(</span><span>'</span><span>Created Date</span><span>'</span><span>))</span>
<span>else</span><span>:</span>
<span>st</span><span>.</span><span>error</span><span>(</span><span>"'</span><span>Created Date</span><span>'</span><span> column not found in the DataFrame. Please check the column name.</span><span>"</span><span>)</span>
<span>st</span><span>.</span><span>write</span><span>(</span><span>"</span><span>Available columns:</span><span>"</span><span>,</span> <span>df</span><span>.</span><span>columns</span><span>.</span><span>tolist</span><span>())</span>
<span>import</span> <span>streamlit</span> <span>as</span> <span>st</span>
<span>from</span> <span>google.oauth2</span> <span>import</span> <span>service_account</span>
<span>import</span> <span>gspread</span>
<span>import</span> <span>pandas</span> <span>as</span> <span>pd</span>
<span>import</span> <span>matplotlib.pyplot</span> <span>as</span> <span>plt</span>
<span>import</span> <span>seaborn</span> <span>as</span> <span>sns</span>
<span>from</span> <span>datetime</span> <span>import</span> <span>datetime</span>

<span># Add a title to the app </span><span>st</span><span>.</span><span>title</span><span>(</span><span>'</span><span>Dashboard</span><span>'</span><span>)</span>

<span># Google Sheets Authentication </span><span>scopes</span> <span>=</span> <span>[</span> <span>'</span><span>https://www.googleapis.com/auth/spreadsheets</span><span>'</span><span>,</span> <span>'</span><span>https://www.googleapis.com/auth/drive</span><span>'</span>
<span>]</span>
<span>credentials</span> <span>=</span> <span>service_account</span><span>.</span><span>Credentials</span><span>.</span><span>from_service_account_file</span><span>(</span> <span>"</span><span>./service_key.json</span><span>"</span><span>,</span> <span>scopes</span><span>=</span><span>scopes</span>
<span>)</span>
<span>gc</span> <span>=</span> <span>gspread</span><span>.</span><span>authorize</span><span>(</span><span>credentials</span><span>)</span>

<span># Get data from Google Sheets </span><span>SP_SHEET_KEY</span> <span>=</span> <span>'</span><span>{Sheet ID}</span><span>'</span> <span># sheet id (which can be found in the URL of the Google Sheet, after https://docs.google.com/spreadsheets/d/) </span><span>sh</span> <span>=</span> <span>gc</span><span>.</span><span>open_by_key</span><span>(</span><span>SP_SHEET_KEY</span><span>)</span>
<span>SP_SHEET</span> <span>=</span> <span>'</span><span>test</span><span>'</span> <span># Sheet name </span><span>worksheet</span> <span>=</span> <span>sh</span><span>.</span><span>worksheet</span><span>(</span><span>SP_SHEET</span><span>)</span>
<span>data</span> <span>=</span> <span>worksheet</span><span>.</span><span>get_all_values</span><span>()</span> <span># Get all data from the sheet </span><span>df</span> <span>=</span> <span>pd</span><span>.</span><span>DataFrame</span><span>(</span><span>data</span><span>[</span><span>1</span><span>:],</span> <span>columns</span><span>=</span><span>data</span><span>[</span><span>0</span><span>])</span> <span># Convert data to DataFrame </span>
<span># Display original DataFrame (limited to 100 rows with scrolling) </span><span>st</span><span>.</span><span>subheader</span><span>(</span><span>'</span><span>Original Data</span><span>'</span><span>)</span>
<span>st</span><span>.</span><span>dataframe</span><span>(</span><span>df</span><span>.</span><span>head</span><span>(</span><span>100</span><span>),</span> <span>height</span><span>=</span><span>400</span><span>)</span>  <span># height parameter adds scrolling capability </span>
<span># Check if 'Created Date' column exists </span><span>if</span> <span>'</span><span>Created Date</span><span>'</span> <span>in</span> <span>df</span><span>.</span><span>columns</span><span>:</span>
    <span># Convert to datetime format (adjust format as needed) </span>    <span>df</span><span>[</span><span>'</span><span>Created Date</span><span>'</span><span>]</span> <span>=</span> <span>pd</span><span>.</span><span>to_datetime</span><span>(</span><span>df</span><span>[</span><span>'</span><span>Created Date</span><span>'</span><span>])</span>

    <span># Group by date and count records </span>    <span>count_by_date</span> <span>=</span> <span>df</span><span>.</span><span>groupby</span><span>(</span><span>df</span><span>[</span><span>'</span><span>Created Date</span><span>'</span><span>].</span><span>dt</span><span>.</span><span>date</span><span>).</span><span>size</span><span>().</span><span>reset_index</span><span>()</span>
    <span>count_by_date</span><span>.</span><span>columns</span> <span>=</span> <span>[</span><span>'</span><span>Created Date</span><span>'</span><span>,</span> <span>'</span><span>Count</span><span>'</span><span>]</span>

    <span># Sort by Created Date in descending order </span>    <span>count_by_date</span> <span>=</span> <span>count_by_date</span><span>.</span><span>sort_values</span><span>(</span><span>'</span><span>Created Date</span><span>'</span><span>,</span> <span>ascending</span><span>=</span><span>False</span><span>)</span>

    <span># Create two columns for side-by-side display </span>    <span>st</span><span>.</span><span>subheader</span><span>(</span><span>'</span><span>Record Count by Date</span><span>'</span><span>)</span>
    <span>col1</span><span>,</span> <span>col2</span> <span>=</span> <span>st</span><span>.</span><span>columns</span><span>([</span><span>3</span><span>,</span> <span>2</span><span>])</span>  <span># Adjust the ratio to make the first column wider </span>
    <span># Display grouped DataFrame in the first column </span>    <span>with</span> <span>col1</span><span>:</span>
        <span>st</span><span>.</span><span>dataframe</span><span>(</span><span>count_by_date</span><span>.</span><span>head</span><span>(</span><span>100</span><span>),</span> <span>height</span><span>=</span><span>400</span><span>)</span>

    <span># Create line chart in the second column </span>    <span>with</span> <span>col2</span><span>:</span>
        <span># For the chart, we need to sort by date in ascending order </span>        <span>chart_data</span> <span>=</span> <span>count_by_date</span><span>.</span><span>sort_values</span><span>(</span><span>'</span><span>Created Date</span><span>'</span><span>,</span> <span>ascending</span><span>=</span><span>True</span><span>)</span>
        <span>st</span><span>.</span><span>line_chart</span><span>(</span><span>chart_data</span><span>.</span><span>set_index</span><span>(</span><span>'</span><span>Created Date</span><span>'</span><span>))</span>
<span>else</span><span>:</span>
    <span>st</span><span>.</span><span>error</span><span>(</span><span>"'</span><span>Created Date</span><span>'</span><span> column not found in the DataFrame. Please check the column name.</span><span>"</span><span>)</span>
    <span>st</span><span>.</span><span>write</span><span>(</span><span>"</span><span>Available columns:</span><span>"</span><span>,</span> <span>df</span><span>.</span><span>columns</span><span>.</span><span>tolist</span><span>())</span>
import streamlit as st from google.oauth2 import service_account import gspread import pandas as pd import matplotlib.pyplot as plt import seaborn as sns from datetime import datetime # Add a title to the app st.title('Dashboard') # Google Sheets Authentication scopes = [ 'https://www.googleapis.com/auth/spreadsheets', 'https://www.googleapis.com/auth/drive' ] credentials = service_account.Credentials.from_service_account_file( "./service_key.json", scopes=scopes ) gc = gspread.authorize(credentials) # Get data from Google Sheets SP_SHEET_KEY = '{Sheet ID}' # sheet id (which can be found in the URL of the Google Sheet, after https://docs.google.com/spreadsheets/d/) sh = gc.open_by_key(SP_SHEET_KEY) SP_SHEET = 'test' # Sheet name worksheet = sh.worksheet(SP_SHEET) data = worksheet.get_all_values() # Get all data from the sheet df = pd.DataFrame(data[1:], columns=data[0]) # Convert data to DataFrame # Display original DataFrame (limited to 100 rows with scrolling) st.subheader('Original Data') st.dataframe(df.head(100), height=400) # height parameter adds scrolling capability # Check if 'Created Date' column exists if 'Created Date' in df.columns: # Convert to datetime format (adjust format as needed) df['Created Date'] = pd.to_datetime(df['Created Date']) # Group by date and count records count_by_date = df.groupby(df['Created Date'].dt.date).size().reset_index() count_by_date.columns = ['Created Date', 'Count'] # Sort by Created Date in descending order count_by_date = count_by_date.sort_values('Created Date', ascending=False) # Create two columns for side-by-side display st.subheader('Record Count by Date') col1, col2 = st.columns([3, 2]) # Adjust the ratio to make the first column wider # Display grouped DataFrame in the first column with col1: st.dataframe(count_by_date.head(100), height=400) # Create line chart in the second column with col2: # For the chart, we need to sort by date in ascending order chart_data = count_by_date.sort_values('Created Date', ascending=True) st.line_chart(chart_data.set_index('Created Date')) else: st.error("'Created Date' column not found in the DataFrame. Please check the column name.") st.write("Available columns:", df.columns.tolist())

Enter fullscreen mode Exit fullscreen mode

4.Open the app in your browser to verify that it works.

streamlit run app.py
streamlit run app.py
streamlit run app.py

Enter fullscreen mode Exit fullscreen mode

Adding Authentication for Internal Use

When running Streamlit apps internally, authentication becomes essential.Here, we’ll cover three methods for implementing authentication.
If you’re only deploying a single app, the first method might be sufficient.However, if you have multiple apps or plan to scale in the future, we recommend using either the second or third approach.

1.Add Authentication to Each App Individually

As of version 1.42.0, Streamlit supports OpenID Connect (OIDC) for user authentication.You can use this built-in feature to add flexible authentication to your app.

  • Pros: Highly flexible authentication
  • Cons: Requires implementation for each app, which increases development effort

Official documentation: https://docs.streamlit.io/develop/concepts/connections/authentication

2.Use Snowflake in Streamlit

Snowflake in Streamlit allows you to build and host Streamlit apps directly on the Snowflake platform.Since authentication to Snowflake also applies to your Streamlit app, no additional authentication implementation is required.

  • Pros: No need to implement authentication individually for each app
  • Cons: Requires a Snowflake account / Limited to fewer components compared to open-source Streamlit

Official documentation: https://docs.snowflake.com/en/developer-guide/streamlit/about-streamlit

3.Use Morph

Morph is a cloud platform that supports Streamlit deployment.By deploying your Streamlit app to Morph, you can leverage Morph’s built-in authentication, eliminating the need to implement it manually.

  • Pros: No need for individual authentication implementation / Centralized app management
  • Cons: Involves using an external service (Morph)

Official documentation: https://www.morph-data.io/

Conclusion

Streamlit is a great tool for data scientists and analysts who are comfortable with Python and want to build data applications powered by machine learning or AI.

It’s especially well-suited for internal tools like:

  • Workflow automation apps using LLMs
  • Demand forecasting tools
  • Internal chatbots that reference company documents

There are endless ways to apply Streamlit — so we encourage you to give it a try!

原文链接:How to Build a Internal Data App with Google Sheets Backend Using Streamlit

© 版权声明
THE END
喜欢就支持一下吧
点赞15 分享
Have faith in your dreams and someday your rainbow will come smiling through.
请对梦想充满信心,总有一天属于你的彩虹会在天空微笑
评论 抢沙发

请登录后发表评论

    暂无评论内容