Mastering Data Flow: Google Sheets to SQLite via Python#

Google Sheets is an exceptional tool for non-technical users, but it isn’t particularly efficient for complex querying. In this guide, I’ll demonstrate how just a few lines of Python can turn your Google Sheet into a valuable input for your SQL database. The beauty of this approach is its adaptability: whether you prefer SQLite, PostgreSQL, MongoDB, or any other database, you can seamlessly transform and integrate your Google Sheet data.
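To give a taste of that adaptability, here is a minimal sketch of pointing the same idea at PostgreSQL via SQLAlchemy instead of SQLite. The connection string, database name, and the tiny stand-in DataFrame are all placeholders of mine, not part of this guide’s pipeline:

import pandas as pd
from sqlalchemy import create_engine

# Stand-in DataFrame; the real pipeline below builds one from Google Sheets
df = pd.DataFrame({'id': [1], 'question_text': ['example']})

# Placeholder DSN: adjust user, password, host, and database name for your setup
engine = create_engine('postgresql+psycopg2://user:password@localhost:5432/quizdb')

# The same pandas call works for PostgreSQL, MySQL, and other SQLAlchemy backends
df.to_sql('questions', engine, if_exists='replace', index=False)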


Here, we’ll bridge the worlds of Google Sheets, Pandas, and SQLite to create a seamless data pipeline. Before we dive in, let’s understand the tools in our toolkit and the magic behind our data extraction function:

  1. pandas (imported as pd): Our data manipulation champion! With pd, we can reshape, analyze, and transform our tabular data with ease.

  2. json: Our credentials handler. Google issues service-account keys as JSON files, and this module lets us parse them.

  3. oauth2client & gspread: Our gateways to Google Sheets. The former authenticates us, and the latter lets us read, write, and navigate our sheets like a pro.

  4. sqlite3: Where our processed data finds a home! SQLite offers a lightweight database solution, perfect for storing and querying our data.

  5. datetime & timedelta: Timekeepers of our script. From stamping records to calculating durations, these are essential for any time-related operations.

With our tools at the ready, let’s dive into creating a robust data flow, turning Google Sheets insights into actionable SQLite data!

import pandas as pd
import json
from oauth2client.service_account import ServiceAccountCredentials
import gspread
import sqlite3
from datetime import datetime, timedelta
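Before we get to the pipeline itself, here’s a quick illustration of our two timekeepers: stamping a record and deriving a duration-based field. The valid_until value is purely illustrative and not part of the pipeline below:

created_at = datetime.now()                    # timestamp a new record
valid_until = created_at + timedelta(days=7)   # hypothetical: entries expire after a week
print(created_at, valid_until)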

Deep Dive: The Data Extraction Function#

get_user_information(sheet_name): This utility function is our bridge to Google Sheets. Before diving into its mechanics, make sure the creds.json file is in place: in the Google Cloud Console, open the ‘Credentials’ page under APIs & Services, create a service account, and download its JSON key. Then share your Google Sheet with the service account email found in creds.json so the script can authenticate and read the data. With those preparations done, the function opens the spreadsheet, selects your desired worksheet, retrieves the data, and transforms it into a Pandas DataFrame, setting the stage for deeper data operations.

def get_user_information(sheet_name):
    # Scopes required for reading Google Sheets via the Sheets/Drive APIs
    scope = ['https://spreadsheets.google.com/feeds',
             'https://www.googleapis.com/auth/drive']

    # Authenticate with the service-account key file
    creds = ServiceAccountCredentials.from_json_keyfile_name('creds.json', scope)
    client = gspread.authorize(creds)

    # Open the spreadsheet "quizs" and select the requested worksheet
    worksheet = client.open("quizs").worksheet(sheet_name)

    # Load every cell into a DataFrame (headers are fixed up later)
    users = pd.DataFrame(worksheet.get_all_values())
    return users
The creds.json key file follows this template (sensitive values redacted):

{
  "type": "service_account",
  "project_id": "YOUR-PROJECT-ID",
  "private_key_id": "YOUR-PRIVATE-KEY-ID",
  "private_key": "YOUR-PRIVATE-KEY",
  "client_email": "YOUR-SERVICE-ACCOUNT@appspot.gserviceaccount.com",
  "client_id": "YOUR-CLIENT-ID",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/YOUR-SERVICE-ACCOUNT%40appspot.gserviceaccount.com"
}
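A handy trick: the exact address you need to share your sheet with is the client_email field of this file, and the json module we imported earlier can read it straight out. A small sketch, assuming the key file is saved as creds.json:

# Pull the service-account address out of the key file
with open('creds.json') as f:
    service_account_email = json.load(f)['client_email']

print(service_account_email)  # share your Google Sheet with this address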

Data Transformation Steps#

  1. Fetching Data: We retrieve data from the worksheet titled “questions” using our function.

  2. Cleaning and Formatting: The DataFrame’s first row becomes our column headers, which we then drop, ensuring a clean data structure.

  3. Timestamping: A new ‘question_created’ column gets added, marking each entry with the current date and time.

  4. Database Initialization: We prepare to inject our data into an SQLite database named ‘quiz.sqlite3’, ensuring our insights find a structured home.

With our toolkit and methods detailed, let’s dive deep, transforming Google Sheets’ insights into structured SQLite datasets. Let the data flow commence!

questions = get_user_information("questions")
questions = questions.rename(columns=questions.iloc[0]).drop(questions.index[0])
questions['question_created'] = datetime.now()
Note: if creds.json is not in the working directory, the very first line above fails with FileNotFoundError: [Errno 2] No such file or directory: 'creds.json'. Keep the key file alongside your script or notebook before running this cell.
questions

   id                                      question_text question_type           question_created
1   1  Which of the following features distinguishes ...     lakehouse 2023-08-26 09:32:30.461600
2   2  Which of the following locations hosts the dri...     lakehouse 2023-08-26 09:32:30.461600
3   3  A data architect is designing a data model tha...     lakehouse 2023-08-26 09:32:30.461600
4   4  Which of the following describes a scenario in...     lakehouse 2023-08-26 09:32:30.461600
5   5  A data engineer has created a Delta table as p...     lakehouse 2023-08-26 09:32:30.461600
6   6  Two junior data engineers are authoring separa...     lakehouse 2023-08-26 09:32:30.461600
db_path = 'quiz.sqlite3'
conn = sqlite3.connect(db_path)
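The snippet above only opens the connection. Here is a minimal sketch of the final load step, assuming we persist the DataFrame into a table called questions; the table name and the if_exists policy are my assumptions, not from the original post:

# Write the DataFrame into SQLite ('questions' is an assumed table name;
# if_exists='replace' overwrites any earlier load)
questions.to_sql('questions', conn, if_exists='replace', index=False)

# Read a few rows back to verify the round trip
print(pd.read_sql_query("SELECT id, question_type FROM questions LIMIT 3", conn))

conn.close()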

Conclusion#

The digital landscape offers a myriad of tools to harness and analyze data. In our journey, we’ve explored how the synergy between Google Sheets, Pandas, and SQLite provides a formidable arsenal for data enthusiasts and professionals alike. By crafting a seamless pipeline, we’ve unlocked the potential to transform simple spreadsheet records into structured database entries, ensuring scalability, ease of access, and advanced analytics. Whether you’re just starting out or are a seasoned data veteran, integrating these tools into your workflow can usher in a new era of efficiency and insights. As we always say in the data world, the right tools and the right processes make all the difference. Happy data wrangling!