Understanding File and Directory Paths in Python

In my journey of developing a Flask app, I had a script named load_sheet_db.py that was responsible for loading data from Google Sheets. As I dove deeper into the code, I encountered the __file__ variable. Despite often being described as a built-in, __file__ is a module-level variable that Python sets when it loads a script or module from a file. It holds the path of that file, which, in my case, was load_sheet_db.py. Depending on the Python version and on how the script was invoked, this path can be absolute or relative.

The Power of __file__

Everything in this post builds on __file__. For the script you launch directly, Python 3.9 and later store an absolute path in __file__, while older versions store the path exactly as it was typed on the command line, which may be relative. In both cases the value can still pass through symbolic links, so it is not guaranteed to be the file's real location.

Example: If you’ve ever executed a script with:

python some_folder/my_script.py

__file__ would contain 'some_folder/my_script.py' on Python 3.8 and earlier. On Python 3.9 and later it contains the absolute path to that same file.
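To check what your own interpreter does, one option is a small experiment like the sketch below. It writes a throwaway one-line script into a temporary folder, launches it through a relative path, and captures what that script saw as __file__. The helper name run_and_report_file and the temporary folder layout are made up for this illustration.

```python
import os
import subprocess
import sys
import tempfile


def run_and_report_file(script_name="my_script.py"):
    """Run a one-line script via a relative path and return its __file__ value."""
    with tempfile.TemporaryDirectory() as workdir:
        folder = os.path.join(workdir, "some_folder")
        os.makedirs(folder)
        script = os.path.join(folder, script_name)
        with open(script, "w") as fh:
            fh.write("print(__file__)\n")
        # Launch with a relative path, using workdir as the current directory
        result = subprocess.run(
            [sys.executable, os.path.join("some_folder", script_name)],
            cwd=workdir,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


reported = run_and_report_file()
print(reported)
print(os.path.isabs(reported))
```

On Python 3.9 or later the second line printed should be True. On older versions it should be False, because the relative path is stored as typed.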

user
│
├── main_directory
│   ├── flaskr
│   │   └── load_sheet_db.py
│   │
│   └── instance
│       └── flaskr.sqlite


Resolving the Real Path: os.path.realpath()

Symbolic links can sometimes mask the true location of your script. To avoid any ambiguities and get the actual path to your script, we use:

os.path.realpath(__file__)

This function returns the canonical path of the specified filename, eliminating any symbolic links encountered.
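A quick way to see the difference is to compare os.path.abspath() with os.path.realpath() on a symbolic link. The sketch below assumes a platform where the current user may create symlinks (on Windows this can require extra privileges). The file names real_script.py and link_to_script.py are hypothetical.

```python
import os
import tempfile


def compare_abspath_and_realpath():
    """Create a file plus a symlink to it, then resolve the link both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        # Resolve the temp folder first, since it may itself sit behind a symlink
        tmp = os.path.realpath(tmp)
        target = os.path.join(tmp, "real_script.py")
        link = os.path.join(tmp, "link_to_script.py")
        with open(target, "w") as fh:
            fh.write("# placeholder\n")
        os.symlink(target, link)  # may need extra privileges on Windows
        return os.path.abspath(link), os.path.realpath(link), target


via_abspath, via_realpath, target = compare_abspath_and_realpath()
print(os.path.basename(via_abspath))   # abspath keeps the link's own name
print(os.path.basename(via_realpath))  # realpath follows the link to the target
```

Only realpath() reports the file the link points to, which is why it is the safer starting point when a script may be launched through a link.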

Finding the Parent Directory: os.path.dirname()

Once you know the full path to your script, the next step is often to determine its parent directory. The function for this is:

import os

os.path.dirname(os.path.realpath(__file__))

This returns the directory name of the provided path, allowing you to work relative to the script’s location.
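Note that os.path.dirname() works purely on the text of the path and never touches the filesystem. The sketch below uses a hypothetical relative path modelled on the directory tree shown earlier. It also shows why resolving the path first matters: the directory part of a bare filename is an empty string.

```python
import os

# Hypothetical location of the script, following the directory tree shown earlier
script_path = os.path.join("user", "main_directory", "flaskr", "load_sheet_db.py")

script_dir = os.path.dirname(script_path)   # path ending in 'flaskr'
project_dir = os.path.dirname(script_dir)   # path ending in 'main_directory'
bare_name_dir = os.path.dirname("load_sheet_db.py")  # no directory part at all

print(script_dir)
print(project_dir)
print(repr(bare_name_dir))
```

Calling dirname() twice walks up two levels, which is an alternative to joining with a '..' component.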

Building Paths Intelligently: os.path.join()

String concatenation can be tricky and error-prone when building paths, especially if you aim for cross-platform compatibility. Python offers:

os.path.join()

# A bare relative name like this is resolved against the current working directory:
db_path = 'quiz.sqlite3'
conn = sqlite3.connect(db_path)

This function constructs paths by merging multiple components, ensuring they fit the OS’s path structure. Particularly useful is the ‘..’ component, which indicates moving one directory up.
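As a small illustration of the difference, the sketch below builds the same path by plain concatenation and with os.path.join(). The folder and file names are reused from this post only as examples, and nothing here touches the filesystem.

```python
import os

base = os.path.join("main_directory", "flaskr")  # hypothetical folder names

# Manual concatenation: the separator is silently missing
concatenated = base + "creds.json"

# os.path.join inserts exactly one separator between components
joined = os.path.join(base, "creds.json")

# A '..' component is kept literally in the string; the operating system
# interprets it as "parent directory" when the path is actually used
up_one = os.path.join(base, "..", "instance")

print(concatenated)
print(joined)
print(up_one)
```

Because join() uses the separator of the platform it runs on, the same code produces backslashes on Windows and forward slashes elsewhere.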

dir_path = os.path.dirname(os.path.realpath(__file__))
db_path = os.path.join(dir_path, '..', 'instance', 'flaskr.sqlite')

Here is the full script that uses this pattern:

import os
import sqlite3
from datetime import datetime

import gspread
import pandas as pd
from oauth2client.service_account import ServiceAccountCredentials


def get_user_information(sheet_name):
    """Read one worksheet of the "quizs" spreadsheet into a DataFrame."""
    scope = ['https://spreadsheets.google.com/feeds',
             'https://www.googleapis.com/auth/drive']
    # Look for creds.json next to this script, not in the current working directory
    dir_path = os.path.dirname(os.path.realpath(__file__))
    creds_path = os.path.join(dir_path, 'creds.json')
    creds = ServiceAccountCredentials.from_json_keyfile_name(creds_path, scope)
    client = gspread.authorize(creds)
    industries = client.open("quizs").worksheet(sheet_name)
    users = pd.DataFrame(industries.get_all_values())
    return users

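def make_float(x):
    """Convert a spreadsheet cell to float; empty cells become 0.0."""
    if x is None or x == "":
        return 0.0
    if isinstance(x, (int, float)):
        return float(x)
    # Strings such as "1,234.5": drop thousands separators before converting
    return float(x.replace(",", ""))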


def clean_data(df, exclude_col):
    cols = df.columns
    for col in cols:
        if col != exclude_col:
            df[col] = df[col].map(make_float)
    return df

def get_data_from_sheets(name_of_sheet):
    df = get_user_information(name_of_sheet)
    # Promote the first row to column headers, then drop it from the data
    df = df.rename(columns=df.iloc[0]).drop(df.index[0])
    return df

def turn_data_sqlite(sheet_name):
    # ---- Access Google Sheets data and turn into pandas DataFrame ----
    print("Function called")  # Check if the function is entered
    data = get_data_from_sheets(sheet_name)
    cleaned_df = data
    cleaned_df['created'] = datetime.now()  # requires: from datetime import datetime
    # Build the database path relative to this script, not the working directory
    dir_path = os.path.dirname(os.path.realpath(__file__))
    db_path = os.path.join(dir_path, '..', 'instance', 'flaskr.sqlite')
    print(f"Database path: {db_path}")
    conn = sqlite3.connect(db_path)
    try:
        cleaned_df.to_sql(sheet_name, conn, if_exists='append', index=False)
        print(f"{cleaned_df.shape[0]} rows {cleaned_df.shape[1]} columns are written to database")
    finally:
        conn.close()  # Close the connection even if the write fails
    return "It is done!"

In the context of our tutorial, the '..' in db_path moves up from the script’s directory (flaskr) to main_directory, and the path then descends into the sibling directory ‘instance’, where flaskr.sqlite lives.
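To see that resolution spelled out, os.path.normpath() can collapse the '..' component. The sketch below assumes a hypothetical absolute location for the flaskr folder, since the real one depends on where the project is checked out.

```python
import os

# Hypothetical absolute location of the folder that holds load_sheet_db.py
dir_path = os.path.join(os.sep, "home", "user", "main_directory", "flaskr")

db_path = os.path.join(dir_path, "..", "instance", "flaskr.sqlite")
normalized = os.path.normpath(db_path)  # collapses 'flaskr/..' purely textually

print(db_path)
print(normalized)
```

Both forms open the same file, so normalizing is optional; it mainly makes the printed database path easier to read. Because normpath() only rewrites the string, it is safest on a path that has already been through realpath(), as dir_path has in the script.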

Wrapping Up

Navigating file and directory paths in Python might seem daunting at first, but with the right tools, it becomes a breeze. By understanding and combining __file__, os.path.realpath(), os.path.dirname(), and os.path.join(), you can ensure that your scripts find their files no matter which directory they’re run from or which operating system they run on.