Understanding File and Directory Paths in Python

While developing a Flask app, I had a script named load_sheet_db.py that was responsible for loading data from Google Sheets. As I dug deeper into the code, I ran into the __file__ variable. __file__ is a module-level variable that Python sets automatically. It holds the path of the file the code was loaded from, which, in my case, was load_sheet_db.py. This path can be absolute or relative, depending on the Python version and on how the script was invoked.

The Power of `__file__`#

At the heart of this process is `__file__`. When you run a script directly, `__file__` holds the path of that script. Before Python 3.9 the value mirrored whatever path you typed on the command line, so it could be relative; since Python 3.9 it is an absolute path for the script you launch.

Example: If you’ve ever executed a script with:#

python some_folder/my_script.py

__file__ would contain 'some_folder/my_script.py' on Python versions before 3.9; since Python 3.9 it holds the absolute path of the script instead.
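To see this for yourself, here is a small sketch that writes a throwaway script and runs it by relative path. The folder and file names simply mirror the example above, and the helper name run_and_report_file is my own invention, so treat this as an illustration rather than part of the app.

```python
import os
import subprocess
import sys
import tempfile


def run_and_report_file(base_dir):
    """Create some_folder/my_script.py under base_dir, run it, return what it printed."""
    folder = os.path.join(base_dir, "some_folder")
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "my_script.py"), "w") as fh:
        fh.write("print(__file__)\n")
    # Invoke the script by a relative path, with base_dir as the working directory.
    result = subprocess.run(
        [sys.executable, os.path.join("some_folder", "my_script.py")],
        cwd=base_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    reported = run_and_report_file(tmp)

# Relative on older interpreters, absolute on Python 3.9 and later.
print(reported)
print(os.path.isabs(reported))
```

Either way the value ends in some_folder/my_script.py, which is why the rest of this post normalizes it before building other paths from it.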

user
│
├── main_directory
│   ├── flaskr
│   │   └── load_sheet_db.py
│   │
│   └── instance
│       └── flaskr.sqlite

Resolving the Real Path: `os.path.realpath()`#

Symbolic links can sometimes mask the true location of your script. To avoid any ambiguities and get the actual path to your script, we use:

os.path.realpath(__file__)

This function returns the canonical path of the specified filename, resolving any symbolic links encountered along the way.
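Here is a minimal sketch of what that means in practice. It creates a placeholder file and a symbolic link to it in a temporary directory, then resolves the link. The file name shortcut.py and the helper resolve_symlink_demo are hypothetical, and the helper returns None on systems where creating symlinks is not permitted (for example, some Windows setups).

```python
import os
import tempfile


def resolve_symlink_demo():
    """Return (resolved_path, real_target), or None if symlinks are unavailable."""
    with tempfile.TemporaryDirectory() as tmp:
        # The temp directory itself may sit behind a symlink, so resolve it first.
        tmp = os.path.realpath(tmp)
        target = os.path.join(tmp, "load_sheet_db.py")
        with open(target, "w") as fh:
            fh.write("# placeholder script\n")
        link = os.path.join(tmp, "shortcut.py")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            return None
        # realpath follows the link back to the file it points at.
        return os.path.realpath(link), target


result = resolve_symlink_demo()
if result is not None:
    resolved, target = result
    print(resolved == target)
```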

Finding the Parent Directory: `os.path.dirname()`#

Once you know the full path to your script, the next step is often to determine its parent directory. The function for this is:

os.path.dirname(os.path.realpath(__file__))

This returns the directory name of the provided path, allowing you to work relative to the script’s location.
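As a quick illustration, the sketch below applies os.path.dirname() to a path shaped like the project layout shown earlier. The path is built by hand here instead of coming from __file__, so the snippet runs anywhere.

```python
import os

# A path shaped like the layout in this post (not read from disk).
script_path = os.path.join("user", "main_directory", "flaskr", "load_sheet_db.py")

script_dir = os.path.dirname(script_path)   # directory that contains the script
project_dir = os.path.dirname(script_dir)   # one more level up

print(os.path.basename(script_dir))   # flaskr
print(os.path.basename(project_dir))  # main_directory
```

Calling dirname() twice is one way to climb two levels; the next section shows the '..' approach the script actually uses.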

Building Paths Intelligently: `os.path.join()`#

String concatenation can be tricky and error-prone when building paths, especially if you aim for cross-platform compatibility. Python offers:

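os.path.join()

This function builds a path from multiple components, joining them with the separator the operating system expects. Particularly useful is the ‘..’ component, which refers to the parent directory. Compare that with a bare relative path, which is resolved against the current working directory rather than the script’s location:

db_path = 'quiz.sqlite3'
conn = sqlite3.connect(db_path)

Anchoring the path to the script’s directory removes that dependence: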

dir_path = os.path.dirname(os.path.realpath(__file__))
db_path = os.path.join(dir_path, '..', 'instance', 'flaskr.sqlite')

import os
import sqlite3
from datetime import datetime

import gspread
import pandas as pd
from oauth2client.service_account import ServiceAccountCredentials


def get_user_information(sheet_name):
    """Fetch one worksheet of the "quizs" spreadsheet as a DataFrame."""
    scope = ['https://spreadsheets.google.com/feeds',
             'https://www.googleapis.com/auth/drive']
    # Look for creds.json next to this script, wherever it is run from.
    dir_path = os.path.dirname(os.path.realpath(__file__))
    creds_path = os.path.join(dir_path, 'creds.json')
    creds = ServiceAccountCredentials.from_json_keyfile_name(creds_path, scope)
    client = gspread.authorize(creds)
    worksheet = client.open("quizs").worksheet(sheet_name)
    return pd.DataFrame(worksheet.get_all_values())

def make_float(x):
    """Convert a cell value to float; empty cells become 0.0."""
    if x is None or x == "":
        return 0.0
    if isinstance(x, (int, float)):
        return float(x)
    # Strings such as "1,234.5" carry thousands separators.
    return float(x.replace(",", ""))


def clean_data(df, exclude_col):
    # Convert every column except exclude_col to floats.
    for col in df.columns:
        if col != exclude_col:
            df[col] = df[col].map(make_float)
    return df

def get_data_from_sheets(name_of_sheet):
    df = get_user_information(name_of_sheet)
    # Promote the first row to column headers, then drop it from the data.
    df = df.rename(columns=df.iloc[0]).drop(df.index[0])
    return df

def turn_data_sqlite(sheet_name):
    # ---- Access Google Sheets data and turn it into a pandas DataFrame ----
    print("Function called")  # Confirm the function was entered
    cleaned_df = get_data_from_sheets(sheet_name)
    cleaned_df['created'] = datetime.now()
    # Build the database path relative to this script, not the working directory.
    dir_path = os.path.dirname(os.path.realpath(__file__))
    db_path = os.path.join(dir_path, '..', 'instance', 'flaskr.sqlite')
    print(f"Database path: {db_path}")
    conn = sqlite3.connect(db_path)
    try:
        cleaned_df.to_sql(sheet_name, conn, if_exists='append', index=False)
    finally:
        conn.close()  # Release the connection even if the write fails
    print(f"{cleaned_df.shape[0]} rows, {cleaned_df.shape[1]} columns written to database")
    return "It is done!"

In the context of our tutorial, the ‘..’ component moves up from the script’s directory (flaskr) to main_directory, and the path then descends into the sibling directory named ‘instance’, where flaskr.sqlite lives.
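One detail worth knowing: os.path.join() keeps the ‘..’ component in the resulting string as-is. If you want a tidier path for logging, os.path.normpath() collapses it. The sketch below uses the directory names from this post, built by hand so it runs without the real project on disk.

```python
import os

# Stand-in for os.path.dirname(os.path.realpath(__file__)) in load_sheet_db.py.
dir_path = os.path.join("main_directory", "flaskr")

db_path = os.path.join(dir_path, "..", "instance", "flaskr.sqlite")
normalized = os.path.normpath(db_path)

print(db_path)     # still contains the '..' component
print(normalized)  # '..' collapsed away
```

Both strings point at the same file here. normpath() works purely on the text of the path, so it can change the meaning of a path that passes through a symbolic link; resolving with os.path.realpath() first, as the script does, sidesteps that.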

Wrapping Up#

Navigating file and directory paths in Python might seem daunting at first, but with the right tools it becomes manageable. By combining __file__, os.path.realpath(), os.path.dirname(), and os.path.join(), you can keep your scripts pointing at the right files no matter which directory they’re run from or which operating system they run on.
