Understanding File and Directory Paths in Python#
In my journey of developing a Flask app, I had a script named load_sheet_db.py that was responsible for loading data from Google Sheets. As I dove deeper into the code, I encountered the file variable. In Python scripts, __file__
is a magical built-in global variable. It holds the path of the script that’s currently being executed, which, in my case, was load_sheet_db.py. This path can either be absolute or relative, and it hinges on the way the script was invoked.
The Power of __file__
#
At the heart of this process is the built-in global variable __file__
. In Python scripts, `file`` represents the path of the script currently being executed. Depending on how you invoked the script, this could be an absolute path or a relative one.
Example: If you’ve ever executed a script with:#
python some_folder/my_script.py
__file__
would contain 'some_folder/my_script.py'
.
user
│
├── main_directory
│ ├── flaskr
│ │ └── load_sheet_db.py
│ │
│ └── instance
│ └── flaskr.sqlite
Resolving the Real Path: os.path.realpath()
#
Symbolic links can sometimes mask the true location of your script. To avoid any ambiguities and get the actual path to your script, we use:
os.path.realpath(__file__)
This function returns the canonical path of the specified filename, eliminating any symbolic links encountered.
Finding the Parent Directory: os.path.dirname()
#
Once you know the full path to your script, the next step is often to determine its parent directory. The function for this is:
os.path.dirname(os.path.realpath(__file__))
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[1], line 1
----> 1 os.path.dirname(os.path.realpath(__file__))
NameError: name 'os' is not defined
This returns the directory name of the provided path, allowing you to work relative to the script’s location.
Building Paths Intelligently: os.path.join()#
String concatenation can be tricky and error-prone when building paths, especially if you aim for cross-platform compatibility. Python offers:
os.path.join()
db_path = 'quiz.sqlite3'
conn = sqlite3.connect(db_path)
This function constructs paths by merging multiple components, ensuring they fit the OS’s path structure. Particularly useful is the ‘..’ component, which indicates moving one directory up.
dir_path = os.path.dirname(os.path.realpath(__file__))
db_path = os.path.join(dir_path, '..', 'instance', 'flaskr.sqlite')
import pandas as pd
from oauth2client.service_account import ServiceAccountCredentials
import gspread
import sqlite3
import sqlite3
import os
def get_user_information(sheet_name):
scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']
dir_path = os.path.dirname(os.path.realpath(__file__))
creds_path = os.path.join(dir_path, 'creds.json')
creds = ServiceAccountCredentials.from_json_keyfile_name(creds_path, scope)
client = gspread.authorize(creds)
industries = client.open("quizs").worksheet(sheet_name)
users = pd.DataFrame(industries.get_all_values())
return users
def make_float(x):
if x is None or x == "":
return 0.0
elif type(x) == float:
return x
elif type(x) == int:
return float(x)
else:
x = x.replace(",", "")
return float(x)
def clean_data(df, exclude_col):
cols = df.columns
for col in cols:
if col != exclude_col:
df[col] = df[col].map(lambda x: make_float(x))
return df
def get_data_from_sheets(name_of_sheet):
df = get_user_information(name_of_sheet) # assuming get_user_information is defined somewhere else
df = df.rename(columns=df.iloc[0]).drop(df.index[0])
return df
def turn_data_sqlite(sheet_name):
# ---- Access Google Sheets data and turn into pandas Dataframe ----
print("Function called") # Check if the function is entered
data = get_data_from_sheets(sheet_name)
cleaned_df = data
cleaned_df['created'] = datetime.now()
dir_path = os.path.dirname(os.path.realpath(__file__))
db_path = os.path.join(dir_path, '..', 'instance', 'flaskr.sqlite')
print(f"Database path: {db_path}") # Add this line
conn = sqlite3.connect(db_path)
print(F"{cleaned_df.shape[0]} rows {cleaned_df.shape[1]} columns are written to database")
cleaned_df.to_sql(sheet_name, conn, if_exists='append', index=False)
conn.close()
return "It is done!"
In the context of our tutorial, this effectively moves up from the script’s directory and points to a sibling directory named ‘instance’.
Wrapping Up#
Navigating file and directory paths in Python might seem daunting at first, but with the right tools, it becomes a breeze. By understanding and combining file, os.path.realpath(), os.path.dirname(), and os.path.join(), you can ensure that your scripts remain robust, no matter where they’re run from or on which system.