SQLite is a lightweight, serverless, self-contained SQL database engine. It is widely used in applications ranging from small mobile apps to large-scale web applications due to its simplicity and ease of integration.
While SQLite can be used directly through SQL commands, we will use Python’s built-in sqlite3 module to interact with SQLite databases. This allows us to leverage the power of SQL while working within a Python environment.
▶️ Import pandas, numpy, and sqlite3.
import pandas as pd
import numpy as np
import sqlite3💻 Connecting to a Database¶
Because SQLite is self-contained, connecting to a database is as simple as specifying the database file. If the file does not exist, SQLite will create it.
▶️ Run the code below to create a new SQLite database file.
# Connect to my_database.db database file
# If the file does not exist, create a new file
conn = sqlite3.connect("my_database.db")▶️ Run the code below to create a new employees table.
# Create a cursor
# A cursor enables users of a DBMS to traverse through the result set
# without having to retrieve all results at once
c = conn.cursor()
# Drop (delete) employees table if it already exists
c.execute("DROP TABLE IF EXISTS employees")
conn.commit()
# Define a query
# Triple quotes (""") denote multiline strings
create_table_query = """
CREATE TABLE IF NOT EXISTS employees (
emp_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
department TEXT,
salary REAL
);
"""
c.execute(create_table_query)
conn.commit()We can use the conn object to execute SQL commands and manage the database. The cursor object allows us to execute SQL queries and fetch results.
▶️ Run the code below to insert sample data into the employees table.
c.executescript(
"""
INSERT INTO employees (name, department, salary) VALUES ('Mia', 'HR', 140000.0);
INSERT INTO employees (name, department, salary) VALUES ('Michael', 'Marketing', 200000.0);
INSERT INTO employees (name, department, salary) VALUES ('Sean', 'HR', 250000.0);
INSERT INTO employees (name, department, salary) VALUES ('Dylan', 'Marketing', 205000.0);
"""
)
conn.commit()| Line | Function | Explanation |
|---|---|---|
executescript() | Run multiple SQL commands | Inserts multiple employee records |
commit() | Save changes | Writes all changes permanently |
🔍 Retrieving Data¶
We can use pandas to run SQL queries and load the results into a DataFrame for easier analysis. This is easier than using the cursor object directly for data retrieval.
▶️ Query all records from the employees table and load them into a DataFrame named df.
Note that each row (record) corresponds to an employee, and each column corresponds to an attribute of the employee (e.g., name, department, salary).
df = pd.read_sql_query("SELECT * FROM employees;", con=conn)
display(df)▶️ Query the employees table for Marketing department employees sorted by salary in descending order.
df = pd.read_sql_query(
"""
SELECT emp_id, name, salary
FROM employees
WHERE department = "Marketing"
ORDER BY salary DESC;
""",
con=conn,
)
display(df)📚 Delete and Update Records¶
SQLite supports the standard CRUD (Create, Read, Update, Delete) operations. We have already seen how to create tables and read data. Let’s look at how to update and delete records.
➖ Delete Records¶
The DELETE statement is used to remove records from a table. You can specify which records to delete using a WHERE clause.
▶️ Delete employees from the "HR" department.
c.execute("DELETE FROM employees WHERE department = 'HR';")
conn.commit()You can verify the deletion by querying the employees table again. You should see that the employees from the HR department have been removed.
▶️ Print all remaining employees to verify the deletion.
df = pd.read_sql_query("SELECT * FROM employees;", con=conn)
display(df)➕ Update Records¶
The UPDATE statement is used to modify existing records in a table. You can specify which records to update using a WHERE clause.
Assume Michael has moved to the “Purchasing” department, and we need to update his record accordingly.
▶️ Update Michael’s department to "Purchasing".
c.execute("UPDATE employees SET department = 'Purchasing' WHERE name = 'Michael';")
conn.commit()▶️ Print all employees to verify the update.
df = pd.read_sql_query("SELECT * FROM employees;", con=conn)
display(df)🆚 Comparing Pandas and SQL using Airbnb Listings¶
In this section, we will compare how to perform similar tasks using both pandas and SQL. We will use a dataset of Airbnb listings for this comparison.
▶️ Create a new SQLite database file named airbnb.db and create a table named listings.
# Connect to airbnb.db database file
# If the file does not exist, create a new file
conn = sqlite3.connect("airbnb.db")
# Create a cursor
# A cursor enables users of a DBMS to traverse through the result set
# without having to retrieve all results at once
c = conn.cursor()
# Drop (delete) listings table if it already exists
c.execute("DROP TABLE IF EXISTS listings")
conn.commit()
# Define a query
# Triple quotes (''') denote multiline strings
create_table_query = """
CREATE TABLE IF NOT EXISTS listings (
neighbourhood TEXT,
room_type TEXT,
bathrooms REAL,
bedrooms REAL,
price REAL,
number_of_reviews INTEGER,
review_score REAL,
is_superhost INT
)
"""
c.execute(create_table_query)
conn.commit()▶️ Read a CSV file containing Airbnb listings into a pandas DataFrame.
df_listings = pd.read_csv(
"https://github.com/bdi475/datasets/raw/main/chicago-airbnb-listings-small.csv"
)
display(df_listings.head(5))▶️ Populate the listings table with data from the CSV file.
tables = list(
pd.read_sql_query("SELECT * FROM sqlite_master WHERE type='table';", con=conn)[
"tbl_name"
]
)
if "listings" in tables:
c.execute(f"DELETE FROM listings")
conn.commit()
df_listings.to_sql(name="listings", index=False, con=conn, if_exists="append")826📌 Example 1: Select all rows and columns¶
▶️ Use pandas.
display(df_listings)▶️ Use SQLite.
query_select_all = """
SELECT *
FROM listings
"""
df_result = pd.read_sql_query(query_select_all, con=conn)
display(df_result)📌 Example 2: Select only three columns¶
Select only the "bathrooms", "bedrooms", and "price" columns.
▶️ Use pandas.
df_listings[["bathrooms", "bedrooms", "price"]]▶️ Use SQLite.
query_select_three_columns = """
SELECT bathrooms, bedrooms, price
FROM listings
"""
df_result = pd.read_sql_query(query_select_three_columns, con=conn)
display(df_result)📌 Example 3: Select all columns of expensive listings¶
Filter listings where the price is greater than $400.
▶️ Use pandas.
df_expensive = df_listings[df_listings["price"] > 400]
display(df_expensive)▶️ Use SQLite.
query_expensive_listings = """
SELECT *
FROM listings
WHERE price > 400
"""
df_result = pd.read_sql_query(query_expensive_listings, con=conn)
display(df_result)📌 Example 4: Select all columns of West Town listings¶
Filter listings located in the "West Town" neighborhood.
▶️ Use pandas.
df_west_town = df_listings[df_listings["neighbourhood"] == "West Town"]
display(df_west_town)▶️ Use SQLite.
query_west_town_listings = """
SELECT *
FROM listings
WHERE neighbourhood = 'West Town'
"""
df_result = pd.read_sql_query(query_west_town_listings, con=conn)
display(df_result)📌 Example 5: Sort all listings by price¶
Sort all listings by "price" in ascending order. Both pandas and SQL sort in ascending order by default.
▶️ Use pandas.
df_sorted_by_price = df_listings.sort_values(by="price")
display(df_sorted_by_price)▶️ Use SQLite.
query_sort_by_price = """
SELECT *
FROM listings
ORDER BY price
"""
df_result = pd.read_sql_query(query_sort_by_price, con=conn)
display(df_result)📌 Example 6: Select only the price column and sort in descending order¶
Select only the "price" column and sort the results in descending order.
▶️ Use pandas.
df_price = df_listings[["price"]].sort_values(by="price", ascending=False)
display(df_price)▶️ Use SQLite.
query_sort_by_price_desc = """
SELECT price
FROM listings
ORDER BY price DESC
"""
df_result = pd.read_sql_query(query_sort_by_price_desc, con=conn)
display(df_result)📌 Example 7: A complicated query¶
Find all listings with a price less than $100 and more than 500 reviews, sorted by number of reviews in descending order. Additionally, only select the "room_type", "bathrooms", "bedrooms", "price", "number_of_reviews" columns.
▶️ Use pandas.
df_complicated = df_listings[
(df_listings["price"] < 100) & (df_listings["number_of_reviews"] > 500)
].sort_values(by="number_of_reviews", ascending=False)
df_complicated = df_complicated[
["room_type", "bathrooms", "bedrooms", "price", "number_of_reviews"]
]
display(df_complicated)▶️ Use SQLite.
query_final = """
SELECT room_type, bathrooms, bedrooms, price, number_of_reviews
FROM listings
WHERE (price < 100) AND (number_of_reviews > 500)
ORDER BY number_of_reviews DESC
"""
df_result = pd.read_sql_query(query_final, con=conn)
display(df_result)