  • 🏆 20 points available

▶️ First, run the code cell below to import unittest and base64, modules used by the 🧭 Check Your Work sections and the autograder.

# DO NOT MODIFY THE CODE IN THIS CELL
import unittest
import base64

tc = unittest.TestCase()

🎯 Import packages

👇 Tasks

  • ✔️ Import the following Python packages (a reference sketch follows this list).

    1. pandas: Use alias pd.

    2. numpy: Use alias np.

    3. sqlite3: No alias
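For reference, this cell typically ends up looking like the minimal sketch below, using the conventional community aliases:

# Minimal sketch of the expected imports.
import pandas as pd
import numpy as np
import sqlite3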

# YOUR CODE BEGINS

# YOUR CODE ENDS

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

# DO NOT CHANGE THE CODE IN THIS CELL
tc.assertTrue(
    "pd" in globals(), "Check whether you have correctly imported Pandas with an alias."
)
tc.assertTrue(
    "np" in globals(), "Check whether you have correctly imported NumPy with an alias."
)
tc.assertTrue(
    "sqlite3" in globals(),
    "Check whether you have correctly imported the sqlite3 package.",
)

📌 Populate a database table from a CSV file

▶️ Run the code below to populate the loans table.

conn = sqlite3.connect("lending-club-loan-results.db")
c = conn.cursor()

tables = list(
    pd.read_sql_query("SELECT * FROM sqlite_master WHERE type='table';", con=conn)[
        "tbl_name"
    ]
)

if "loans" in tables:
    c.execute(f"DELETE FROM loans")
    conn.commit()

pd.read_csv(
    "https://github.com/bdi475/datasets/raw/main/lending-club-loan-results.csv.gz",
    compression="gzip",
).to_sql(name="loans", index=False, con=conn, if_exists="append")

conn.close()
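If you want to confirm that the load worked before moving on, one optional sanity check (a sketch, not required by the assignment) is to count the rows now stored in the loans table:

# Optional sanity check: count the rows that landed in the loans table.
conn = sqlite3.connect("lending-club-loan-results.db")
print(pd.read_sql_query("SELECT COUNT(*) AS num_rows FROM loans;", con=conn))
conn.close()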

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

# DO NOT CHANGE THE CODE IN THIS CELL
conn_checker = sqlite3.connect("lending-club-loan-results.db")
table_to_check = "loans"

# Check if table exists
user_tables = list(
    pd.read_sql_query(
        'SELECT * FROM sqlite_master WHERE type="table";', con=conn_checker
    )["tbl_name"]
)
tc.assertTrue(
    table_to_check in user_tables,
    f"{table_to_check} does not exist in your NWT.db file!",
)

conn_checker.close()

The table below describes the columns in the loans table.

| Field | Description |
| --- | --- |
| loan_amnt | listed amount of the loan applied for by the borrower |
| int_rate | interest rate on the loan (between 0 and 1) |
| term_in_months | number of payments on the loan - either 36 or 60 months |
| grade | assigned loan grade (A, B, C, D, E, F, G) |
| job_title | job title supplied by the borrower when applying for the loan |
| home_ownership | home ownership status provided by the borrower during registration (RENT, OWN, MORTGAGE) |
| annual_inc | self-reported annual income provided by the borrower during registration |
| loan_status | result of the loan (Fully Paid or Charged Off) |
| did_default | 0 == paid in full, 1 == default (charged off) |
| purpose | a category provided by the borrower for the loan request |
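If you want to confirm these columns and their declared storage types directly from SQLite, one way (a sketch, not part of the assignment) is to query the table metadata with PRAGMA table_info:

# List each column's name and declared type for the loans table.
conn = sqlite3.connect("lending-club-loan-results.db")
display(pd.read_sql_query("PRAGMA table_info(loans);", con=conn))
conn.close()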

▶️ Run the code below to display the first 10 rows of the loans table.

conn = sqlite3.connect("lending-club-loan-results.db")
display(pd.read_sql_query("SELECT * FROM loans LIMIT 10", con=conn))
conn.close()

🎯 Challenge 1: Find high profile defaults

👇 Tasks

  • ✔️ Write a query that:

    • selects all columns,

    • from the loans table,

    • where the job_title is "Attorney", annual_inc is greater than 300000, and did_default is 1

  • ✔️ Store your query in a new variable named query_high_profile_defaults.

  • ✔️ The order of rows does not matter. (A generic query-pattern sketch follows this list.)
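For reference, a filtered SELECT stored in a Python variable generally follows the pattern sketched below. The table and column names here are hypothetical placeholders, not the graded answer:

# Hypothetical sketch: keep every column, but only the rows that satisfy
# all three conditions combined with AND.
query_example = """
SELECT *
FROM employees
WHERE (department = 'Legal') AND (salary > 100000) AND (left_company = 1);
"""

You would then pass your own query variable to pd.read_sql_query(), exactly as the provided cell further below does with query_high_profile_defaults.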

🔑 Expected output

|   | loan_amnt | int_rate | term_in_months | grade | job_title | home_ownership | annual_inc | loan_status | purpose | did_default |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 15000 | 0.0916 | 36 | B | Attorney | MORTGAGE | 309000 | Charged Off | debt_consolidation | 1 |
| 1 | 35000 | 0.1344 | 36 | C | Attorney | RENT | 340000 | Charged Off | credit_card | 1 |
| 2 | 35000 | 0.1541 | 60 | D | Attorney | MORTGAGE | 350000 | Charged Off | credit_card | 1 |
| 3 | 35000 | 0.1709 | 36 | D | Attorney | RENT | 450000 | Charged Off | debt_consolidation | 1 |
# YOUR CODE BEGINS

# YOUR CODE ENDS

conn = sqlite3.connect("lending-club-loan-results.db")
df_result = pd.read_sql_query(query_high_profile_defaults, con=conn)
display(df_result)
conn.close()

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-01"
_points = 6

conn = sqlite3.connect("lending-club-loan-results.db")
decoded_query = base64.b64decode(
    b"ClNFTEVDVCAqCkZST00gbG9hbnMKV0hFUkUgKGpvYl90aXRsZSA9ICJB\
dHRvcm5leSIpIEFORCAoYW5udWFsX2luYyA+IDMwMDAwMCkgQU5EIChkaWRfZGVmYXVsdCA9IDEpOwo="
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
pd.testing.assert_frame_equal(
    df_result.sort_values(df_result.columns.tolist()).reset_index(drop=True),
    df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True),
)
conn.close()

🎯 Challenge 2: Stats by loan grade

👇 Tasks

  • ✔️ Write a query that summarizes the average interest rate and the default rate for each loan grade.

  • ✔️ Use the following column names:

    • grade: Grade of each loan (e.g., “A”, “B”, ..., “G”)

    • avg_int_rate: Average interest rate for each loan grade

    • default_rate: Default rate for each loan grade

      • This is the average of the did_default column.

  • ✔️ Sort the result by grade in ascending order.

  • ✔️ Store your query in a new variable named query_stats_by_grade. (A generic aggregation sketch follows this list.)
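For reference, a GROUP BY aggregation generally follows the pattern sketched below; averaging a 0/1 column is what turns it into a rate. The table and column names here are hypothetical placeholders, not the graded answer:

# Hypothetical sketch: one output row per category, with AVG() results
# renamed via AS; AVG of a 0/1 column yields a rate between 0 and 1.
query_example = """
SELECT
    category,
    AVG(price) AS avg_price,
    AVG(was_returned) AS return_rate
FROM orders
GROUP BY category
ORDER BY category;
"""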

🔑 Expected output

|   | grade | avg_int_rate | default_rate |
| --- | --- | --- | --- |
| 0 | A | 0.0704255 | 0.0628832 |
| 1 | B | 0.104537 | 0.139501 |
| 2 | C | 0.138426 | 0.233738 |
| 3 | D | 0.176442 | 0.31796 |
| 4 | E | 0.211539 | 0.418802 |
| 5 | F | 0.253798 | 0.472141 |
| 6 | G | 0.282657 | 0.512246 |
# YOUR CODE BEGINS

# YOUR CODE ENDS

conn = sqlite3.connect("lending-club-loan-results.db")
df_result = pd.read_sql_query(query_stats_by_grade, con=conn)
display(df_result)
conn.close()

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-02"
_points = 7

conn = sqlite3.connect("lending-club-loan-results.db")
decoded_query = base64.b64decode(
    b"ClNFTEVDVAogICAgZ3JhZGUsCiAgICBBVkcoaW\
50X3JhdGUpIEFTIGF2Z19pbnRfcmF0ZSwKICAgIEFWRyhkaWRfZGVmYXVsdCkgQVMgZGVmYXVsdF9yYXRlCkZST00gbG9hbnMKR1JPVVAgQlkgZ3JhZGUKT1JERVIgQlkgZ3JhZGU7Cg=="
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
pd.testing.assert_frame_equal(
    df_result.reset_index(drop=True), df_check.reset_index(drop=True)
)
conn.close()

🎯 Challenge 3: Stats by loan purpose

👇 Tasks

  • ✔️ Write a query that summarizes the number of loans, average interest rate, and the default rate for each loan purpose.

  • ✔️ Use the following column names:

    • purpose: Purpose of each loan (e.g., “debt_consolidation”, “credit_card”)

    • num_loans: Number of loans for each loan purpose

    • avg_int_rate: Average interest rate for each loan purpose

    • default_rate: Default rate for each loan purpose

      • This is the average of the did_default column.

  • ✔️ Sort the result by num_loans in descending order.

  • ✔️ Store your query in a new variable named query_stats_by_purpose. (A generic aggregation sketch follows this list.)
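The only new ingredients compared to Challenge 2 are COUNT(*) for the group size and a descending sort; a hypothetical sketch (placeholder names, not the graded answer):

# Hypothetical sketch: COUNT(*) gives the number of rows in each group,
# and ORDER BY ... DESC sorts from the largest group to the smallest.
query_example = """
SELECT
    category,
    COUNT(*) AS num_orders,
    AVG(price) AS avg_price,
    AVG(was_returned) AS return_rate
FROM orders
GROUP BY category
ORDER BY num_orders DESC;
"""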

🔑 Expected output

|   | purpose | num_loans | avg_int_rate | default_rate |
| --- | --- | --- | --- | --- |
| 0 | debt_consolidation | 118382 | 0.133737 | 0.217685 |
| 1 | credit_card | 45718 | 0.114085 | 0.173127 |
| 2 | home_improvement | 14192 | 0.12765 | 0.184188 |
| 3 | other | 10829 | 0.143714 | 0.215071 |
| 4 | major_purchase | 4624 | 0.129519 | 0.222535 |
| 5 | small_business | 3294 | 0.162964 | 0.319065 |
| 6 | medical | 2419 | 0.139562 | 0.222819 |
| 7 | car | 2059 | 0.120086 | 0.154444 |
| 8 | moving | 1302 | 0.152251 | 0.24424 |
| 9 | vacation | 1258 | 0.134918 | 0.197933 |
| 10 | house | 1042 | 0.156619 | 0.255278 |
# YOUR CODE BEGINS

# YOUR CODE ENDS

conn = sqlite3.connect("lending-club-loan-results.db")
df_result = pd.read_sql_query(query_stats_by_purpose, con=conn)
display(df_result)
conn.close()

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-03"
_points = 7

conn = sqlite3.connect("lending-club-loan-results.db")
decoded_query = base64.b64decode(
    b"ClNFTEVDVAogICAgcHVycG9zZSwKICA\
    gIENPVU5UKCopIEFTIG51bV9sb2FucywKICAgIEFWRyhpbnRfcmF0ZSkgQVMgYXZnX\
    2ludF9yYXRlLAogICAgQVZHKGRpZF9kZWZhdWx0KSBBUyBkZWZhdWx0X3JhdGUKRlJ\
    PTSBsb2FucwpHUk9VUCBCWSBwdXJwb3NlCk9SREVSIEJZIG51bV9sb2FucyBERVNDOwo="
).decode()
df_check = pd.read_sql_query(decoded_query, con=conn)
pd.testing.assert_frame_equal(
    df_result.reset_index(drop=True), df_check.reset_index(drop=True)
)
conn.close()

🍸 Submitting your notebook

There is one final step before exporting the notebook as an .ipynb file for submission: restart your runtime (kernel) and run all cells from the beginning to ensure that your notebook does not contain any errors.

  • Go to the “Runtime” (“Kernel” if you’re on Jupyter Lab) menu on top.

  • Select “Restart and run all”.

  • Make sure no code cell throws an error.

  • Failing to pass this step may result in a significant loss of points since the autograder will fail to run.

print("🎯 Restart and run all successful")