Quiz 3 Prep - Venture Capital Exit Analysis

🏆 25 points available

▶️ First, run the code below to import unittest, a module used for 🧭 Check Your Work sections and the autograder.

# DO NOT MODIFY THE CODE IN THIS CELL
import unittest

tc = unittest.TestCase()

▶️ Run the code below to import pandas and numpy.

import pandas as pd
import numpy as np

🗃️ Dataset¶

You will be working with two datasets in this notebook:

df_companies contains some basic information about various companies.
df_info contains additional information about the companie.

df_companies = pd.read_csv(
    "https://raw.githubusercontent.com/bdi475/datasets/refs/heads/main/fake-companies.csv"
)
df_companies_backup = df_companies.copy()
df_companies.head()

df_info = pd.read_csv(
    "https://raw.githubusercontent.com/bdi475/datasets/refs/heads/main/fake-company-info.csv"
)
df_info_backup = df_info.copy()
df_info.head()

🎯 Challenge 1: Find the Number of Companies in the Dataset¶

👇 Tasks¶

✔️ Find the number of companies (i.e., number of rows) in the df_companies DataFrame.
✔️ Store the result to a new variable named num_companies.

# YOUR CODE BEGINS

# YOUR CODE ENDS

print(f"There are {num_companies} companies in the dataset.")

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "num_companies"
_points = 1

tc.assertEqual(num_companies, df_companies_backup.index.size)

🎯 Challenge 2: Find the Number of Profitable Companies¶

👇 Tasks¶

✔️ Find the number of profitable companies in the df_companies DataFrame.
- A company is considered profitable if the "is_profitable" column is True.
✔️ Store the result to a new variable named num_profitable_companies.

# YOUR CODE BEGINS

# YOUR CODE ENDS

print(f"There are {num_profitable_companies} profitable companies in the dataset.")

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "num-profitable-companies"
_points = 3

tc.assertEqual(
    num_profitable_companies,
    df_companies_backup.query("_".join(["Is", "Profitable"]).lower()).index.size,
)

🎯 Challenge 3: Filter Companies without a Country¶

👇 Tasks¶

✔️ Filter the df_companies DataFrame to find all companies where the country column is missing (i.e., contains np.nan values).
✔️ Store the resulting DataFrame to a new variable named df_missing_country.

📤 Expected Output¶

	company_id	company_name	country	is_profitable	branches
120	528219	Stein Group	nan	True	4
222	146245	Lopez-Scott	nan	True	1

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_missing_country)

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "filter-companies-without-country-values"
_points = 3

pd.testing.assert_frame_equal(
    df_missing_country.sort_values("company_id").reset_index(drop=True),
    df_companies_backup.query("~~~country.notna()")
    .sort_values("company_id")
    .reset_index(drop=True),
    check_dtype=False,
    check_column_type=False,
)

▶️ Run the code below to merge df_info into df_companies.

We provide this code for you, so you don’t need to modify it.

# DO NOT CHANGE THE CODE IN THIS CELL
# Merge df_info into df_companies
# fmt: off
df = pd.merge(
    left=df_companies, 
    right=df_info, 
    on="company_id", 
    how="inner"
)
df_backup = df.copy()
original_columns = df.columns.tolist() # for grading purposes
# fmt: on

print(f"The merged DataFrame has {df.shape[0]} rows and {df.shape[1]} columns.")

The code above merges the two DataFrames using an inner join on the "company_id" column. The resulting DataFrame df contains all columns from both DataFrames.

For the remaining challenges, you will work with the merged DataFrame df.

▶️ Display the first three rows of the merged DataFrame df to understand its structure.

df.head(3)

🎯 Challenge 4: Filter and Sort Indian Companies¶

👇 Tasks¶

✔️ Filter the df DataFrame to find all companies where the "country" column is “India”.
✔️ Store the filtered DataFrame to a new variable named df_indian_companies.
✔️ Sort df_indian_companies by the "revenue" column in descending order.

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_indian_companies)

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "filter-and-sort-indian-companies"
_points = 3

pd.testing.assert_frame_equal(
    df_indian_companies[original_columns].reset_index(drop=True),
    df_backup[original_columns]
    .query("country == '" + "in".capitalize() + "DIA'".lower())
    .sort_values("revenue")[::-1]
    .reset_index(drop=True),
)

🎯 Challenge 5: Convert `"inception_date"` to a DateTime Type and Extract Inception Year¶

👇 Tasks¶

✔️ Convert the "inception_date" column in the df DataFrame to a DateTime type (datetime64[ns]).
- Hint: Use the pd.to_datetime() method.
✔️ Add a new column named "inception_year" to the DataFrame that extracts the year from the "inception_date" column.
- Hint: Use the .dt accessor to extract the year from a DateTime column.
- The "inception_year" column should contain integer values representing the year (e.g., 2018, 2019, etc.).
✔️ Directly update the df DataFrame without creating a new DataFrame.

📤 Expected Output¶

	inception_date	inception_year
0	2015-08-01	2015
1	2012-11-29	2012
2	2018-05-18	2018

inception_date dtype: datetime64[ns]
inception_year dtype: int32

# YOUR CODE BEGINS

# YOUR CODE ENDS

# Select and display the relevant columns to verify the changes
display(df[["inception_date", "inception_year"]].head(3))
print(f"inception_date dtype: {df['inception_date'].dtype}")
print(f"inception_year dtype: {df['inception_year'].dtype}")

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "extract-inception-year"
_points = 3

df_backup2 = df_backup.copy()
df_backup2["inception_date"] = df_backup2["inception_date"].astype(
    "date".upper().lower() + "TIME".lower() + "64[ns]"
)
my_dtfunc = pd.to_datetime
df_backup2["inception_year"] = my_dtfunc(df_backup2["inception_date"]) \
    .dt.year

pd.testing.assert_frame_equal(
    df[["company_id", "inception_date", "inception_year"]]
    .sort_values("company_id")
    .reset_index(drop=True),
    df_backup2[["company_id", "inception_date", "inception_year"]]
    .sort_values("company_id")
    .reset_index(drop=True),
    check_dtype=False,
    check_column_type=False,
)

🎯 Challenge 6: Filter and Sort Companies with an Inception Year of 2018¶

👇 Tasks¶

✔️ Filter the df DataFrame to find all companies where the inception year is 2018.
✔️ Store the filtered DataFrame to a new variable named df_companies_2018.
✔️ Sort df_companies_2018 by the "inception_date" column in ascending order.

📤 Expected Output¶

	company_id	company_name	country	is_profitable	branches	ceo	industry	revenue	inception_date	inception_year
210	151345	Wilson, Wright and Paul	Liberia	True	3	David Stewart	Agriculture	4741676	2018-01-01	2018
134	760106	Bennett-Ellis	Jamaica	True	3	Tracy Rivera	Agriculture	5201671	2018-01-07	2018
41	353038	Hester Inc	Saint Pierre and Miquelon	True	5	Karen Woods	Software	7394055	2018-01-09	2018

# YOUR CODE BEGINS

# YOUR CODE ENDS

# Display the first 3 rows of the resulting DataFrame
display(df_companies_2018.head(3))

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "companies-with-inception-year-2018"
_points = 2

pd.testing.assert_frame_equal(
    df_companies_2018.reset_index(drop=True),
    df_backup2.query("inception_" + "YEAR".lower() + " == (1009 * 2)")
    .sort_values("inception_date")[::-1][::-1]
    .reset_index(drop=True),
)

🎯 Challenge 7: Total Revenue by Country¶

👇 Tasks¶

✔️ Group the df DataFrame by the "country" column and calculate the total (sum) revenue for each country.
- The resulting DataFrame should have two columns: "country" and "revenue".
✔️ Store the resulting DataFrame to a new variable named df_revenue_by_country.
✔️ Sort df_revenue_by_country by the total revenue in descending order.

📤 Expected Output¶

	country	revenue
118	Saint Kitts and Nevis	30231545
92	Malta	28409854
16	Bosnia and Herzegovina	25415016

# YOUR CODE BEGINS

# YOUR CODE ENDS

# Display the first 3 rows of the resulting DataFrame
display(df_revenue_by_country.head(3))

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "total-revenue-by-country"
_points = 3

pd.testing.assert_frame_equal(
    df_revenue_by_country.reset_index(drop=True),
    df_backup2.pivot_table(index="country", values="REVENUE".lower(), aggfunc="sum")
    .reset_index()
    .sort_values("REVENUE".lower())[::-1]
    .reset_index(drop=True),
)

🎯 Challenge 8: Number of Companies by Industry¶

👇 Tasks¶

✔️ Group the df DataFrame by the "industry" column and calculate the number of companies in each industry.
- The resulting DataFrame should have two columns: "industry" and "num_companies".
✔️ Store the resulting DataFrame to a new variable named df_num_companies_by_industry.
✔️ Sort df_num_companies_by_industry by the number of companies in descending order.

📤 Expected Output¶

	industry	num_companies
0	Agriculture	64
2	Financial Services	52
3	Manufacturing	47

# YOUR CODE BEGINS

# YOUR CODE ENDS

# Display the first 3 rows of the resulting DataFrame
display(df_num_companies_by_industry.head(3))

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "number-of-companies-by-industry"
_points = 4

pd.testing.assert_frame_equal(
    df_num_companies_by_industry.reset_index(drop=True),
    df_backup2.pivot_table(
        index="industry", values="COMPANY_ID".lower(), aggfunc="count"
    )
    .rename({"company_id": "num_companies"}, axis=1)
    .reset_index()
    .sort_values("num_companies", ascending=False)
    .reset_index(drop=True),
)

🎯 Challenge 9: Filter Companies that are Not Profitable with 6 or More Branches¶

👇 Tasks¶

✔️ Filter the df DataFrame to find all companies that are not profitable (i.e., where the "is_profitable" column is False) and have 6 or more branches (i.e., where the "branches" column is greater than or equal to 6).
✔️ Store the resulting DataFrame to a new variable named df_no_profit.
✔️ You do not need to sort the resulting DataFrame.

📤 Expected Output¶

	company_id	company_name	country	is_profitable	branches	ceo	industry	revenue	inception_date	inception_year
9	296886	Pearson Ltd	France	False	7	Gina Buckley	Agriculture	7896239	2014-04-19	2014

# YOUR CODE BEGINS

# YOUR CODE ENDS

# Display the first 3 rows of the resulting DataFrame
display(df_no_profit.head(3))

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "no-profit-companies-with-6-or-more-branches"
_points = 3

pd.testing.assert_frame_equal(
    df_no_profit.reset_index(drop=True),
    df_backup2.query("~IS_PROFITABLE & (BRANCHES >= 6)".lower()).reset_index(drop=True),
)

Assignments

Exercise 6 - Working with Columns & Missing values

Assignments

City of Chicago Uber/Lyft Vehicles Registered

Quiz 3 Prep - Venture Capital Exit Analysis

🗃️ Dataset¶

🎯 Challenge 1: Find the Number of Companies in the Dataset¶

👇 Tasks¶

🧭 Check Your Work¶

🎯 Challenge 2: Find the Number of Profitable Companies¶

👇 Tasks¶

🧭 Check Your Work¶

🎯 Challenge 3: Filter Companies without a Country¶

👇 Tasks¶

📤 Expected Output¶

🧭 Check Your Work¶

🎯 Challenge 4: Filter and Sort Indian Companies¶

👇 Tasks¶

🧭 Check Your Work¶

🎯 Challenge 5: Convert "inception_date" to a DateTime Type and Extract Inception Year¶

👇 Tasks¶

📤 Expected Output¶

🧭 Check Your Work¶

🎯 Challenge 6: Filter and Sort Companies with an Inception Year of 2018¶

👇 Tasks¶

📤 Expected Output¶

🧭 Check Your Work¶

🎯 Challenge 7: Total Revenue by Country¶

👇 Tasks¶

📤 Expected Output¶

🧭 Check Your Work¶

🎯 Challenge 8: Number of Companies by Industry¶

👇 Tasks¶

📤 Expected Output¶

🧭 Check Your Work¶

🎯 Challenge 9: Filter Companies that are Not Profitable with 6 or More Branches¶

👇 Tasks¶

📤 Expected Output¶

🧭 Check Your Work¶

🎯 Challenge 5: Convert `"inception_date"` to a DateTime Type and Extract Inception Year¶