Exercise 6 - Working with Columns & Missing values

🏆 20 points available
✏️ Last updated on 09/23/2025

▶️ First, run the code cell below to import unittest, a module used for 🧭 Check Your Work sections and the autograder.

# DO NOT MODIFY THE CODE IN THIS CELL
import unittest

tc = unittest.TestCase()

🎯 Part 1: Import Pandas and NumPy¶

👇 Tasks¶

✔️ Import the following Python packages.
1. pandas: Use alias pd.
2. numpy: Use alias np.

# YOUR CODE BEGINS

# YOUR CODE ENDS

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-01"
_points = 2

tc.assertTrue(
    "pd" in globals(), "Check whether you have correctly import Pandas with an alias."
)
tc.assertTrue(
    "np" in globals(), "Check whether you have correctly import NumPy with an alias."
)

📌 Load data¶

For the remainder of this exercise, you’ll be working with a small DataFrame named df_emp.

▶️ Run the code cell below to create df_emp.

# DO NOT CHANGE THE CODE IN THIS CELL
df_emp = pd.DataFrame(
    {
        "emp_id": [30, 40, 10, 20],
        "name": ["Nupur", "Matt", "Ethan", "John"],
        "dept": ["Sales", "Marketing", "Sales", "Marketing"],
        "office_phone": ["(217)123-4500", np.nan, np.nan, "(217)987-6543"],
        "start_date": ["2017-05-01", "2018-02-01", "2020-08-01", "2019-12-01"],
        "salary": [202000, 185000, 240000, 160500],
    }
)

# Used for intermediate checks
df_emp_backup = df_emp.copy()

df_emp

🎯 Part 2: Find the number of rows and columns¶

👇 Tasks¶

✔️ Store the number of rows in df_emp to a new variable named num_rows.
✔️ Store the number of columns in df_emp to a new variable named num_cols.
✔️ Use .shape, not len().

# YOUR CODE BEGINS

# YOUR CODE ENDS

print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_cols}")

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-02"
_points = 2

tc.assertEqual(
    num_rows,
    len(df_emp_backup.index),
    f"Number of rows should be {len(df_emp_backup.index)}",
)
tc.assertEqual(
    num_cols,
    len(df_emp_backup.columns),
    f"Number of columns should be {len(df_emp_backup.columns)}",
)

🎯 Part 3: Find all rows with non-missing phone numbers¶

👇 Tasks¶

✔️ Using df_emp, find rows where office_phone contains a non-missing value.
✔️ Store the filtered rows to df_with_phones.
✔️ df_emp should remain unaltered.

🔑 Expected Output of `df_with_phones`¶

	emp_id	name	dept	office_phone	start_date	salary
0	30	Nupur	Sales	(217)123-4500	2017-05-01	202000
3	20	John	Marketing	(217)987-6543	2019-12-01	160500

# Reset df_emp
df_emp = df_emp_backup.copy()

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_with_phones)

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-03"
_points = 3

pd.testing.assert_frame_equal(
    df_emp, df_emp_backup, "The original DataFrame should remain unaltered"
)

# Check result
pd.testing.assert_frame_equal(
    df_with_phones.sort_values(df_with_phones.columns.tolist()).reset_index(drop=True),
    df_emp_backup.query(f"office_phone == office_phone")
    .sort_values(df_emp_backup.columns.tolist())
    .reset_index(drop=True),
)

🎯 Part 4: Sort by department and salary¶

👇 Tasks¶

✔️ Sort df_emp by dept in ascending order and then by salary in ascending order.
- Employees within a same department must be sorted by salary in ascending order.
- Store the result to a new variable named df_by_dept_salary.
✔️ df_emp should remain unaltered after your code.

▶️ Run the code cell below to reset your df_emp.

🔑 Expected Output of `df_by_dept_salary`¶

	emp_id	name	dept	office_phone	start_date	salary
3	20	John	Marketing	(217)987-6543	2019-12-01	160500
1	40	Matt	Marketing	NaN	2018-02-01	185000
0	30	Nupur	Sales	(217)123-4500	2017-05-01	202000
2	10	Ethan	Sales	NaN	2020-08-01	240000

# Reset df_emp
df_emp = df_emp_backup.copy()

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_by_dept_salary)

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-04"
_points = 3

df_SOL = df_emp_backup.sort_values(
    ["salary", "dept"][::-1], ascending=[not x for x in [False, False]]
)

pd.testing.assert_frame_equal(
    df_emp, df_emp_backup, "The original DataFrame should remain unaltered"
)

# check result
pd.testing.assert_frame_equal(
    df_by_dept_salary.reset_index(drop=True), df_SOL.reset_index(drop=True)
)

🎯 Part 5: Rename columns¶

👇 Tasks¶

✔️ Rename the following two columns in df_emp.
1. name to first_name
2. salary to base_salary
✔️ Store the result to a new variable named df_renamed.
✔️ Your original DataFrame (df_emp) should remain unaltered.

🚀 Hints¶

Use the following code to rename columns out-of-place.

renamed_dataframe = my_dataframe.rename(columns={
    "col1_before": "col1_after",
    "col2_before": "col2_after"
})

# Reset df_emp
df_emp = df_emp_backup.copy()

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_renamed)

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-05"
_points = 3

tc.assertEqual(
    df_emp.columns.tolist(),
    df_emp_backup.columns.tolist(),
    "Did you rename the column in-place? The original DataFrame should not be modified.",
)
tc.assertEqual(
    df_renamed.columns.tolist(),
    ["emp_id", "first_name", "dept", "office_phone", "start_date", "base_salary"],
)

🎯 Part 6: Drop columns¶

👇 Tasks¶

✔️ Drop emp_id and start_date columns from df_emp.
✔️ Store the result to a new variable named df_dropped.
✔️ Your df_emp should remain unaltered.

🚀 Hints¶

Use the following code as a reference.

dropped_dataframe = my_dataframe.drop(columns=["my_column1", "my_column2"])

# Reset df_emp
df_emp = df_emp_backup.copy()

# YOUR CODE BEGINS

# YOUR CODE ENDS

df_dropped

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-06"
_points = 3

pd.testing.assert_frame_equal(
    df_emp, df_emp_backup, "The original DataFrame should remain unaltered"
)

# Check result
tc.assertEqual(df_emp.columns.tolist(), df_emp_backup.columns.tolist())
tc.assertEqual(df_dropped.columns.tolist(), ["name", "dept", "office_phone", "salary"])

🎯 Part 7: Add new column¶

👇 Tasks¶

✔️ Assume all four employees have received a huge bonus of $20,000.
✔️ Append a new column named salary_with_bonus that shows the salaries of employees with the bonus.
✔️ Directly update the df_emp.

🚀 Hints¶

Use the following code as a reference.

my_dataframe["salary_with_bonus"] = my_dataframe["salary"] + 20000

# Reset df_emp
df_emp = df_emp_backup.copy()

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_emp)

🧭 Check Your Work¶

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-07"
_points = 4

df_SOL = df_emp_backup.copy()
df_SOL["salary_with_bonus"] = df_SOL["salary"] + 1000 * 20

pd.testing.assert_frame_equal(
    df_emp.sort_values(df_emp.columns.to_list()).reset_index(drop=True),
    df_SOL.sort_values(df_SOL.columns.to_list()).reset_index(drop=True),
)

Assignments

Problem Set - Python Applications

Assignments

Quiz 3 Prep - Venture Capital Exit Analysis

Exercise 6 - Working with Columns & Missing values

🎯 Part 1: Import Pandas and NumPy¶

👇 Tasks¶

🧭 Check Your Work¶

📌 Load data¶

🎯 Part 2: Find the number of rows and columns¶

👇 Tasks¶

🧭 Check Your Work¶

🎯 Part 3: Find all rows with non-missing phone numbers¶

👇 Tasks¶

🔑 Expected Output of df_with_phones¶

🧭 Check Your Work¶

🎯 Part 4: Sort by department and salary¶

👇 Tasks¶

🔑 Expected Output of df_by_dept_salary¶

🧭 Check Your Work¶

🎯 Part 5: Rename columns¶

👇 Tasks¶

🚀 Hints¶

🧭 Check Your Work¶

🎯 Part 6: Drop columns¶

👇 Tasks¶

🚀 Hints¶

🧭 Check Your Work¶

🎯 Part 7: Add new column¶

👇 Tasks¶

🚀 Hints¶

🧭 Check Your Work¶

🔑 Expected Output of `df_with_phones`¶

🔑 Expected Output of `df_by_dept_salary`¶