Skip to article frontmatterSkip to article content

Exercise 6 - Working with Columns & Missing values

  • ๐Ÿ† 20 points available
  • โœ๏ธ Last updated on 09/23/2025

โ–ถ๏ธ First, run the code cell below to import unittest, a module used for ๐Ÿงญ Check Your Work sections and the autograder.

# DO NOT MODIFY THE CODE IN THIS CELL
import unittest

tc = unittest.TestCase()

๐ŸŽฏ Part 1: Import Pandas and NumPyยถ

๐Ÿ‘‡ Tasksยถ

  • โœ”๏ธ Import the following Python packages.
    1. pandas: Use alias pd.
    2. numpy: Use alias np.
# YOUR CODE BEGINS

# YOUR CODE ENDS

๐Ÿงญ Check Your Workยถ

Run the code cell below to test your solution.

  • โœ”๏ธ If the code cell runs without errors, youโ€™re good to move on.
  • โŒ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-01"
_points = 2

tc.assertTrue(
    "pd" in globals(), "Check whether you have correctly import Pandas with an alias."
)
tc.assertTrue(
    "np" in globals(), "Check whether you have correctly import NumPy with an alias."
)

๐Ÿ“Œ Load dataยถ

For the remainder of this exercise, youโ€™ll be working with a small DataFrame named df_emp.

โ–ถ๏ธ Run the code cell below to create df_emp.

# DO NOT CHANGE THE CODE IN THIS CELL
df_emp = pd.DataFrame(
    {
        "emp_id": [30, 40, 10, 20],
        "name": ["Nupur", "Matt", "Ethan", "John"],
        "dept": ["Sales", "Marketing", "Sales", "Marketing"],
        "office_phone": ["(217)123-4500", np.nan, np.nan, "(217)987-6543"],
        "start_date": ["2017-05-01", "2018-02-01", "2020-08-01", "2019-12-01"],
        "salary": [202000, 185000, 240000, 160500],
    }
)

# Used for intermediate checks
df_emp_backup = df_emp.copy()

df_emp

๐ŸŽฏ Part 2: Find the number of rows and columnsยถ

๐Ÿ‘‡ Tasksยถ

  • โœ”๏ธ Store the number of rows in df_emp to a new variable named num_rows.
  • โœ”๏ธ Store the number of columns in df_emp to a new variable named num_cols.
  • โœ”๏ธ Use .shape, not len().
# YOUR CODE BEGINS

# YOUR CODE ENDS

print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_cols}")

๐Ÿงญ Check Your Workยถ

Run the code cell below to test your solution.

  • โœ”๏ธ If the code cell runs without errors, youโ€™re good to move on.
  • โŒ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-02"
_points = 2

tc.assertEqual(
    num_rows,
    len(df_emp_backup.index),
    f"Number of rows should be {len(df_emp_backup.index)}",
)
tc.assertEqual(
    num_cols,
    len(df_emp_backup.columns),
    f"Number of columns should be {len(df_emp_backup.columns)}",
)

๐ŸŽฏ Part 3: Find all rows with non-missing phone numbersยถ

๐Ÿ‘‡ Tasksยถ

  • โœ”๏ธ Using df_emp, find rows where office_phone contains a non-missing value.
  • โœ”๏ธ Store the filtered rows to df_with_phones.
  • โœ”๏ธ df_emp should remain unaltered.

๐Ÿ”‘ Expected Output of df_with_phonesยถ

emp_idnamedeptoffice_phonestart_datesalary
030NupurSales(217)123-45002017-05-01202000
320JohnMarketing(217)987-65432019-12-01160500
# Reset df_emp
df_emp = df_emp_backup.copy()

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_with_phones)

๐Ÿงญ Check Your Workยถ

Run the code cell below to test your solution.

  • โœ”๏ธ If the code cell runs without errors, youโ€™re good to move on.
  • โŒ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-03"
_points = 3

pd.testing.assert_frame_equal(
    df_emp, df_emp_backup, "The original DataFrame should remain unaltered"
)

# Check result
pd.testing.assert_frame_equal(
    df_with_phones.sort_values(df_with_phones.columns.tolist()).reset_index(drop=True),
    df_emp_backup.query(f"office_phone == office_phone")
    .sort_values(df_emp_backup.columns.tolist())
    .reset_index(drop=True),
)

๐ŸŽฏ Part 4: Sort by department and salaryยถ

๐Ÿ‘‡ Tasksยถ

  • โœ”๏ธ Sort df_emp by dept in ascending order and then by salary in ascending order.
    • Employees within a same department must be sorted by salary in ascending order.
    • Store the result to a new variable named df_by_dept_salary.
  • โœ”๏ธ df_emp should remain unaltered after your code.

โ–ถ๏ธ Run the code cell below to reset your df_emp.

๐Ÿ”‘ Expected Output of df_by_dept_salaryยถ

emp_idnamedeptoffice_phonestart_datesalary
320JohnMarketing(217)987-65432019-12-01160500
140MattMarketingNaN2018-02-01185000
030NupurSales(217)123-45002017-05-01202000
210EthanSalesNaN2020-08-01240000
# Reset df_emp
df_emp = df_emp_backup.copy()

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_by_dept_salary)

๐Ÿงญ Check Your Workยถ

Run the code cell below to test your solution.

  • โœ”๏ธ If the code cell runs without errors, youโ€™re good to move on.
  • โŒ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-04"
_points = 3

df_SOL = df_emp_backup.sort_values(
    ["salary", "dept"][::-1], ascending=[not x for x in [False, False]]
)

pd.testing.assert_frame_equal(
    df_emp, df_emp_backup, "The original DataFrame should remain unaltered"
)

# check result
pd.testing.assert_frame_equal(
    df_by_dept_salary.reset_index(drop=True), df_SOL.reset_index(drop=True)
)

๐ŸŽฏ Part 5: Rename columnsยถ

๐Ÿ‘‡ Tasksยถ

  • โœ”๏ธ Rename the following two columns in df_emp.
    1. name to first_name
    2. salary to base_salary
  • โœ”๏ธ Store the result to a new variable named df_renamed.
  • โœ”๏ธ Your original DataFrame (df_emp) should remain unaltered.

๐Ÿš€ Hintsยถ

Use the following code to rename columns out-of-place.

renamed_dataframe = my_dataframe.rename(columns={
    "col1_before": "col1_after",
    "col2_before": "col2_after"
})
# Reset df_emp
df_emp = df_emp_backup.copy()

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_renamed)

๐Ÿงญ Check Your Workยถ

Run the code cell below to test your solution.

  • โœ”๏ธ If the code cell runs without errors, youโ€™re good to move on.
  • โŒ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-05"
_points = 3

tc.assertEqual(
    df_emp.columns.tolist(),
    df_emp_backup.columns.tolist(),
    "Did you rename the column in-place? The original DataFrame should not be modified.",
)
tc.assertEqual(
    df_renamed.columns.tolist(),
    ["emp_id", "first_name", "dept", "office_phone", "start_date", "base_salary"],
)

๐ŸŽฏ Part 6: Drop columnsยถ

๐Ÿ‘‡ Tasksยถ

  • โœ”๏ธ Drop emp_id and start_date columns from df_emp.
  • โœ”๏ธ Store the result to a new variable named df_dropped.
  • โœ”๏ธ Your df_emp should remain unaltered.

๐Ÿš€ Hintsยถ

Use the following code as a reference.

dropped_dataframe = my_dataframe.drop(columns=["my_column1", "my_column2"])
# Reset df_emp
df_emp = df_emp_backup.copy()

# YOUR CODE BEGINS

# YOUR CODE ENDS

df_dropped

๐Ÿงญ Check Your Workยถ

Run the code cell below to test your solution.

  • โœ”๏ธ If the code cell runs without errors, youโ€™re good to move on.
  • โŒ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-06"
_points = 3

pd.testing.assert_frame_equal(
    df_emp, df_emp_backup, "The original DataFrame should remain unaltered"
)

# Check result
tc.assertEqual(df_emp.columns.tolist(), df_emp_backup.columns.tolist())
tc.assertEqual(df_dropped.columns.tolist(), ["name", "dept", "office_phone", "salary"])

๐ŸŽฏ Part 7: Add new columnยถ

๐Ÿ‘‡ Tasksยถ

  • โœ”๏ธ Assume all four employees have received a huge bonus of $20,000.
  • โœ”๏ธ Append a new column named salary_with_bonus that shows the salaries of employees with the bonus.
  • โœ”๏ธ Directly update the df_emp.

๐Ÿš€ Hintsยถ

Use the following code as a reference.

my_dataframe["salary_with_bonus"] = my_dataframe["salary"] + 20000
# Reset df_emp
df_emp = df_emp_backup.copy()

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_emp)

๐Ÿงญ Check Your Workยถ

Run the code cell below to test your solution.

  • โœ”๏ธ If the code cell runs without errors, youโ€™re good to move on.
  • โŒ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-07"
_points = 4

df_SOL = df_emp_backup.copy()
df_SOL["salary_with_bonus"] = df_SOL["salary"] + 1000 * 20

pd.testing.assert_frame_equal(
    df_emp.sort_values(df_emp.columns.to_list()).reset_index(drop=True),
    df_SOL.sort_values(df_SOL.columns.to_list()).reset_index(drop=True),
)