- ๐ 20 points available
- โ๏ธ Last updated on 09/23/2025
โถ๏ธ First, run the code cell below to import unittest, a module used for ๐งญ Check Your Work sections and the autograder.
# DO NOT MODIFY THE CODE IN THIS CELL
import unittest
tc = unittest.TestCase()๐ฏ Part 1: Import Pandas and NumPyยถ
๐ Tasksยถ
- โ๏ธ Import the following Python packages.
pandas: Use aliaspd.numpy: Use aliasnp.
# YOUR CODE BEGINS
# YOUR CODE ENDS๐งญ Check Your Workยถ
Run the code cell below to test your solution.
- โ๏ธ If the code cell runs without errors, youโre good to move on.
- โ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-01"
_points = 2
tc.assertTrue(
"pd" in globals(), "Check whether you have correctly import Pandas with an alias."
)
tc.assertTrue(
"np" in globals(), "Check whether you have correctly import NumPy with an alias."
)๐ Load dataยถ
For the remainder of this exercise, youโll be working with a small DataFrame named df_emp.
โถ๏ธ Run the code cell below to create df_emp.
# DO NOT CHANGE THE CODE IN THIS CELL
df_emp = pd.DataFrame(
{
"emp_id": [30, 40, 10, 20],
"name": ["Nupur", "Matt", "Ethan", "John"],
"dept": ["Sales", "Marketing", "Sales", "Marketing"],
"office_phone": ["(217)123-4500", np.nan, np.nan, "(217)987-6543"],
"start_date": ["2017-05-01", "2018-02-01", "2020-08-01", "2019-12-01"],
"salary": [202000, 185000, 240000, 160500],
}
)
# Used for intermediate checks
df_emp_backup = df_emp.copy()
df_emp๐ฏ Part 2: Find the number of rows and columnsยถ
๐ Tasksยถ
- โ๏ธ Store the number of rows in
df_empto a new variable namednum_rows. - โ๏ธ Store the number of columns in
df_empto a new variable namednum_cols. - โ๏ธ Use
.shape, notlen().
# YOUR CODE BEGINS
# YOUR CODE ENDS
print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_cols}")๐งญ Check Your Workยถ
Run the code cell below to test your solution.
- โ๏ธ If the code cell runs without errors, youโre good to move on.
- โ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-02"
_points = 2
tc.assertEqual(
num_rows,
len(df_emp_backup.index),
f"Number of rows should be {len(df_emp_backup.index)}",
)
tc.assertEqual(
num_cols,
len(df_emp_backup.columns),
f"Number of columns should be {len(df_emp_backup.columns)}",
)๐ฏ Part 3: Find all rows with non-missing phone numbersยถ
๐ Tasksยถ
- โ๏ธ Using
df_emp, find rows whereoffice_phonecontains a non-missing value. - โ๏ธ Store the filtered rows to
df_with_phones. - โ๏ธ
df_empshould remain unaltered.
๐ Expected Output of df_with_phonesยถ
| emp_id | name | dept | office_phone | start_date | salary | |
|---|---|---|---|---|---|---|
| 0 | 30 | Nupur | Sales | (217)123-4500 | 2017-05-01 | 202000 |
| 3 | 20 | John | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
# Reset df_emp
df_emp = df_emp_backup.copy()
# YOUR CODE BEGINS
# YOUR CODE ENDS
display(df_with_phones)๐งญ Check Your Workยถ
Run the code cell below to test your solution.
- โ๏ธ If the code cell runs without errors, youโre good to move on.
- โ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-03"
_points = 3
pd.testing.assert_frame_equal(
df_emp, df_emp_backup, "The original DataFrame should remain unaltered"
)
# Check result
pd.testing.assert_frame_equal(
df_with_phones.sort_values(df_with_phones.columns.tolist()).reset_index(drop=True),
df_emp_backup.query(f"office_phone == office_phone")
.sort_values(df_emp_backup.columns.tolist())
.reset_index(drop=True),
)๐ฏ Part 4: Sort by department and salaryยถ
๐ Tasksยถ
- โ๏ธ Sort
df_empbydeptin ascending order and then bysalaryin ascending order.- Employees within a same department must be sorted by
salaryin ascending order. - Store the result to a new variable named
df_by_dept_salary.
- Employees within a same department must be sorted by
- โ๏ธ
df_empshould remain unaltered after your code.
โถ๏ธ Run the code cell below to reset your df_emp.
๐ Expected Output of df_by_dept_salaryยถ
| emp_id | name | dept | office_phone | start_date | salary | |
|---|---|---|---|---|---|---|
| 3 | 20 | John | Marketing | (217)987-6543 | 2019-12-01 | 160500 |
| 1 | 40 | Matt | Marketing | NaN | 2018-02-01 | 185000 |
| 0 | 30 | Nupur | Sales | (217)123-4500 | 2017-05-01 | 202000 |
| 2 | 10 | Ethan | Sales | NaN | 2020-08-01 | 240000 |
# Reset df_emp
df_emp = df_emp_backup.copy()
# YOUR CODE BEGINS
# YOUR CODE ENDS
display(df_by_dept_salary)๐งญ Check Your Workยถ
Run the code cell below to test your solution.
- โ๏ธ If the code cell runs without errors, youโre good to move on.
- โ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-04"
_points = 3
df_SOL = df_emp_backup.sort_values(
["salary", "dept"][::-1], ascending=[not x for x in [False, False]]
)
pd.testing.assert_frame_equal(
df_emp, df_emp_backup, "The original DataFrame should remain unaltered"
)
# check result
pd.testing.assert_frame_equal(
df_by_dept_salary.reset_index(drop=True), df_SOL.reset_index(drop=True)
)๐ฏ Part 5: Rename columnsยถ
๐ Tasksยถ
- โ๏ธ Rename the following two columns in
df_emp.nametofirst_namesalarytobase_salary
- โ๏ธ Store the result to a new variable named
df_renamed. - โ๏ธ Your original DataFrame (
df_emp) should remain unaltered.
๐ Hintsยถ
Use the following code to rename columns out-of-place.
renamed_dataframe = my_dataframe.rename(columns={
"col1_before": "col1_after",
"col2_before": "col2_after"
})# Reset df_emp
df_emp = df_emp_backup.copy()
# YOUR CODE BEGINS
# YOUR CODE ENDS
display(df_renamed)๐งญ Check Your Workยถ
Run the code cell below to test your solution.
- โ๏ธ If the code cell runs without errors, youโre good to move on.
- โ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-05"
_points = 3
tc.assertEqual(
df_emp.columns.tolist(),
df_emp_backup.columns.tolist(),
"Did you rename the column in-place? The original DataFrame should not be modified.",
)
tc.assertEqual(
df_renamed.columns.tolist(),
["emp_id", "first_name", "dept", "office_phone", "start_date", "base_salary"],
)๐ฏ Part 6: Drop columnsยถ
๐ Tasksยถ
- โ๏ธ Drop
emp_idandstart_datecolumns fromdf_emp. - โ๏ธ Store the result to a new variable named
df_dropped. - โ๏ธ Your
df_empshould remain unaltered.
๐ Hintsยถ
Use the following code as a reference.
dropped_dataframe = my_dataframe.drop(columns=["my_column1", "my_column2"])# Reset df_emp
df_emp = df_emp_backup.copy()
# YOUR CODE BEGINS
# YOUR CODE ENDS
df_dropped๐งญ Check Your Workยถ
Run the code cell below to test your solution.
- โ๏ธ If the code cell runs without errors, youโre good to move on.
- โ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-06"
_points = 3
pd.testing.assert_frame_equal(
df_emp, df_emp_backup, "The original DataFrame should remain unaltered"
)
# Check result
tc.assertEqual(df_emp.columns.tolist(), df_emp_backup.columns.tolist())
tc.assertEqual(df_dropped.columns.tolist(), ["name", "dept", "office_phone", "salary"])๐ฏ Part 7: Add new columnยถ
๐ Tasksยถ
- โ๏ธ Assume all four employees have received a huge bonus of $20,000.
- โ๏ธ Append a new column named
salary_with_bonusthat shows the salaries of employees with the bonus. - โ๏ธ Directly update the
df_emp.
๐ Hintsยถ
Use the following code as a reference.
my_dataframe["salary_with_bonus"] = my_dataframe["salary"] + 20000# Reset df_emp
df_emp = df_emp_backup.copy()
# YOUR CODE BEGINS
# YOUR CODE ENDS
display(df_emp)๐งญ Check Your Workยถ
Run the code cell below to test your solution.
- โ๏ธ If the code cell runs without errors, youโre good to move on.
- โ If the code cell produces an error, review your code and fix any mistakes.
_test_case = "part-07"
_points = 4
df_SOL = df_emp_backup.copy()
df_SOL["salary_with_bonus"] = df_SOL["salary"] + 1000 * 20
pd.testing.assert_frame_equal(
df_emp.sort_values(df_emp.columns.to_list()).reset_index(drop=True),
df_SOL.sort_values(df_SOL.columns.to_list()).reset_index(drop=True),
)