Skip to article frontmatterSkip to article content

Exercise 5 - Pandas Filtering and Sorting

  • 🏆 20 points available
  • ✏️ Last updated on 9/13/2022

▶️ First, run the code cell below to import unittest, a module used for 🧭 Check Your Work sections and the autograder.

# DO NOT MODIFY THE CODE IN THIS CELL
import unittest
tc = unittest.TestCase()

🎯 Challenge 1: Import Pandas and NumPy

👇 Tasks

  • ✔️ Import the following Python packages.
    1. pandas: Use alias pd.
    2. numpy: Use alias np.
# YOUR CODE BEGINS

# YOUR CODE ENDS

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.
  • ❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'import-pandas-numpy'
_points = 2

tc.assertTrue("pd" in globals(), "Check whether you have correctly import Pandas with an alias.")
tc.assertTrue("np" in globals(), "Check whether you have correctly import NumPy with an alias.")

🎯 Challenge 2: Create a Pandas Series

👇 Tasks

  • ✔️ Create a new Pandas Series named sample_series with the following four values: -20, -10, 10, 20

🚀 Hint

The code below creates a new Pandas Series with the values 1 and 2.

my_new_series = pd.Series([1, 2])
# YOUR CODE BEGINS

# YOUR CODE ENDS

print(sample_series)

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.
  • ❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'create-a-pandas-series'
_points = 2

pd.testing.assert_series_equal(sample_series, pd.Series(x * 10 for x in [-2, -1, 1, 2]))

🎯 Challenge 3: Create a Pandas DataFrame

👇 Tasks

  • ✔️ You are given two lists - brands and rankings that contain the names of make-up products and the number of reviews on Sephora.com.
  • ✔️ Using the two lists, create a new Pandas DataFrame named df_brands that has the following two columns:
    1. brand: Names of the brands
    2. ranking: Ranking of the brands
  • ✔️ Note that the column names are singular.

🚀 Hint

The code below creates a new Pandas DataFrame from two series.

my_new_dataframe = pd.DataFrame({
    "column_one": my_series1,
    "column_two": my_series2
})

🔑 Expected Output

brandranking
0Apple1
1Amazon2
2Google3
brands = ["Apple", "Amazon", "Google"]
rankings = [1, 2, 3]

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_brands)

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.
  • ❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'create-a-pandas-dataframe'
_points = 2

pd.testing.assert_frame_equal(
    df_brands.reset_index(drop=True),
    pd.DataFrame(
        {"brand": {0: "Apple", 1: "Amazon", 2: "Google"},
 "ranking": {0: 1, 1: 2, 2: 3}})
)

Exercises using the Maven Toys Dataset

For the remainder of this exercise, you’ll be working with toy products data.

Data Source: Maven Analytics Datasets

📌 Load data

▶️ Run the code cell below to create a new DataFrame named df_products.

df_products = pd.read_csv("https://raw.githubusercontent.com/bdi475/datasets/main/maven-toys-data/products.csv")

# Used to keep a clean copy
df_products_copy = df_products.copy()

# Display the first 5 rows
df_products.head()

The table below describes the columns in df_products.

FieldDescription
Product_IDProduct ID
Product_NameProduct name
Product_CategoryProduct Category
Product_CostProduct cost (USD)
Product_PriceProduct retail price (USD)

🎯 Challenge 4: Find the number of rows and columns

👇 Tasks

  • ✔️ Store the number of rows in df_products to a new variable named num_rows.
  • ✔️ Store the number of columns in df_products to a new variable named num_cols.
  • ✔️ Use .shape, not len().
# YOUR CODE BEGINS

# YOUR CODE ENDS

print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_cols}")

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.
  • ❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'find-num-rows-and-cols'
_points = 2

tc.assertEqual(num_rows, len(df_products_copy.index), f"Number of rows should be {len(df_products_copy.index)}")
tc.assertEqual(num_cols, len(df_products_copy.columns), f"Number of columns should be {len(df_products_copy.columns)}")

🎯 Challenge 5: Find all games

👇 Tasks

  • ✔️ Using df_products, find all products in the "Games" category (df_products["Product_Category"] == "Games").
  • ✔️ Store the result to a new variable named df_games.
  • ✔️ df_products should remain unaltered.

🔑 Expected Output of df_games

Product_IDProduct_NameProduct_CategoryProduct_CostProduct_Price
34Chutes & LaddersGames9.9912.99
45Classic DominoesGames7.999.99
78Deck Of CardsGames3.996.99
1314Glass MarblesGames5.9910.99
1516JengaGames2.999.99
2122MonopolyGames13.9919.99
2930Rubik’s CubeGames17.9919.99
3435Uno Card GameGames3.997.99
# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_games)

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.
  • ❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'find-all-games'
_points = 3

import base64
q = b'UHJvZHVjdF9DYXRlZ29yeSA9PSAnR2FtZXMn'

pd.testing.assert_frame_equal(
    df_games.sort_values(df_games.columns.to_list()).reset_index(drop=True),
    df_products_copy.query(base64.b64decode(q).decode('ascii')).sort_values(df_products_copy.columns.to_list()).reset_index(drop=True)
)
pd.testing.assert_frame_equal(
    df_products.reset_index(drop=True),
    df_products_copy.reset_index(drop=True),
    "The original DataFrame should remain unchanged."
)

🎯 Challenge 6: Find electronics with a product cost over $10

👇 Tasks

  • ✔️ Using df_products, find all products that matches the following two conditions:
    1. in the "Electronics" category (df_products["Product_Category"] == "Electronics")
    2. and the product cost is over 10 dollars (df_products["Product_Cost"] > 10).
  • ✔️ Store the result to a new variable named df_electronics_over_10.
  • ✔️ df_products should remain unaltered.

🔑 Expected Output of df_electronics_over_10

Product_IDProduct_NameProduct_CategoryProduct_CostProduct_Price
1213Gamer HeadphonesElectronics14.9920.99
3334Toy RobotElectronics20.9925.99
# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_electronics_over_10)

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.
  • ❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'find-electronics-over-10-dollars'
_points = 4

import base64

q = b'KFByb2R1Y3RfQ2F0ZWdvcnkgPT0gJ0VsZWN0cm9uaWNzJykgJiAoUHJvZHVjdF9Db3N0ID4gMTAp'

pd.testing.assert_frame_equal(
    df_electronics_over_10.sort_values(df_electronics_over_10.columns.to_list()).reset_index(drop=True),
    df_products_copy.query(base64.b64decode(q).decode('ascii')).sort_values(df_products_copy.columns.to_list()).reset_index(drop=True)
)
pd.testing.assert_frame_equal(
    df_products.reset_index(drop=True),
    df_products_copy.reset_index(drop=True),
    "The original DataFrame should remain unchanged."
)

🎯 Challenge 7: Sort by Price in Descending Order

👇 Tasks

  • ✔️ Sort df_products by price (Product_Price column) in descending order.
  • ✔️ Store the sorted result to a new variable named df_sorted_by_price.
  • ✔️ df_products should remain unaltered.
# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_sorted_by_price)

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.
  • ❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'sort-by-price-desc'
_points = 2

pd.testing.assert_series_equal(
    df_sorted_by_price["Product_Price"].reset_index(drop=True),
    df_products_copy.sort_values("Product_Price").iloc[::-1]["Product_Price"].reset_index(drop=True)
)
pd.testing.assert_frame_equal(
    df_products.reset_index(drop=True),
    df_products_copy.reset_index(drop=True),
    "The original DataFrame should remain unchanged."
)

🎯 Challenge 8: Sort by Product Category and Product Name

👇 Tasks

  • ✔️ Sort df_products by product category in ascending order and then by product price in descending order for products within each category.
  • ✔️ Store the sorted result to a new variable named df_sorted_by_category_price.
  • ✔️ If two rows have the same product category and the same price, the order of those two rows doesn’t matter.
  • ✔️ df_products should remain unaltered.

🔑 Sample Output of df_sorted_by_category_price

Product_IDProduct_NameProduct_CategoryProduct_CostProduct_Price
2526PlayDoh PlaysetArt & Crafts20.9924.99
1011Etch A SketchArt & Crafts10.9920.99
1617Kids Makeup KitArt & Crafts13.9919.99
1819Magic SandArt & Crafts13.9915.99
2728PlayfoamArt & Crafts3.9910.99
2627PlayDoh ToolkitArt & Crafts3.994.99
23Barrel O’ SlimeArt & Crafts1.993.99
2425PlayDoh CanArt & Crafts1.992.99
3334Toy RobotElectronics20.9925.99
1213Gamer HeadphonesElectronics14.9920.99
56ColorbudsElectronics6.9914.99
2122MonopolyGames13.9919.99
2930Rubik’s CubeGames17.9919.99
34Chutes & LaddersGames9.9912.99
1314Glass MarblesGames5.9910.99
45Classic DominoesGames7.999.99
1516JengaGames2.999.99
3435Uno Card GameGames3.997.99
78Deck Of CardsGames3.996.99
1920Mini Basketball HoopSports & Outdoors8.9924.99
2324Nerf GunSports & Outdoors14.9919.99
67Dart GunSports & Outdoors11.9915.99
3132Supersoaker Water GunSports & Outdoors11.9914.99
1112Foam Disk LauncherSports & Outdoors8.9911.99
2021Mini Ping Pong SetSports & Outdoors6.999.99
3031Splash BallsSports & Outdoors7.998.99
1718Lego BricksToys34.9939.99
2829Plush PonyToys8.9919.99
01Action FigureToys9.9915.99
910Dinosaur FiguresToys10.9914.99
12Animal FiguresToys9.9912.99
3233Teddy BearToys10.9912.99
89Dino EggToys9.9910.99
2223Mr. PotatoheadToys4.999.99
1415Hot Wheels 5-PackToys3.995.99
# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_sorted_by_category_price)

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.
  • ❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'sort-by-cat-asc-name-desc'
_points = 3

sample_sorted = df_products_copy.sort_values(["Product_Price", "Product_Category"][::-1], ascending=[False, True]).iloc[::-1]

pd.testing.assert_series_equal(
    df_sorted_by_category_price["Product_Category"].reset_index(drop=True),
    sample_sorted["Product_Category"].reset_index(drop=True)
)
pd.testing.assert_series_equal(
    df_sorted_by_category_price["Product_Price"].reset_index(drop=True),
    sample_sorted["Product_Price"].reset_index(drop=True)
)
pd.testing.assert_frame_equal(
    df_products.reset_index(drop=True),
    df_products_copy.reset_index(drop=True),
    "The original DataFrame should remain unchanged."
)