Exercise 5 - Pandas Filtering and Sorting

🏆 20 points available
✏️ Last updated on 9/13/2022

▶️ First, run the code cell below to import unittest, a module used for 🧭 Check Your Work sections and the autograder.

# DO NOT MODIFY THE CODE IN THIS CELL
import unittest
tc = unittest.TestCase()

🎯 Challenge 1: Import Pandas and NumPy

👇 Tasks

✔️ Import the following Python packages.
1. pandas: Use alias pd.
2. numpy: Use alias np.

# YOUR CODE BEGINS

# YOUR CODE ENDS

🧭 Check Your Work

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = 'import-pandas-numpy'
_points = 2

tc.assertTrue("pd" in globals(), "Check whether you have correctly imported Pandas with an alias.")
tc.assertTrue("np" in globals(), "Check whether you have correctly imported NumPy with an alias.")

🎯 Challenge 2: Create a Pandas Series

👇 Tasks

✔️ Create a new Pandas Series named sample_series with the following four values: -20, -10, 10, 20

🚀 Hint

The code below creates a new Pandas Series with the values 1 and 2.

my_new_series = pd.Series([1, 2])

# YOUR CODE BEGINS

# YOUR CODE ENDS

print(sample_series)

🧭 Check Your Work

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = 'create-a-pandas-series'
_points = 2

pd.testing.assert_series_equal(sample_series, pd.Series(x * 10 for x in [-2, -1, 1, 2]))

🎯 Challenge 3: Create a Pandas DataFrame

👇 Tasks

✔️ You are given two lists, brands and rankings, that contain the names of brands and their rankings.
✔️ Using the two lists, create a new Pandas DataFrame named df_brands that has the following two columns:
1. brand: Names of the brands
2. ranking: Ranking of the brands
✔️ Note that the column names are singular.

🚀 Hint

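The code below creates a new Pandas DataFrame from two Series. A plain Python list can be used in place of each Series.

# Placeholder Series used only for this hint
my_series1 = pd.Series(["a", "b"])
my_series2 = pd.Series([10, 20])

my_new_dataframe = pd.DataFrame({
    "column_one": my_series1,
    "column_two": my_series2
})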

🔑 Expected Output

	brand	ranking
0	Apple	1
1	Amazon	2
2	Google	3

brands = ["Apple", "Amazon", "Google"]
rankings = [1, 2, 3]

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_brands)

🧭 Check Your Work

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = 'create-a-pandas-dataframe'
_points = 2

pd.testing.assert_frame_equal(
    df_brands.reset_index(drop=True),
    pd.DataFrame({
        "brand": {0: "Apple", 1: "Amazon", 2: "Google"},
        "ranking": {0: 1, 1: 2, 2: 3},
    })
)

Exercises using the Maven Toys Dataset

For the remainder of this exercise, you’ll be working with toy products data.

Data Source: Maven Analytics Datasets

📌 Load data

▶️ Run the code cell below to create a new DataFrame named df_products.

df_products = pd.read_csv("https://raw.githubusercontent.com/bdi475/datasets/main/maven-toys-data/products.csv")

# Used to keep a clean copy
df_products_copy = df_products.copy()

# Display the first 5 rows
df_products.head()

The table below describes the columns in df_products.

Field	Description
Product_ID	Product ID
Product_Name	Product name
Product_Category	Product category
Product_Cost	Product cost (USD)
Product_Price	Product retail price (USD)

🎯 Challenge 4: Find the number of rows and columns

👇 Tasks

✔️ Store the number of rows in df_products to a new variable named num_rows.
✔️ Store the number of columns in df_products to a new variable named num_cols.
✔️ Use .shape, not len().
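
🚀 Hint

The snippet below is a minimal sketch of how .shape behaves, using a made-up DataFrame. The name df_demo and its columns are illustrative assumptions and are not part of the exercise data. Note that .shape is an attribute, not a method, and it holds a (rows, columns) tuple.

```python
import pandas as pd

# Hypothetical demo data, unrelated to df_products
df_demo = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry"],
    "price": [1.25, 0.5, 3.0],
})

# .shape is a tuple: (number of rows, number of columns)
demo_shape = df_demo.shape
demo_row_count = demo_shape[0]
demo_col_count = demo_shape[1]

print(demo_shape)  # (3, 2)
```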

# YOUR CODE BEGINS

# YOUR CODE ENDS

print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_cols}")

🧭 Check Your Work

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = 'find-num-rows-and-cols'
_points = 2

tc.assertEqual(num_rows, len(df_products_copy.index), f"Number of rows should be {len(df_products_copy.index)}")
tc.assertEqual(num_cols, len(df_products_copy.columns), f"Number of columns should be {len(df_products_copy.columns)}")

🎯 Challenge 5: Find all games

👇 Tasks

✔️ Using df_products, find all products in the "Games" category (df_products["Product_Category"] == "Games").
✔️ Store the result to a new variable named df_games.
✔️ df_products should remain unaltered.
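
🚀 Hint

The sketch below shows boolean-mask filtering on a made-up DataFrame. The names df_demo, is_red, and df_red are hypothetical and chosen only for illustration. Comparing a column to a value gives a Series of True/False values, and indexing the DataFrame with that Series keeps only the True rows.

```python
import pandas as pd

# Hypothetical demo data, unrelated to df_products
df_demo = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry", "date"],
    "color": ["red", "yellow", "red", "brown"],
})

# Comparing a column to a value yields a boolean Series (the mask)
is_red = df_demo["color"] == "red"

# Indexing with the mask returns a new DataFrame; df_demo itself is not modified
df_red = df_demo[is_red]

print(df_red)
```

The filtered result keeps the original row labels (0 and 2 in this sketch), which is why a filtered DataFrame can show non-consecutive index values.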

🔑 Expected Output of `df_games`

	Product_ID	Product_Name	Product_Category	Product_Cost	Product_Price
3	4	Chutes & Ladders	Games	9.99	12.99
4	5	Classic Dominoes	Games	7.99	9.99
7	8	Deck Of Cards	Games	3.99	6.99
13	14	Glass Marbles	Games	5.99	10.99
15	16	Jenga	Games	2.99	9.99
21	22	Monopoly	Games	13.99	19.99
29	30	Rubik’s Cube	Games	17.99	19.99
34	35	Uno Card Game	Games	3.99	7.99

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_games)

🧭 Check Your Work

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = 'find-all-games'
_points = 3

import base64
q = b'UHJvZHVjdF9DYXRlZ29yeSA9PSAnR2FtZXMn'

pd.testing.assert_frame_equal(
    df_games.sort_values(df_games.columns.to_list()).reset_index(drop=True),
    df_products_copy.query(base64.b64decode(q).decode('ascii')).sort_values(df_products_copy.columns.to_list()).reset_index(drop=True)
)
pd.testing.assert_frame_equal(
    df_products.reset_index(drop=True),
    df_products_copy.reset_index(drop=True),
    "The original DataFrame should remain unchanged."
)

🎯 Challenge 6: Find electronics with a product cost over $10

👇 Tasks

✔️ Using df_products, find all products that match both of the following conditions:
1. The product is in the "Electronics" category (df_products["Product_Category"] == "Electronics").
2. The product cost is over 10 dollars (df_products["Product_Cost"] > 10).
✔️ Store the result to a new variable named df_electronics_over_10.
✔️ df_products should remain unaltered.
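
🚀 Hint

Two conditions can be combined with the & operator. The sketch below uses a made-up DataFrame, so df_demo, its columns, and df_red_over_2 are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical demo data, unrelated to df_products
df_demo = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry", "date"],
    "color": ["red", "yellow", "red", "brown"],
    "price": [1.25, 0.5, 3.0, 4.0],
})

# Each condition is wrapped in parentheses because & binds more tightly
# than comparison operators such as == and >
df_red_over_2 = df_demo[(df_demo["color"] == "red") & (df_demo["price"] > 2)]

print(df_red_over_2)
```

Python's `and` keyword does not work element-wise on a Series, so & is the operator to use here.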

🔑 Expected Output of `df_electronics_over_10`

	Product_ID	Product_Name	Product_Category	Product_Cost	Product_Price
12	13	Gamer Headphones	Electronics	14.99	20.99
33	34	Toy Robot	Electronics	20.99	25.99

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_electronics_over_10)

🧭 Check Your Work

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = 'find-electronics-over-10-dollars'
_points = 4

import base64

q = b'KFByb2R1Y3RfQ2F0ZWdvcnkgPT0gJ0VsZWN0cm9uaWNzJykgJiAoUHJvZHVjdF9Db3N0ID4gMTAp'

pd.testing.assert_frame_equal(
    df_electronics_over_10.sort_values(df_electronics_over_10.columns.to_list()).reset_index(drop=True),
    df_products_copy.query(base64.b64decode(q).decode('ascii')).sort_values(df_products_copy.columns.to_list()).reset_index(drop=True)
)
pd.testing.assert_frame_equal(
    df_products.reset_index(drop=True),
    df_products_copy.reset_index(drop=True),
    "The original DataFrame should remain unchanged."
)

🎯 Challenge 7: Sort by Price in Descending Order

👇 Tasks

✔️ Sort df_products by price (Product_Price column) in descending order.
✔️ Store the sorted result to a new variable named df_sorted_by_price.
✔️ df_products should remain unaltered.
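
🚀 Hint

The sketch below sorts a made-up DataFrame in descending order with sort_values. The names df_demo and df_demo_sorted are hypothetical. By default, sort_values returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical demo data, unrelated to df_products
df_demo = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry"],
    "price": [1.25, 0.5, 3.0],
})

# ascending=False sorts from the largest value to the smallest
df_demo_sorted = df_demo.sort_values("price", ascending=False)

print(df_demo_sorted)
```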

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_sorted_by_price)

🧭 Check Your Work

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = 'sort-by-price-desc'
_points = 2

pd.testing.assert_series_equal(
    df_sorted_by_price["Product_Price"].reset_index(drop=True),
    df_products_copy.sort_values("Product_Price").iloc[::-1]["Product_Price"].reset_index(drop=True)
)
pd.testing.assert_frame_equal(
    df_products.reset_index(drop=True),
    df_products_copy.reset_index(drop=True),
    "The original DataFrame should remain unchanged."
)

🎯 Challenge 8: Sort by Product Category and Product Price

👇 Tasks

✔️ Sort df_products by product category in ascending order and then by product price in descending order for products within each category.
✔️ Store the sorted result to a new variable named df_sorted_by_category_price.
✔️ If two rows have the same product category and the same price, the order of those two rows doesn’t matter.
✔️ df_products should remain unaltered.
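
🚀 Hint

sort_values accepts a list of columns together with a matching list of sort directions. The sketch below uses a made-up DataFrame, so df_demo, its columns, and df_demo_sorted are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical demo data, unrelated to df_products
df_demo = pd.DataFrame({
    "color": ["red", "brown", "red", "brown"],
    "fruit": ["apple", "date", "cherry", "kiwi"],
    "price": [1.25, 4.0, 3.0, 2.0],
})

# The first column is the primary sort key; the second breaks ties within it.
# The ascending list lines up with the column list, one direction per column.
df_demo_sorted = df_demo.sort_values(["color", "price"], ascending=[True, False])

print(df_demo_sorted)
```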

🔑 Sample Output of `df_sorted_by_category_price`

	Product_ID	Product_Name	Product_Category	Product_Cost	Product_Price
25	26	PlayDoh Playset	Art & Crafts	20.99	24.99
10	11	Etch A Sketch	Art & Crafts	10.99	20.99
16	17	Kids Makeup Kit	Art & Crafts	13.99	19.99
18	19	Magic Sand	Art & Crafts	13.99	15.99
27	28	Playfoam	Art & Crafts	3.99	10.99
26	27	PlayDoh Toolkit	Art & Crafts	3.99	4.99
2	3	Barrel O’ Slime	Art & Crafts	1.99	3.99
24	25	PlayDoh Can	Art & Crafts	1.99	2.99
33	34	Toy Robot	Electronics	20.99	25.99
12	13	Gamer Headphones	Electronics	14.99	20.99
5	6	Colorbuds	Electronics	6.99	14.99
21	22	Monopoly	Games	13.99	19.99
29	30	Rubik’s Cube	Games	17.99	19.99
3	4	Chutes & Ladders	Games	9.99	12.99
13	14	Glass Marbles	Games	5.99	10.99
4	5	Classic Dominoes	Games	7.99	9.99
15	16	Jenga	Games	2.99	9.99
34	35	Uno Card Game	Games	3.99	7.99
7	8	Deck Of Cards	Games	3.99	6.99
19	20	Mini Basketball Hoop	Sports & Outdoors	8.99	24.99
23	24	Nerf Gun	Sports & Outdoors	14.99	19.99
6	7	Dart Gun	Sports & Outdoors	11.99	15.99
31	32	Supersoaker Water Gun	Sports & Outdoors	11.99	14.99
11	12	Foam Disk Launcher	Sports & Outdoors	8.99	11.99
20	21	Mini Ping Pong Set	Sports & Outdoors	6.99	9.99
30	31	Splash Balls	Sports & Outdoors	7.99	8.99
17	18	Lego Bricks	Toys	34.99	39.99
28	29	Plush Pony	Toys	8.99	19.99
0	1	Action Figure	Toys	9.99	15.99
9	10	Dinosaur Figures	Toys	10.99	14.99
1	2	Animal Figures	Toys	9.99	12.99
32	33	Teddy Bear	Toys	10.99	12.99
8	9	Dino Egg	Toys	9.99	10.99
22	23	Mr. Potatohead	Toys	4.99	9.99
14	15	Hot Wheels 5-Pack	Toys	3.99	5.99

# YOUR CODE BEGINS

# YOUR CODE ENDS

display(df_sorted_by_category_price)

🧭 Check Your Work

Run the code cell below to test your solution.

✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = 'sort-by-cat-asc-name-desc'
_points = 3

sample_sorted = df_products_copy.sort_values(["Product_Price", "Product_Category"][::-1], ascending=[False, True]).iloc[::-1]

pd.testing.assert_series_equal(
    df_sorted_by_category_price["Product_Category"].reset_index(drop=True),
    sample_sorted["Product_Category"].reset_index(drop=True)
)
pd.testing.assert_series_equal(
    df_sorted_by_category_price["Product_Price"].reset_index(drop=True),
    sample_sorted["Product_Price"].reset_index(drop=True)
)
pd.testing.assert_frame_equal(
    df_products.reset_index(drop=True),
    df_products_copy.reset_index(drop=True),
    "The original DataFrame should remain unchanged."
)
