🏆 20 points available
✏️ Last updated on 9/13/2022
▶️ First, run the code cell below to import unittest, a module used for 🧭 Check Your Work sections and the autograder.
# DO NOT MODIFY THE CODE IN THIS CELL
import unittest
tc = unittest.TestCase()
🎯 Challenge 1: Import Pandas and NumPy
👇 Tasks
✔️ Import the following Python packages.
pandas: Use alias pd.
numpy: Use alias np.
# YOUR CODE BEGINS
# YOUR CODE ENDS
🧭 Check Your Work
Run the code cell below to test your solution.
✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'import-pandas-numpy'
_points = 2
tc.assertTrue("pd" in globals(), "Check whether you have correctly imported Pandas with an alias.")
tc.assertTrue("np" in globals(), "Check whether you have correctly imported NumPy with an alias.")
🎯 Challenge 2: Create a Pandas Series
👇 Tasks
✔️ Create a new Pandas Series named sample_series with the following four values: -20, -10, 10, 20
🚀 Hint
The code below creates a new Pandas Series with the values 1 and 2.
my_new_series = pd.Series([1, 2])
# YOUR CODE BEGINS
# YOUR CODE ENDS
print(sample_series)
🧭 Check Your Work
Run the code cell below to test your solution.
✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'create-a-pandas-series'
_points = 2
pd.testing.assert_series_equal(sample_series, pd.Series(x * 10 for x in [-2, -1, 1, 2]))
🎯 Challenge 3: Create a Pandas DataFrame
👇 Tasks
✔️ You are given two lists, brands and rankings, that contain the names of brands and their rankings.
✔️ Using the two lists, create a new Pandas DataFrame named df_brands that has the following two columns:
brand: Names of the brands
ranking: Ranking of the brands
✔️ Note that the column names are singular.
🚀 Hint
The code below creates a new Pandas DataFrame from two series.
my_new_dataframe = pd.DataFrame({
"column_one": my_series1,
"column_two": my_series2
})
🔑 Expected Output
| | brand | ranking |
|---|---|---|
| 0 | Apple | 1 |
| 1 | Amazon | 2 |
| 2 | Google | 3 |
brands = ["Apple", "Amazon", "Google"]
rankings = [1, 2, 3]
# YOUR CODE BEGINS
# YOUR CODE ENDS
display(df_brands)
🧭 Check Your Work
Run the code cell below to test your solution.
✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'create-a-pandas-dataframe'
_points = 2
pd.testing.assert_frame_equal(
df_brands.reset_index(drop=True),
pd.DataFrame(
{"brand": {0: "Apple", 1: "Amazon", 2: "Google"},
"ranking": {0: 1, 1: 2, 2: 3}})
)
Exercises using the Maven Toys Dataset
For the remainder of this exercise, you’ll be working with toy products data.
Data Source: Maven Analytics Datasets
📌 Load data
▶️ Run the code cell below to create a new DataFrame named df_products.
df_products = pd.read_csv("https://raw.githubusercontent.com/bdi475/datasets/main/maven-toys-data/products.csv")
# Used to keep a clean copy
df_products_copy = df_products.copy()
# Display the first 5 rows
df_products.head()
The table below describes the columns in df_products.
| Field | Description |
|---|---|
| Product_ID | Product ID |
| Product_Name | Product name |
| Product_Category | Product Category |
| Product_Cost | Product cost (USD) |
| Product_Price | Product retail price (USD) |
🎯 Challenge 4: Find the number of rows and columns
👇 Tasks
✔️ Store the number of rows in df_products to a new variable named num_rows.
✔️ Store the number of columns in df_products to a new variable named num_cols.
✔️ Use .shape, not len().
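🚀 Hint
The sketch below shows how .shape behaves on a small made-up DataFrame. The names df_example, example_rows, and example_cols are hypothetical and are not part of this exercise.

```python
import pandas as pd

# Hypothetical toy data, unrelated to df_products
df_example = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [90, 85, 70],
})

# .shape is a (rows, columns) tuple, so it can be unpacked directly
example_rows, example_cols = df_example.shape
print(example_rows, example_cols)  # 3 2
```

Indexing the tuple (df_example.shape[0] and df_example.shape[1]) would work equally well.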
# YOUR CODE BEGINS
# YOUR CODE ENDS
print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_cols}")
🧭 Check Your Work
Run the code cell below to test your solution.
✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'find-num-rows-and-cols'
_points = 2
tc.assertEqual(num_rows, len(df_products_copy.index), f"Number of rows should be {len(df_products_copy.index)}")
tc.assertEqual(num_cols, len(df_products_copy.columns), f"Number of columns should be {len(df_products_copy.columns)}")
🎯 Challenge 5: Find all games
👇 Tasks
✔️ Using df_products, find all products in the "Games" category (df_products["Product_Category"] == "Games").
✔️ Store the result to a new variable named df_games.
✔️ df_products should remain unaltered.
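🚀 Hint
As a rough illustration of filtering with a boolean condition, the sketch below uses a made-up DataFrame. The names df_pets, is_dog, and df_dogs are hypothetical and only stand in for the pattern.

```python
import pandas as pd

# Hypothetical toy data, unrelated to df_products
df_pets = pd.DataFrame({
    "name": ["Rex", "Tom", "Bella", "Milo"],
    "species": ["dog", "cat", "dog", "cat"],
})

# Comparing a column to a value produces a boolean Series (one True/False per row)
is_dog = df_pets["species"] == "dog"

# Indexing with that boolean Series keeps only the rows marked True
# and returns a new DataFrame; df_pets itself is not modified
df_dogs = df_pets[is_dog]
print(df_dogs)
```

The filtered rows keep their original index labels (0 and 2 here), which is why the index in a filtered result can have gaps.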
🔑 Expected Output of df_games
| | Product_ID | Product_Name | Product_Category | Product_Cost | Product_Price |
|---|---|---|---|---|---|
| 3 | 4 | Chutes & Ladders | Games | 9.99 | 12.99 |
| 4 | 5 | Classic Dominoes | Games | 7.99 | 9.99 |
| 7 | 8 | Deck Of Cards | Games | 3.99 | 6.99 |
| 13 | 14 | Glass Marbles | Games | 5.99 | 10.99 |
| 15 | 16 | Jenga | Games | 2.99 | 9.99 |
| 21 | 22 | Monopoly | Games | 13.99 | 19.99 |
| 29 | 30 | Rubik’s Cube | Games | 17.99 | 19.99 |
| 34 | 35 | Uno Card Game | Games | 3.99 | 7.99 |
# YOUR CODE BEGINS
# YOUR CODE ENDS
display(df_games)
🧭 Check Your Work
Run the code cell below to test your solution.
✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'find-all-games'
_points = 3
import base64
q = b'UHJvZHVjdF9DYXRlZ29yeSA9PSAnR2FtZXMn'
pd.testing.assert_frame_equal(
df_games.sort_values(df_games.columns.to_list()).reset_index(drop=True),
df_products_copy.query(base64.b64decode(q).decode('ascii')).sort_values(df_products_copy.columns.to_list()).reset_index(drop=True)
)
pd.testing.assert_frame_equal(
df_products.reset_index(drop=True),
df_products_copy.reset_index(drop=True),
"The original DataFrame should remain unchanged."
)
🎯 Challenge 6: Find electronics with a product cost over $10
👇 Tasks
✔️ Using df_products, find all products that match the following two conditions:
in the "Electronics" category (df_products["Product_Category"] == "Electronics")
and the product cost is over 10 dollars (df_products["Product_Cost"] > 10).
✔️ Store the result to a new variable named df_electronics_over_10.
✔️ df_products should remain unaltered.
🔑 Expected Output of df_electronics_over_10
| | Product_ID | Product_Name | Product_Category | Product_Cost | Product_Price |
|---|---|---|---|---|---|
| 12 | 13 | Gamer Headphones | Electronics | 14.99 | 20.99 |
| 33 | 34 | Toy Robot | Electronics | 20.99 | 25.99 |
# YOUR CODE BEGINS
# YOUR CODE ENDS
display(df_electronics_over_10)
🧭 Check Your Work
Run the code cell below to test your solution.
✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'find-electronics-over-10-dollars'
_points = 4
import base64
q = b'KFByb2R1Y3RfQ2F0ZWdvcnkgPT0gJ0VsZWN0cm9uaWNzJykgJiAoUHJvZHVjdF9Db3N0ID4gMTAp'
pd.testing.assert_frame_equal(
df_electronics_over_10.sort_values(df_electronics_over_10.columns.to_list()).reset_index(drop=True),
df_products_copy.query(base64.b64decode(q).decode('ascii')).sort_values(df_products_copy.columns.to_list()).reset_index(drop=True)
)
pd.testing.assert_frame_equal(
df_products.reset_index(drop=True),
df_products_copy.reset_index(drop=True),
"The original DataFrame should remain unchanged."
)
🎯 Challenge 7: Sort by Price in Descending Order
👇 Tasks
✔️ Sort df_products by price (Product_Price column) in descending order.
✔️ Store the sorted result to a new variable named df_sorted_by_price.
✔️ df_products should remain unaltered.
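🚀 Hint
The sketch below sorts a made-up DataFrame in descending order with sort_values. The names df_scores and df_scores_desc are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, unrelated to df_products
df_scores = pd.DataFrame({
    "player": ["Ann", "Bo", "Cy"],
    "score": [12, 30, 21],
})

# ascending=False sorts from largest to smallest.
# sort_values returns a new DataFrame by default, so df_scores keeps its original order
df_scores_desc = df_scores.sort_values("score", ascending=False)
print(df_scores_desc)
```

Because inplace is left at its default of False, the original DataFrame stays untouched.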
# YOUR CODE BEGINS
# YOUR CODE ENDS
display(df_sorted_by_price)
🧭 Check Your Work
Run the code cell below to test your solution.
✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'sort-by-price-desc'
_points = 2
pd.testing.assert_series_equal(
df_sorted_by_price["Product_Price"].reset_index(drop=True),
df_products_copy.sort_values("Product_Price").iloc[::-1]["Product_Price"].reset_index(drop=True)
)
pd.testing.assert_frame_equal(
df_products.reset_index(drop=True),
df_products_copy.reset_index(drop=True),
"The original DataFrame should remain unchanged."
)
🎯 Challenge 8: Sort by Product Category and Product Price
👇 Tasks
✔️ Sort df_products by product category in ascending order and then by product price in descending order for products within each category.
✔️ Store the sorted result to a new variable named df_sorted_by_category_price.
✔️ If two rows have the same product category and the same price, the order of those two rows doesn’t matter.
✔️ df_products should remain unaltered.
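🚀 Hint
sort_values accepts a list of columns and a matching list of sort directions. The sketch below illustrates this on a made-up DataFrame; df_scores and df_team_score are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data, unrelated to df_products
df_scores = pd.DataFrame({
    "team": ["red", "blue", "red", "blue"],
    "score": [12, 30, 21, 8],
})

# The first column is the primary sort key and the second breaks ties within it.
# The ascending list is matched to the columns by position:
# "team" ascending (True), "score" descending (False)
df_team_score = df_scores.sort_values(["team", "score"], ascending=[True, False])
print(df_team_score)
```

In this example the blue rows come first (scores 30, then 8), followed by the red rows (scores 21, then 12).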
🔑 Sample Output of df_sorted_by_category_price
| | Product_ID | Product_Name | Product_Category | Product_Cost | Product_Price |
|---|---|---|---|---|---|
| 25 | 26 | PlayDoh Playset | Art & Crafts | 20.99 | 24.99 |
| 10 | 11 | Etch A Sketch | Art & Crafts | 10.99 | 20.99 |
| 16 | 17 | Kids Makeup Kit | Art & Crafts | 13.99 | 19.99 |
| 18 | 19 | Magic Sand | Art & Crafts | 13.99 | 15.99 |
| 27 | 28 | Playfoam | Art & Crafts | 3.99 | 10.99 |
| 26 | 27 | PlayDoh Toolkit | Art & Crafts | 3.99 | 4.99 |
| 2 | 3 | Barrel O’ Slime | Art & Crafts | 1.99 | 3.99 |
| 24 | 25 | PlayDoh Can | Art & Crafts | 1.99 | 2.99 |
| 33 | 34 | Toy Robot | Electronics | 20.99 | 25.99 |
| 12 | 13 | Gamer Headphones | Electronics | 14.99 | 20.99 |
| 5 | 6 | Colorbuds | Electronics | 6.99 | 14.99 |
| 21 | 22 | Monopoly | Games | 13.99 | 19.99 |
| 29 | 30 | Rubik’s Cube | Games | 17.99 | 19.99 |
| 3 | 4 | Chutes & Ladders | Games | 9.99 | 12.99 |
| 13 | 14 | Glass Marbles | Games | 5.99 | 10.99 |
| 4 | 5 | Classic Dominoes | Games | 7.99 | 9.99 |
| 15 | 16 | Jenga | Games | 2.99 | 9.99 |
| 34 | 35 | Uno Card Game | Games | 3.99 | 7.99 |
| 7 | 8 | Deck Of Cards | Games | 3.99 | 6.99 |
| 19 | 20 | Mini Basketball Hoop | Sports & Outdoors | 8.99 | 24.99 |
| 23 | 24 | Nerf Gun | Sports & Outdoors | 14.99 | 19.99 |
| 6 | 7 | Dart Gun | Sports & Outdoors | 11.99 | 15.99 |
| 31 | 32 | Supersoaker Water Gun | Sports & Outdoors | 11.99 | 14.99 |
| 11 | 12 | Foam Disk Launcher | Sports & Outdoors | 8.99 | 11.99 |
| 20 | 21 | Mini Ping Pong Set | Sports & Outdoors | 6.99 | 9.99 |
| 30 | 31 | Splash Balls | Sports & Outdoors | 7.99 | 8.99 |
| 17 | 18 | Lego Bricks | Toys | 34.99 | 39.99 |
| 28 | 29 | Plush Pony | Toys | 8.99 | 19.99 |
| 0 | 1 | Action Figure | Toys | 9.99 | 15.99 |
| 9 | 10 | Dinosaur Figures | Toys | 10.99 | 14.99 |
| 1 | 2 | Animal Figures | Toys | 9.99 | 12.99 |
| 32 | 33 | Teddy Bear | Toys | 10.99 | 12.99 |
| 8 | 9 | Dino Egg | Toys | 9.99 | 10.99 |
| 22 | 23 | Mr. Potatohead | Toys | 4.99 | 9.99 |
| 14 | 15 | Hot Wheels 5-Pack | Toys | 3.99 | 5.99 |
# YOUR CODE BEGINS
# YOUR CODE ENDS
display(df_sorted_by_category_price)
🧭 Check Your Work
Run the code cell below to test your solution.
✔️ If the code cell runs without errors, you’re good to move on.
❌ If the code cell produces an error, review your code and fix any mistakes.
_test_case = 'sort-by-cat-asc-name-desc'
_points = 3
sample_sorted = df_products_copy.sort_values(["Product_Price", "Product_Category"][::-1], ascending=[False, True]).iloc[::-1]
pd.testing.assert_series_equal(
df_sorted_by_category_price["Product_Category"].reset_index(drop=True),
sample_sorted["Product_Category"].reset_index(drop=True)
)
pd.testing.assert_series_equal(
df_sorted_by_category_price["Product_Price"].reset_index(drop=True),
sample_sorted["Product_Price"].reset_index(drop=True)
)
pd.testing.assert_frame_equal(
df_products.reset_index(drop=True),
df_products_copy.reset_index(drop=True),
"The original DataFrame should remain unchanged."
)