  • 🏆 20 points available


▶️ Run the code cell below to import unittest, a module used for 🧭 Check Your Work sections and the autograder.

import unittest
import base64
import plotly

tc = unittest.TestCase()

🎯 Exercise 1: Import Packages

👇 Tasks

  • ✔️ Import the following Python packages.

    1. pandas: Use alias pd.

    2. numpy: Use alias np.

    3. plotly.express: Use alias px.

    4. plotly.graph_objects: Use alias go.

# YOUR CODE BEGINS

# YOUR CODE ENDS

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

# DO NOT CHANGE THE CODE IN THIS CELL
_test_case = "part-01"
_points = 2

tc.assertTrue(
    "pd" in globals(), "Check whether you have correctly imported Pandas with an alias."
)
tc.assertTrue(
    "np" in globals(), "Check whether you have correctly imported NumPy with an alias."
)

print(f"The current plotly version is {plotly.__version__}")
plotly_major_version = int(plotly.__version__.split(".")[0])
tc.assertGreaterEqual(
    plotly_major_version, 5, "Your plotly version should be greater than or equal to 5"
)

tc.assertIsNotNone(
    go.Figure,
    "Check whether you have correctly imported plotly.graph_objects with an alias go.",
)
tc.assertIsNotNone(
    px.scatter,
    "Check whether you have correctly imported plotly.express with an alias px.",
)

🎯 Exercise 2: Annual closing gold price 📈

▶️ Run the code cell below to import the annual gold closing prices dataset 💛.

# DO NOT CHANGE THE CODE BELOW
df_gold = pd.read_csv(
    "https://github.com/bdi475/datasets/raw/main/gold-annual-closing-price.csv"
)
df_gold_backup = df_gold.copy()
df_gold.head(5)

👇 Tasks

  • ✔️ Using df_gold, create a line chart that displays the closing price by year.

  • ✔️ Store your figure in a variable named fig.

  • ✔️ Add an appropriate title to your figure.

  • ✔️ Display the figure using fig.show().

# YOUR CODE BEGINS

# YOUR CODE ENDS

🔑 Sample output

image

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-02"
_points = 2

tc.assertEqual(len(fig.data), 1, "There must be only one plot in your figure")
tc.assertIsNotNone(fig.layout.title.text, "Missing figure title")
tc.assertEqual(fig.data[0].type, "scatter", "Must be a line chart")
tc.assertIsNotNone(fig.data[0].line.color, "Must be a line chart")
np.testing.assert_array_equal(fig.data[0].x, df_gold["Year"], "Incorrect x-axis data")
np.testing.assert_array_equal(
    fig.data[0].y, df_gold["Closing Price"], "Incorrect y-axis data"
)

🎯 Exercise 3: Annual closing gold price in 2000s 📈

👇 Tasks

  • ✔️ Using df_gold, create a line chart that displays the closing price by year.

    • Only include years 2000 or later (df_gold['Year'] >= 2000).

  • ✔️ Store your figure in a variable named fig.

  • ✔️ Add an appropriate title to your figure.

  • ✔️ Display the figure using fig.show().

🚀 Hints

  • 👉 fig = px.line(df_gold[df_gold['Year'] >= 2000], x='Year', ...)

# YOUR CODE BEGINS

# YOUR CODE ENDS

🔑 Sample output

image

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-03"
_points = 2

tc.assertEqual(len(fig.data), 1, "There must be only one plot in your figure")
tc.assertIsNotNone(fig.layout.title.text, "Missing figure title")
tc.assertEqual(fig.data[0].type, "scatter", "Must be a line chart")
tc.assertIsNotNone(fig.data[0].line.color, "Must be a line chart")
np.testing.assert_array_equal(
    fig.data[0].x,
    df_gold_backup.query(base64.b64decode(b"WWVhciA+PSAyMDAw").decode())["Year"],
    "Incorrect x-axis data",
)
np.testing.assert_array_equal(
    fig.data[0].y,
    df_gold_backup.query(base64.b64decode(b"WWVhciA+PSAyMDAw").decode())[
        "Closing Price"
    ],
    "Incorrect y-axis data",
)

📌 Import dataset

BLUEbikes

From this point on, you will work with a bike-sharing trips dataset 🚲. The original dataset was retrieved from https://www.bluebikes.com/system-data.

▶️ Run the code below to import the dataset. The dataset is fairly large, with 200k rows, so loading it may take up to a few minutes.

# Display all columns
pd.set_option("display.max_columns", 50)

df_trips = pd.read_csv(
    "https://github.com/bdi475/datasets/blob/main/bluebikes-trip-data-2020-sampled.csv.gz?raw=true",
    compression="gzip",
    parse_dates=["start_time", "stop_time"],
)

df_trips_backup = df_trips.copy()

display(df_trips)

🎯 Exercise 4: Create an aggregated DataFrame with number of trips by date

👇 Tasks

  • ✔️ A common task when visualizing data is to aggregate it before plotting.

  • ✔️ Using df_trips, create a new DataFrame named df_num_trips_by_date that holds the number of trips by date.

  • ✔️ We will give you the fully working code below.
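The aggregation this exercise asks for can be sketched on a small stand-in frame. This is a minimal sketch, not the official solution: df_toy, df_toy_by_date, and the three timestamps are made up for illustration, while the output column names date and num_trips follow the check cell for this exercise.

```python
import pandas as pd

# Made-up trips standing in for df_trips; only start_time is needed here.
df_toy = pd.DataFrame(
    {
        "start_time": pd.to_datetime(
            ["2020-01-01 08:00", "2020-01-01 17:30", "2020-01-02 09:15"]
        )
    }
)

# Group by the calendar date of each trip and count the rows in each group.
df_toy_by_date = (
    df_toy.groupby(df_toy["start_time"].dt.date)
    .size()
    .reset_index()
    .rename(columns={"start_time": "date", 0: "num_trips"})
)

print(df_toy_by_date)
```

Grouping by .dt.date drops the time of day, so trips that start on the same day fall into one group regardless of the hour.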

Code
# YOUR CODE BEGINS

# YOUR CODE ENDS

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-04"
_points = 2

df_check = (
    df_trips_backup.groupby(df_trips_backup["start_time"].dt.date)
    .size()
    .reset_index()
    .rename(columns={"start_time": "date", 0: "num_trips"})
)

pd.testing.assert_frame_equal(
    df_num_trips_by_date.sort_values("date").reset_index(drop=True),
    df_check.sort_values("date").reset_index(drop=True),
)

🎯 Exercise 5: Number of trips by date (📈 Line Chart)

👇 Tasks

  • ✔️ Using df_num_trips_by_date, create a line chart that displays the number of trips by date.

  • ✔️ Store your figure in a variable named fig.

  • ✔️ Add an appropriate title to your figure.

  • ✔️ Display the figure using fig.show().

# YOUR CODE BEGINS

# YOUR CODE ENDS

🔑 Sample output

image

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-05"
_points = 2

tc.assertEqual(len(fig.data), 1, "There must be only one plot in your figure")
tc.assertIsNotNone(fig.layout.title.text, "Missing figure title")
tc.assertEqual(fig.data[0].type, "scatter", "Must be a line chart")
tc.assertIsNotNone(fig.data[0].line.color, "Must be a line chart")
np.testing.assert_array_equal(
    fig.data[0].x, df_num_trips_by_date["date"], "Incorrect x-axis data"
)
np.testing.assert_array_equal(
    fig.data[0].y, df_num_trips_by_date["num_trips"], "Incorrect y-axis data"
)

🎯 Exercise 6: Number of trips by date (Scatter Plot)

👇 Tasks

  • ✔️ Using df_num_trips_by_date, create a scatter plot that displays the number of trips by date.

  • ✔️ Store your figure in a variable named fig.

  • ✔️ Add an appropriate title to your figure.

  • ✔️ Display the figure using fig.show().

# YOUR CODE BEGINS

# YOUR CODE ENDS

🔑 Sample output

image

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-06"
_points = 2

tc.assertEqual(len(fig.data), 1, "There must be only one plot in your figure")
tc.assertIsNotNone(fig.layout.title.text, "Missing figure title")
tc.assertEqual(fig.data[0].type, "scatter", "Must be a scatter plot")
tc.assertIsNone(fig.data[0].line.color, "Must be a scatter plot")
np.testing.assert_array_equal(
    fig.data[0].x, df_num_trips_by_date["date"], "Incorrect x-axis data"
)
np.testing.assert_array_equal(
    fig.data[0].y, df_num_trips_by_date["num_trips"], "Incorrect y-axis data"
)

🎯 Exercise 7: Number of trips by week of the year at the top 3 stations

▶️ Run the code below to find the top 3 stations (by number of trips starting there).

top3_start_stations = df_trips["start_station_name"].value_counts().index[:3]
top3_start_stations

👇 Tasks

  • ✔️ Using df_trips and top3_start_stations, create a new DataFrame named df_num_trips_from_station that holds the number of trips by week of the year at the top 3 stations.

  • ✔️ We will give you the fully working code below.
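The two steps involved (keep only the busiest stations, then count trips per week and station) can be sketched on a small stand-in frame. This is a hedged illustration rather than the official solution: df_toy, df_top, df_toy_by_week, the stations A, B, and C, and the timestamps are all made up, and the sketch keeps the top two stations instead of three to stay short. The column names week, start_station_name, and num_trips follow the check cell for this exercise.

```python
import pandas as pd

# Made-up trips for three hypothetical stations (A, B, C).
df_toy = pd.DataFrame(
    {
        "start_time": pd.to_datetime(
            [
                "2020-01-06 08:00",
                "2020-01-07 09:00",
                "2020-01-13 10:00",
                "2020-01-06 11:00",
                "2020-01-14 12:00",
                "2020-01-15 13:00",
            ]
        ),
        "start_station_name": ["A", "A", "A", "B", "B", "C"],
    }
)

# Busiest two stations in the toy data; the exercise uses the top three.
top_stations = df_toy["start_station_name"].value_counts().index[:2]
df_top = df_toy[df_toy["start_station_name"].isin(top_stations)]

# Count trips for each (ISO week, station) pair.
df_toy_by_week = (
    df_top.groupby([df_top["start_time"].dt.isocalendar().week, "start_station_name"])
    .size()
    .reset_index()
    .rename(columns={0: "num_trips"})
)

print(df_toy_by_week)
```

The result is in long format, one row per week and station, which is the shape plotly.express expects when a column is later used to split a chart into several lines.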

Code
# YOUR CODE BEGINS

# YOUR CODE ENDS

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-07"
_points = 2

df_t3 = df_trips_backup[
    df_trips_backup["start_station_name"].isin(
        df_trips_backup["start_station_name"].value_counts().index[:3]
    )
]

df_check = (
    df_t3.groupby([df_t3["start_time"].dt.isocalendar().week, "start_station_name"])
    .size()
    .reset_index()
    .rename(columns={"start_time": "date", 0: "num_trips"})
)

pd.testing.assert_frame_equal(
    df_num_trips_from_station.sort_values(["week", "start_station_name"]).reset_index(
        drop=True
    ),
    df_check.sort_values(["week", "start_station_name"]).reset_index(drop=True),
)

🎯 Exercise 8: Number of trips by week of the year at the top 3 stations (Line Chart)

👇 Tasks

  • ✔️ Using df_num_trips_from_station, create a line chart that displays the number of trips by week.

  • ✔️ Draw three lines on a single figure, one per station.

    • Use different colors to distinguish start_station_name.

  • ✔️ Store your figure in a variable named fig.

  • ✔️ Add an appropriate title to your figure.

  • ✔️ Display the figure using fig.show().

# YOUR CODE BEGINS

# YOUR CODE ENDS

🔑 Sample output

Expected Output

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-08"
_points = 2

tc.assertEqual(len(fig.data), 3, "There must be three plots in your figure")
tc.assertIsNotNone(fig.layout.title.text, "Missing figure title")

for i in range(3):
    tc.assertEqual(fig.data[i].type, "scatter", "Must be a line plot")
    tc.assertIsNotNone(fig.data[i].line.color, "Must be a line plot")

    np.testing.assert_array_equal(
        fig.data[i].x,
        df_num_trips_from_station[
            df_num_trips_from_station["start_station_name"]
            == df_num_trips_from_station["start_station_name"].unique()[i]
        ]["week"],
        "Incorrect x-axis data",
    )
    np.testing.assert_array_equal(
        fig.data[i].y,
        df_num_trips_from_station[
            df_num_trips_from_station["start_station_name"]
            == df_num_trips_from_station["start_station_name"].unique()[i]
        ]["num_trips"],
        "Incorrect y-axis data",
    )

🎯 Exercise 9: Create an aggregated DataFrame with number of trips by station

👇 Tasks

  • ✔️ Using df_trips, create a new DataFrame named df_num_trips_from that holds the number of trips by start station name, with columns start_station and num_trips.

  • ✔️ Only select the top 10 stations (by number of trips).

  • ✔️ We will give you the fully working code below.
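The count, sort, and truncate steps can be sketched on a small stand-in frame. This is an illustration under stated assumptions, not the official solution: df_toy, df_toy_top, and the stations A, B, and C are made up, and the sketch keeps the top two stations instead of ten. The output column names start_station and num_trips follow the check cell for this exercise.

```python
import pandas as pd

# Made-up trips for three hypothetical stations.
df_toy = pd.DataFrame({"start_station_name": ["A", "B", "A", "C", "A", "B"]})

# Count trips per station, sort from busiest to quietest, keep the first two rows.
df_toy_top = (
    df_toy.groupby("start_station_name")
    .size()
    .sort_values(ascending=False)
    .iloc[:2]
    .reset_index()
    .rename(columns={"start_station_name": "start_station", 0: "num_trips"})
)

print(df_toy_top)
```

Sorting before slicing matters here: iloc[:2] simply takes the first two rows, so the frame must already be ordered by trip count.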

Code
# YOUR CODE BEGINS

# YOUR CODE ENDS

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-09"
_points = 2

df_check = (
    df_trips_backup.groupby("start_station_name")
    .size()
    .sort_values(ascending=False)
    .iloc[:10]
    .reset_index()
    .rename(columns={"start_station_name": "start_station", 0: "num_trips"})
)

pd.testing.assert_frame_equal(
    df_num_trips_from.reset_index(drop=True), df_check.reset_index(drop=True)
)

🎯 Exercise 10: Number of trips by station (Bar Chart)

👇 Tasks

  • ✔️ Using df_num_trips_from, create a bar chart that displays the number of trips by station.

  • ✔️ Store your figure in a variable named fig.

  • ✔️ Add an appropriate title to your figure.

  • ✔️ Display the figure using fig.show().

# YOUR CODE BEGINS

# YOUR CODE ENDS

🔑 Sample output

Expected Output

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-10"
_points = 2

tc.assertEqual(len(fig.data), 1, "There must be only one plot in your figure")
tc.assertIsNotNone(fig.layout.title.text, "Missing figure title")
tc.assertEqual(fig.data[0].type, "bar", "Must be a bar chart")
tc.assertEqual(
    fig.data[0].orientation, "v", "Your plot should have a vertical orientation"
)
np.testing.assert_array_equal(
    fig.data[0].x, df_num_trips_from["start_station"], "Incorrect x-axis data"
)
np.testing.assert_array_equal(
    fig.data[0].y, df_num_trips_from["num_trips"], "Incorrect y-axis data"
)