
Exercise 8 - Introduction to Data Visualization with Python

  • 🏆 20 points available


▢️ Run the code cell below to import unittest, a module used for 🧭 Check Your Work sections and the autograder.

import unittest
import base64
import plotly

tc = unittest.TestCase()
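
As a rough illustration of how the tc object behaves in the Check Your Work cells, the sketch below uses a TestCase instance outside a test runner. The values being compared are made up for the example.

```python
import unittest

# A TestCase instance can be used on its own, purely for its assertion helpers.
tc = unittest.TestCase()

# A passing assertion is silent: nothing is printed and no error is raised.
tc.assertEqual(1 + 1, 2, "Passing assertions are silent")

# A failing assertion raises AssertionError and includes the custom message.
message = ""
try:
    tc.assertEqual(1 + 1, 3, "Arithmetic mismatch")
except AssertionError as err:
    message = str(err)

print(message)
```

This is why a check cell that runs without errors means the solution passed.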

🎯 Exercise 1: Import Packages

👇 Tasks

  • ✔️ Import the following Python packages.

    1. pandas: Use alias pd.

    2. numpy: Use alias np.

    3. plotly.express: Use alias px.

    4. plotly.graph_objects: Use alias go.

# YOUR CODE BEGINS

# YOUR CODE ENDS

🧭 Check Your Work¢

Run the code cell below to test your solution.

  • βœ”οΈ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

# DO NOT CHANGE THE CODE IN THIS CELL
_test_case = "part-01"
_points = 2

tc.assertTrue(
    "pd" in globals(), "Check whether you have correctly imported Pandas with an alias."
)
tc.assertTrue(
    "np" in globals(), "Check whether you have correctly imported NumPy with an alias."
)

print(f"The current plotly version is {plotly.__version__}")
plotly_major_version = int(plotly.__version__.split(".")[0])
tc.assertGreaterEqual(
    plotly_major_version, 5, "Your plotly version should be greater than or equal to 5"
)

tc.assertIsNotNone(
    go.Figure,
    "Check whether you have correctly imported plotly.graph_objects with an alias go.",
)
tc.assertIsNotNone(
    px.scatter,
    "Check whether you have correctly imported plotly.express with an alias px.",
)
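
The version check in the cell above takes the text before the first dot of the version string and converts it to an integer. A minimal sketch of that step, using a hypothetical version string in place of plotly.__version__:

```python
import unittest

tc = unittest.TestCase()

# Hypothetical version string, standing in for plotly.__version__
version = "5.18.0"

# "5.18.0" -> ["5", "18", "0"] -> "5" -> 5
major_version = int(version.split(".")[0])

# Comparing integers avoids a string-comparison pitfall:
# as strings, "10" sorts before "5" because "1" < "5".
string_comparison = "10" < "5"

tc.assertGreaterEqual(major_version, 5)
print(major_version, string_comparison)
```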

📌 Import dataset

Today, we work with a dataset of used car listings. The data has been downloaded from https://www.kaggle.com/datasets/akshaydattatraykhare/car-details-dataset without any modification.

▶️ Run the code below to import the cars dataset. 🚗🚓🚕🛺🚙

# Display all columns
pd.set_option("display.max_columns", 50)

df_cars = pd.read_csv(
    "https://github.com/bdi475/datasets/raw/main/car-dekho-used-cars.csv"
)

display(df_cars)

🎯 Exercise 2: Number of rows and columns

👇 Tasks

  • ✔️ Store the number of rows in df_cars in a new variable named num_rows.

  • ✔️ Store the number of columns in df_cars in a new variable named num_cols.

  • ✔️ Both num_rows and num_cols should be integers.

🚀 Hints

  • my_dataframe.shape returns a tuple containing the number of rows and columns of my_dataframe.

  • You can retrieve the first element of a tuple using square bracket notation.

    • Example: my_dataframe.shape[0]
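
The hints above can be tried on a small made-up DataFrame. The name df_toy and its columns are hypothetical and unrelated to df_cars.

```python
import pandas as pd

# Hypothetical toy DataFrame with 3 rows and 2 columns
df_toy = pd.DataFrame({"name": ["a", "b", "c"], "score": [90, 85, 77]})

shape = df_toy.shape      # tuple of (number of rows, number of columns)
n_rows = df_toy.shape[0]  # first element: rows
n_cols = df_toy.shape[1]  # second element: columns

print(shape, n_rows, n_cols)
```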

# YOUR CODE BEGINS

# YOUR CODE ENDS

print(f"df_cars contains {num_rows} rows and {num_cols} columns.")

🧭 Check Your Work¢

Run the code cell below to test your solution.

  • βœ”οΈ If the code cell runs without errors, you’re good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-02"
_points = 2

tc.assertEqual(
    num_rows, len(df_cars.index), f"Number of rows should be {len(df_cars.index)}"
)
tc.assertEqual(
    num_cols,
    len(df_cars.columns),
    f"Number of columns should be {len(df_cars.columns)}",
)

🎯 Exercise 3: Selling price box plot (horizontal)

👇 Tasks

  • ✔️ Draw a horizontal box plot of selling_price.

  • ✔️ Store your figure in a variable named fig.

  • ✔️ Add an appropriate title to your figure.

    • A title should describe your plot (e.g., Salary Box Plot).

  • ✔️ Display the figure using fig.show().

# YOUR CODE BEGINS

# YOUR CODE ENDS

🔑 Sample output

[Image: Salary distribution horizontal]

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you're good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-03"
_points = 2

tc.assertEqual(len(fig.data), 1, "There must be only one plot in your figure")
tc.assertIsNotNone(fig.layout.title.text, "Missing figure title")
tc.assertEqual(fig.data[0].type, "box", "Not a box plot")
tc.assertEqual(
    fig.data[0].orientation, "h", "Your plot should have a horizontal orientation"
)
np.testing.assert_array_equal(fig.data[0].x, df_cars["selling_price"], "Incorrect data")
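
The last line of the check cell compares the plotted values with the DataFrame column element by element. A small sketch of how np.testing.assert_array_equal behaves, using made-up arrays:

```python
import numpy as np

expected = np.array([10, 20, 30])

# Equal contents: the call returns silently.
np.testing.assert_array_equal([10, 20, 30], expected, "Incorrect data")

# Any differing element raises AssertionError, and the custom
# message is included in the error text.
mismatch_reported = False
try:
    np.testing.assert_array_equal([10, 20, 31], expected, "Incorrect data")
except AssertionError as err:
    mismatch_reported = "Incorrect data" in str(err)

print(mismatch_reported)
```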

🎯 Exercise 4: Selling price distribution by number of previous owners

👇 Tasks

  • ✔️ Draw horizontal box plots of selling_price by owner.

  • ✔️ Store your figure in a variable named fig.

  • ✔️ Add an appropriate title to your figure.

  • ✔️ Display the figure using fig.show().

# YOUR CODE BEGINS

# YOUR CODE ENDS

🔑 Sample output

[Image: Salary distribution by citizenship status box plots]

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you're good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-04"
_points = 2

tc.assertEqual(len(fig.data), 1, "There must be only one plot in your figure")
tc.assertIsNotNone(fig.layout.title.text, "Missing figure title")
tc.assertEqual(fig.data[0].type, "box", "Not a box plot")
tc.assertEqual(
    fig.data[0].orientation, "h", "Your plot should have a horizontal orientation"
)
np.testing.assert_array_equal(
    fig.data[0].x, df_cars["selling_price"], "Incorrect x-axis data"
)
np.testing.assert_array_equal(fig.data[0].y, df_cars["owner"], "Incorrect y-axis data")

🎯 Exercise 5: Driven distance distribution by transmission

👇 Tasks

  • ✔️ Draw horizontal box plots of km_driven by transmission.

  • ✔️ Store your figure in a variable named fig.

  • ✔️ Add an appropriate title to your figure.

  • ✔️ Display the figure using fig.show().

# YOUR CODE BEGINS

# YOUR CODE ENDS

🔑 Sample output

[Image: Salary distribution by performance score box plots]

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you're good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-05"
_points = 2

tc.assertEqual(len(fig.data), 1, "There must be only one plot in your figure")
tc.assertIsNotNone(fig.layout.title.text, "Missing figure title")
tc.assertEqual(fig.data[0].type, "box", "Not a box plot")
tc.assertEqual(
    fig.data[0].orientation, "h", "Your plot should have a horizontal orientation"
)
np.testing.assert_array_equal(
    fig.data[0].x, df_cars["km_driven"], "Incorrect x-axis data"
)
np.testing.assert_array_equal(
    fig.data[0].y, df_cars["transmission"], "Incorrect y-axis data"
)

🎯 Exercise 6: Selling price distribution by fuel type

👇 Tasks

  • ✔️ Draw horizontal box plots of selling_price by fuel.

  • ✔️ Store your figure in a variable named fig.

  • ✔️ Add an appropriate title to your figure.

  • ✔️ Set the height of your figure to 600.

  • ✔️ Display the figure using fig.show().

🚀 Hints

fig = px.box(
    my_dataframe,
    x='my_column1',
    y='my_column2',
    title='Plot Title Goes Here',
    height=600
)
fig.show()

# YOUR CODE BEGINS

# YOUR CODE ENDS

🔑 Sample output

[Image: Salary distribution by department box plots]

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you're good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-06"
_points = 3

tc.assertEqual(len(fig.data), 1, "There must be only one plot in your figure")
tc.assertIsNotNone(fig.layout.title.text, "Missing figure title")
tc.assertEqual(fig.data[0].type, "box", "Not a box plot")
tc.assertEqual(
    fig.data[0].orientation, "h", "Your plot should have a horizontal orientation"
)
tc.assertEqual(fig.layout.height, 600, "Incorrect height")
np.testing.assert_array_equal(
    fig.data[0].x, df_cars["selling_price"], "Incorrect x-axis data"
)
np.testing.assert_array_equal(fig.data[0].y, df_cars["fuel"], "Incorrect y-axis data")

🎯 Exercise 7: Selling price distribution by seller type

👇 Tasks

  • ✔️ Draw horizontal box plots of selling_price by seller_type.

  • ✔️ Store your figure in a variable named fig.

  • ✔️ Add an appropriate title to your figure.

  • ✔️ Set the height of your figure to 700.

  • ✔️ Display the figure using fig.show().

# YOUR CODE BEGINS

# YOUR CODE ENDS

🔑 Sample output

[Image: Salary distribution by recruitment source box plots]

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you're good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-07"
_points = 3

tc.assertEqual(len(fig.data), 1, "There must be only one plot in your figure")
tc.assertIsNotNone(fig.layout.title.text, "Missing figure title")
tc.assertEqual(fig.data[0].type, "box", "Not a box plot")
tc.assertEqual(
    fig.data[0].orientation, "h", "Your plot should have a horizontal orientation"
)
tc.assertEqual(fig.layout.height, 700, "Incorrect height")
np.testing.assert_array_equal(
    fig.data[0].x, df_cars["selling_price"], "Incorrect x-axis data"
)
np.testing.assert_array_equal(
    fig.data[0].y, df_cars["seller_type"], "Incorrect y-axis data"
)

🎯 Exercise 8: Selling price histogram

👇 Tasks

  • ✔️ Draw a vertical histogram of selling_price in df_cars.

  • ✔️ Store your figure in a variable named fig.

  • ✔️ Add an appropriate title to your figure.

  • ✔️ Display the figure using fig.show().

# YOUR CODE BEGINS

# YOUR CODE ENDS

🔑 Sample output

[Image: Salary dispersion histogram]

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you're good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-08"
_points = 2

tc.assertEqual(len(fig.data), 1, "There must be only one plot in your figure")
tc.assertIsNotNone(fig.layout.title.text, "Missing figure title")
tc.assertEqual(fig.data[0].type, "histogram", "Not a histogram")
tc.assertEqual(
    fig.data[0].orientation, "v", "Your plot should have a vertical orientation"
)
np.testing.assert_array_equal(fig.data[0].x, df_cars["selling_price"], "Incorrect data")

🎯 Exercise 9: Distance driven distribution

👇 Tasks

  • ✔️ Draw a vertical histogram of km_driven in df_cars.

  • ✔️ Store your figure in a variable named fig.

  • ✔️ Add an appropriate title to your figure.

  • ✔️ Display the figure using fig.show().

# YOUR CODE BEGINS

# YOUR CODE ENDS

🔑 Sample output

[Image: Number of absence histogram]

🧭 Check Your Work

Run the code cell below to test your solution.

  • ✔️ If the code cell runs without errors, you're good to move on.

  • ❌ If the code cell produces an error, review your code and fix any mistakes.

_test_case = "part-09"
_points = 2

tc.assertEqual(len(fig.data), 1, "There must be only one plot in your figure")
tc.assertIsNotNone(fig.layout.title.text, "Missing figure title")
tc.assertEqual(fig.data[0].type, "histogram", "Not a histogram")
tc.assertEqual(
    fig.data[0].orientation, "v", "Your plot should have a vertical orientation"
)
np.testing.assert_array_equal(fig.data[0].x, df_cars["km_driven"], "Incorrect data")