Skip to article frontmatterSkip to article content

Pandas Indexing and Updating Cells

▶️ Import pandas and numpy.

import pandas as pd
import numpy as np

▶️ Create a new DataFrame named df_companies.

df_companies = pd.DataFrame(
    {
        "company_name": ["Amazon", "Nvidia", "Google", "Microsoft", "Adobe"],
        "ticker": ["AMZN", "AMD", np.nan, "MSFT", np.nan],
        "headquarters": [
            "Champaign",
            "Santa Clara",
            "Mountain View",
            "Redmond",
            "San Jose",
        ],
    }
)

df_companies
Loading...

📇 What is Indexing?

In Pandas, indexing refers to the way we access, select, and reference rows and columns in a Series or a DataFrame. Indexing is a fundamental operation in Pandas that allows us to manipulate and analyze data efficiently. Think of it as the “addressing system” for your data.

We have already seen some indexing methods, such as:

df["dept"]                              # Access a column by its name
df.loc[df["dept"] == "HR", :]           # Access rows based on a condition (all columns)
df.loc[df["dept"] == "HR", "salary"]    # Access rows based on a condition (specific column)
df.iloc[0]                              # Access the first element by its integer position
df.iloc[0, 1]                           # Access the element in the first row and second column

🔖 .loc

The .loc indexer (“location by label”) is used to access rows and columns in a DataFrame by their labels (i.e., the index and column names). It allows for more intuitive and flexible indexing based on the actual labels of the data.

df.loc[row_selection, column_selection]  # Access rows and columns by labels
df.loc[:, column_selection]              # Access all rows for specific columns
df.loc[row_selection, :]                 # Access specific rows for all columns

The row_selection and column_selection can be:

  • A single label (e.g., "HR" or "salary").
  • A list of labels (e.g., ["HR", "IT"] or ["salary", "dept"]).
  • A boolean array-like data (e.g., df["dept"] == "HR").

We can use .loc to update values in a DataFrame. For example, to update the headquarters of Google:

df.loc["row_label", "column_name"] = "new_value"

The .loc indexer can be used with a Series as well:

my_series = pd.Series([10, 20, 30], index=["a", "b", "c"])
my_series.loc["a"]       # Access the first element (10)
my_series.loc["b":"c"]   # Access elements from index "b" to "c" (20, 30)

🎯 Example: Update Nvidia’s ticker using .loc

▶️ Show the DataFrame before the update:

df_companies[df_companies["company_name"] == "Nvidia"]
Loading...

To update one or more cells in a DataFrame, we can use the .loc indexer to specify the rows and columns we want to update, and then assign new values to those cells.

In the code below, we update Nvidia’s ticker symbol from “AMD” to “NVDA” by specifying the row where the company_name is “Nvidia” and the ticker column.

▶️ Update Nvidia’s ticker symbol from “AMD” to “NVDA”:

df_companies.loc[df_companies["company_name"] == "Nvidia", "ticker"] = "NVDA"

▶️ Show the DataFrame after the update:

display(df_companies[df_companies["company_name"] == "Nvidia"])
Loading...

🎯 Example: Fill in missing tickers using .loc

▶️ Show the DataFrame before the update:

display(df_companies.loc[df_companies["company_name"].isin(["Google", "Adobe"])])
Loading...

▶️ Fill in Google and Adobe’s missing ticker symbols.

The syntax is identical to the previous example, except we are updating two cells this time.

df_companies.loc[df_companies["company_name"] == "Google", "ticker"] = "GOOG"
df_companies.loc[df_companies["company_name"] == "Adobe", "ticker"] = "ADBE"

▶️ Show the DataFrame after the update:

display(df_companies.loc[df_companies["company_name"].isin(["Google", "Adobe"])])
Loading...

🔢 .iloc

The .iloc indexer (“integer location”) is used to access rows and columns in a DataFrame by their integer positions (i.e., the row and column indices). It is useful when you want to access data based on its position rather than its label.

df.iloc[row_index, column_index]  # Access rows and columns by integer positions

The row_index and column_index can be:

  • A single integer (e.g., 0 or 1).
  • A list of integers (e.g., [0, 2] or [1, 3]).
  • A slice object (e.g., 0:3 to select the first three rows).

The .iloc indexer can be used to select a specific element from a Series as well.

my_series = pd.Series([10, 20, 30], index=["a", "b", "c"])
my_series.iloc[0]       # Access the first element (10)
my_series.iloc[1:3]     # Access elements from index 1 to 3 (20, 30)

We can also use .iloc to update values in a DataFrame.

df.iloc[row_index, column_index] = new_value

🎯 Example: Update Amazon’s headquarters using .iloc

▶️ Show the DataFrame before the update:

df_companies
Loading...

▶️ Update Amazon’s headquarters from “Champaign” to “Seattle” using .iloc.

df_companies.iloc[0, 2] = "Seattle"

▶️ Show the DataFrame after the update:

df_companies
Loading...

🎯 Example: Retrieve Microsoft’s ticker symbol using .loc

▶️ Retrieve Microsoft’s ticker symbol using .loc.

df_companies.loc[df_companies["company_name"] == "Microsoft", "ticker"]
3 MSFT Name: ticker, dtype: object
type(df_companies.loc[df_companies["company_name"] == "Microsoft", "ticker"])
pandas.core.series.Series

▶️ We can retrieve the first element of the resulting Series using the .iloc indexer. Because we know there is only one matching row, we can use .iloc[0] to get the first (and only) element.

df_companies.loc[df_companies["company_name"] == "Microsoft", "ticker"].iloc[0]
'MSFT'

▶️ Alternatively, we can use the .values attribute to get the underlying NumPy array and then access the first element of that array.

This approach is less common but works just as well. Getting the underlying NumPy array may be useful in some scenarios, but for simple retrieval of a single value, using .iloc[0] is more straightforward.

df_companies.loc[df_companies["company_name"] == "Microsoft", "ticker"].values[0]
'MSFT'

🎯 Example: Retrieve Adobe’s headquarters location using .iloc

▶️ We can retrieve Adobe’s headquarters location using .iloc. Since Adobe is the last company in the DataFrame, we can use negative indexing to access the last row. Similarly, headquarters is the last column, so we can use negative indexing to access the last column as well.

df_companies.iloc[-1, -1]
'San Jose'

🧩 SettingWithCopyWarning

When updating values in a DataFrame, you might encounter a SettingWithCopyWarning. This warning indicates that you are trying to set a value on a copy of a slice from a DataFrame, which may not have the intended effect.

This is one of the most common stumbling blocks when using Pandas. It often occurs when you create a subset of a DataFrame and then try to modify that subset.

It’s Pandas’ way of saying:

“Hey, I’m not sure if you’re changing the original DataFrame or just a temporary copy (view) of it. Your change might not stick.”

When you filter a DataFrame, Pandas may return either:

  • a view: a “window” into the original data
  • a copy: a completely separate object in memory

Pandas doesn’t always know which one it is, so it throws a warning to make sure you’re aware of the potential issue.

We will demonstrate this with an example.

▶️ Create a new DataFrame named df_seats that contains information about available seats in various courses.

df_seats = pd.DataFrame(
    {
        "CourseID": ["ACCY 201", "FIN 300", "ACCY 202", "BADM 101"],
        "Department": ["ACCY", "FIN", "ACCY", "BADM"],
        "Available Seats": [300, 90, 250, 500],
    }
)

df_seats
Loading...

▶️ Filter df_seats to create a new DataFrame named df_accy that only contains accounting courses.

df_accy = df_seats[df_seats["Department"] == "ACCY"]
df_accy
Loading...

▶️ Increase the available seats for all accounting courses by 10%.

df_accy["Available Seats"] = df_accy["Available Seats"] * 1.1
/var/folders/gk/vcjrk04j11z7sqz_nh0vxf9r0000gn/T/ipykernel_77510/3479234074.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_accy["Available Seats"] = df_accy["Available Seats"] * 1.1

You can see the SettingWithCopyWarning because df_accy is a slice of the original df_seats DataFrame. When we try to modify df_accy, Pandas is unsure if we intend to modify the original df_seats DataFrame or just the temporary slice (df_accy).

▶️ Confirm that the available seats in df_accy have been increased by 10%.

df_accy
Loading...

▶️ Check the original df_seats DataFrame to see if the changes are reflected there as well.

df_seats
Loading...

You’ll notice that the changes are not reflected in the original df_seats. This is because df_accy is a separate copy of the data, and modifying it does not affect df_seats.