How to Use Conditional Formatting in Pandas to Enhance Data Visualization

Image by Author | DALLE-3 & Canva

 

While pandas is mainly used for data manipulation and analysis, it can also provide basic data visualization capabilities. However, plain dataframes can make the information look cluttered and overwhelming. So, what can be done to make it better? If you've worked with Excel before, you know that you can highlight important values with different colors, font styles, and so on. The idea of using these styles and colors is to communicate the information effectively. You can do similar work with pandas dataframes too, using conditional formatting and the Styler object.

In this article, we will see what conditional formatting is and how to use it to enhance the readability of your data.

 

Conditional Formatting

 

Conditional formatting is a feature in pandas that allows you to format cells based on some criteria. You can easily highlight outliers, visualize trends, or emphasize important data points using it. The Styler object in pandas provides a convenient way to apply conditional formatting. Before covering the examples, let's take a quick look at how the Styler object works.

 

What’s the Styler Object & How Does It Work?

 

You can control the visual representation of the dataframe by using the style property (df.style). This property returns a Styler object, which is responsible for styling the dataframe. The Styler object allows you to manipulate the CSS properties of the dataframe to create a visually appealing and informative display. The generic syntax is as follows:

df.style.<method>(<arguments>)

 

Here, <method> is the specific formatting function you want to apply, and <arguments> are the parameters required by that function. The Styler object returns the formatted dataframe without changing the original one. There are two approaches to using conditional formatting with the Styler object:

  • Built-in Styles: To apply quick formatting styles to your dataframe
  • Custom Stylization: Create your own formatting rules for the Styler object and pass them through one of the following methods (Styler.applymap: element-wise or Styler.apply: column-/row-/table-wise). In pandas 2.1 and later, Styler.applymap is deprecated in favor of Styler.map.

Now, we will cover some examples of both approaches to help you enhance the visualization of your data.

 

Examples: Built-in Styles

 

Let's create a dummy dataset with columns for Date, Cost Price, Satisfaction Score, and Sales Amount to demonstrate the examples below:

import pandas as pd
import numpy as np

data = {'Date': ['2024-03-05', '2024-03-06', '2024-03-07', '2024-03-08', '2024-03-09', '2024-03-10'],
        'Cost Price': [100, 120, 110, 1500, 1600, 1550],
        'Satisfaction Score': [90, 80, 70, 95, 85, 75],
        'Sales Amount': [1000, 800, 1200, 900, 1100, None]}

df = pd.DataFrame(data)
df

 

Output:

 

Original Unformatted Dataframe

 

1. Highlighting Maximum and Minimum Values

We can use the highlight_max and highlight_min functions to highlight the maximum and minimum values in a column or row. To compare values within each column, set axis=0 like this:

# Highlighting Maximum and Minimum Values
df.style.highlight_max(color="green", axis=0, subset=['Cost Price', 'Satisfaction Score', 'Sales Amount']).highlight_min(color="red", axis=0, subset=['Cost Price', 'Satisfaction Score', 'Sales Amount'])

 

Output:
 

Max & Min Values

 

2. Applying Color Gradients

Color gradients are an effective way to visualize the values in your data. In this case, we will apply the gradient to the satisfaction scores using the colormap set to 'viridis'. This is a type of color coding that ranges from purple (low values) to yellow (high values). Note that background_gradient requires matplotlib to be installed. Here is how you can do this:

# Applying Color Gradients
df.style.background_gradient(cmap='viridis', subset=['Satisfaction Score'])

 

Output:

 

Colormap - viridis

 

3. Highlighting Null or Missing Values

When we have large datasets, it becomes difficult to identify null or missing values. You can use conditional formatting with the built-in df.style.highlight_null function for this purpose. For example, in this case, the sales amount of the sixth entry is missing. You can highlight this information like this:

# Highlighting Null or Missing Values
df.style.highlight_null('yellow', subset=['Sales Amount'])

 

Output:
 

Highlighting Missing Values

 

Examples: Custom Stylization Using apply() & applymap()

 

1.  Conditional Formatting for Outliers

Suppose that we have a housing dataset with prices, and we want to highlight the houses with outlier prices (i.e., prices that are significantly higher or lower than those in the other neighborhoods). Using the interquartile range (IQR) rule, this can be done as follows:

import pandas as pd
import numpy as np

# House prices dataset
df = pd.DataFrame({
   'Neighborhood': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'H7'],
   'Price': [50, 300, 360, 390, 420, 450, 1000],
})

# Calculate Q1 (25th percentile), Q3 (75th percentile) and Interquartile Range (IQR)
q1 = df['Price'].quantile(0.25)
q3 = df['Price'].quantile(0.75)
iqr = q3 - q1

# Bounds for outliers
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Custom function to highlight outliers
def highlight_outliers(val):
   if val < lower_bound or val > upper_bound:
      return 'background-color: yellow; font-weight: bold; color: black'
   else:
      return ''

# In pandas 2.1 and later, df.style.map is the preferred name for applymap
df.style.applymap(highlight_outliers, subset=['Price'])

 

Output:

 

Highlighting Outliers

 

2. Highlighting Trends

Consider that you run a company and are recording your sales daily. To analyze the trends, you want to highlight the days when your daily sales increase by more than 5%. You can achieve this using a custom function and the apply method in pandas. Here's how:

import pandas as pd

# Dataset of Company's Sales
data = {'date': ['2024-02-10', '2024-02-11', '2024-02-12', '2024-02-13', '2024-02-14'],
        'sales': [100, 105, 110, 115, 125]}

df = pd.DataFrame(data)

# Daily percentage change
df['pct_change'] = df['sales'].pct_change() * 100

# Highlight the day if sales increased by more than 5%
def highlight_trend(row):
    return ['background-color: green; border: 2px solid black; font-weight: bold' if row['pct_change'] > 5 else '' for _ in row]

df.style.apply(highlight_trend, axis=1)

 

Output:

 

Highlight >5% Increase in Sales

 

3. Highlighting Correlated Columns

Correlated columns are important because they show relationships between different variables. For example, if we have a dataset containing age, income, and spending habits, and our analysis shows a high correlation (close to 1) between age and income, it suggests that older people generally have higher incomes. Highlighting correlated columns helps to visually identify these relationships. This approach becomes especially helpful as the dimensionality of your data increases. Let's explore an example to better understand this concept:

import pandas as pd

# Dataset of people
data = {
    'age': [30, 35, 40, 45, 50],
    'income': [60000, 66000, 70000, 75000, 100000],
    'spending': [10000, 15000, 20000, 18000, 12000]
}

df = pd.DataFrame(data)

# Calculate the correlation matrix
corr_matrix = df.corr()

# Highlight highly correlated columns
def highlight_corr(val):
    if val != 1.0 and abs(val) > 0.5:   # Exclude self-correlation
        return 'background-color: blue; text-decoration: underline'
    else:
        return ''

corr_matrix.style.applymap(highlight_corr)

 

Output:

 

Correlated Columns

 

Wrapping Up

 

These are just some of the examples I showed as a starter to up your data visualization game. You can apply similar techniques to various other problems to enhance the data visualization, such as highlighting duplicate rows, grouping into categories and selecting different formatting for each category, or highlighting peak values. Additionally, there are many other CSS options you can explore in the official documentation. You can even define different properties on hover, like magnifying text or changing color. Check out the "Fun Stuff" section of the documentation for more cool ideas. This article is part of my Pandas series, so if you enjoyed this, there's plenty more to explore. Head over to my author page for more tips, tricks, and tutorials.

 
 

Kanwal Mehreen Kanwal is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the book "Maximizing Productivity with ChatGPT". As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She's also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
