To group by multiple columns and then find the variance of rows in a pandas DataFrame, you can use the groupby() and var() functions.

import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"], 
                   "gender":["F","F","F","F","M","M","M","F","M"], 
                   "age":[1,2,3,4,5,6,7,8,9], 
                   "weight":[10,20,15,20,25,10,15,30,40]})

print(df)
print(df.groupby(["animal_type","gender"])["age"].var().rename('age_variance').reset_index())

#Output:
  animal_type gender  age  weight
0         dog      F    1      10
1         cat      F    2      20
2         dog      F    3      15
3         cat      F    4      20
4         dog      M    5      25
5         dog      M    6      10
6         cat      M    7      15
7         cat      F    8      30
8         dog      M    9      40

  animal_type gender  age_variance
0         cat      F      9.333333
1         cat      M           NaN
2         dog      F      2.000000
3         dog      M      4.333333

When working with data, it is very useful to be able to group and aggregate data by multiple columns to understand the various segments of our data.

One such case is if you want to group your data and get the variance of a variable for each group.

To get the variance of a variable by groups of columns in a pandas DataFrame, you can use the groupby() and var() functions.

Below is a simple example showing you how you can group by and then get the variance of a variable of each group in a pandas DataFrame in Python.

In the example below, I’ve renamed the variance of rows to ‘age_variance’ and then reset the index so that we can work with the resulting DataFrame easier.

import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"], 
                   "gender":["F","F","F","F","M","M","M","F","M"], 
                   "age":[1,2,3,4,5,6,7,8,9], 
                   "weight":[10,20,15,20,25,10,15,30,40]})

print(df)
print(df.groupby(["animal_type","gender"])["age"].var().rename('age_variance').reset_index())

#Output:
  animal_type gender  age  weight
0         dog      F    1      10
1         cat      F    2      20
2         dog      F    3      15
3         cat      F    4      20
4         dog      M    5      25
5         dog      M    6      10
6         cat      M    7      15
7         cat      F    8      30
8         dog      M    9      40

  animal_type gender  age_variance
0         cat      F      9.333333
1         cat      M           NaN
2         dog      F      2.000000
3         dog      M      4.333333

Using groupby() and var() on Single Column in pandas DataFrame

You can use groupby() to group a pandas DataFrame by one column or multiple columns.

If you want to group a pandas DataFrame by one column and then get the variance of a single variable in each group with var(), you can do the following.

import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"], 
                   "gender":["F","F","F","F","M","M","M","F","M"], 
                   "age":[1,2,3,4,5,6,7,8,9], 
                   "weight":[10,20,15,20,25,10,15,30,40]})

print(df)
print(df.groupby(["animal_type"])["age"].var().rename('age_variance').reset_index())

#Output:
  animal_type gender
0         dog      F
1         cat      F
2         dog      F
3         cat      F
4         dog      M
5         dog      M
6         cat      M
7         cat      F
8         dog      M

  animal_type  age_variance
0         cat      7.583333
1         dog      9.200000

If you want to group by a single column and find the variance of multiple variables, you can do the following. In this case, the column names will be the names of the original columns.

import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"], 
                   "gender":["F","F","F","F","M","M","M","F","M"], 
                   "age":[1,2,3,4,5,6,7,8,9], 
                   "weight":[10,20,15,20,25,10,15,30,40]})

print(df)
print(df.groupby(["gender"])["age","weight"].var().reset_index())

#Output:
  animal_type gender  age  weight
0         dog      F    1      10
1         cat      F    2      20
2         dog      F    3      15
3         cat      F    4      20
4         dog      M    5      25
5         dog      M    6      10
6         cat      M    7      15
7         cat      F    8      30
8         dog      M    9      40

  gender       age  weight
0      F  7.300000    55.0
1      M  2.916667   175.0

Using groupby() to Group By Multiple Columns and var() in pandas DataFrame

If you want to group a pandas DataFrame by multiple columns and then get the variance of a variable for each group with var(), you can do the following.

import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"], "gender":["F","F","F","F","M","M","M","F","M"], "age":[1,2,3,4,5,6,7,8,9], "weight":[10,20,15,20,25,10,15,30,40]})

print(df)
print(df.groupby(["animal_type","gender"])["age"].var().rename('age_variance').reset_index())

#Output:
  animal_type gender  age  weight
0         dog      F    1      10
1         cat      F    2      20
2         dog      F    3      15
3         cat      F    4      20
4         dog      M    5      25
5         dog      M    6      10
6         cat      M    7      15
7         cat      F    8      30
8         dog      M    9      40

  animal_type gender  age_variance
0         cat      F      9.333333
1         cat      M           NaN
2         dog      F      2.000000
3         dog      M      4.333333

If you want to group by multiple columns and find the variance of multiple variables, you can do the following. In this case, the column names will be the names of the original columns.

import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"], "gender":["F","F","F","F","M","M","M","F","M"], "age":[1,2,3,4,5,6,7,8,9], "weight":[10,20,15,20,25,10,15,30,40]})

print(df)
print(df.groupby(["animal_type","gender"])["age","weight"].var().reset_index())

#Output:
  animal_type gender  age  weight
0         dog      F    1      10
1         cat      F    2      20
2         dog      F    3      15
3         cat      F    4      20
4         dog      M    5      25
5         dog      M    6      10
6         cat      M    7      15
7         cat      F    8      30
8         dog      M    9      40

  animal_type gender       age      weight
0         cat      F  9.333333   33.333333
1         cat      M       NaN         NaN
2         dog      F  2.000000   12.500000
3         dog      M  4.333333  225.000000

Hopefully this article has been useful for you to learn how to group by and variance in pandas with groupby() and var().

Categorized in:

Python,

Last Update: March 22, 2024