A previous post introduced the idea behind Permutation Importance. Here, we will work through an example to further illustrate why permutation importance gives us a measure of feature importance.

Example Dataset

We'll construct a toy example where one feature (x1) has a strong, linear relationship with the outcome variable y, while the other feature (x2) is pure noise and has no relationship with y.

In [1]:

import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

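# x1 and x2: 100 uniform random values each, drawn from [0, 1)
x1 = np.random.random(size=100)
x2 = np.random.random(size=100)
# y depends only on x1 (slope 2) plus a little Gaussian noise; x2 plays no role
y = 2 * x1 + np.random.normal(scale=0.01, size=100)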

df = pd.DataFrame({
    'x1': x1,
    'x2': x2,
    'y': y
})

df.sort_values('x1', inplace=True)
df

Out[1]:

	x1	x2	y
35	0.003929	0.474519	-0.005942
8	0.007594	0.618751	0.005770
19	0.028927	0.424588	0.072574
67	0.034143	0.567468	0.065940
80	0.041604	0.593327	0.082671
...	...	...	...
42	0.944997	0.892642	1.871157
49	0.976107	0.335831	1.945883
73	0.979641	0.730861	1.948719
53	0.983640	0.498213	1.979160
41	0.999919	0.628390	1.996067

100 rows × 3 columns

In [2]:

fig, (ax1, ax2) = plt.subplots(figsize=(10, 6), nrows=1, ncols=2)
df.plot(kind='scatter', x='x1', y='y', ax=ax1)
df.plot(kind='scatter', x='x2', y='y', ax=ax2)
plt.show()
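The scatter plots can also be backed up with a number. The sketch below is an illustration, not part of the original notebook: it regenerates the same kind of data (the fixed seed via `np.random.default_rng(0)` is an assumption added for reproducibility, so values will not match the table above) and computes the Pearson correlation of each feature with y.

```python
import numpy as np
import pandas as pd

# Rebuild the toy dataset with the same recipe as above.
# The fixed seed is an addition so the numbers are reproducible.
rng = np.random.default_rng(0)
x1 = rng.random(100)
x2 = rng.random(100)
y = 2 * x1 + rng.normal(scale=0.01, size=100)
df = pd.DataFrame({'x1': x1, 'x2': x2, 'y': y})

# Pearson correlation of each feature with the target:
# x1 should be very close to 1, x2 close to 0
corr = df.corr()['y'].drop('y')
print(corr)
```

A correlation near 1 for x1 and near 0 for x2 matches what the two scatter plots show.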

Permuting x1

To compute Permutation Importance, we shuffle one column at a time and measure how much that hurts our ability to predict the target variable.

In this case, we would expect shuffling x1 to have a large impact: after permuting, the x1 values no longer line up with their original y values, so x1 no longer carries any predictive information.
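The shuffle-and-rescore loop can be sketched in plain Python. This is a minimal sketch under stated assumptions, not eli5's actual implementation: the helper name `manual_permutation_importance` is hypothetical, the metric is R^2, each column is shuffled `n_repeats` times on held-out data, and the data is regenerated with a fixed seed. It trains a model first (as the next section explains, the model must be trained before any shuffling), then shuffles only at scoring time.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def manual_permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Hypothetical helper: mean drop in R^2 when each column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = r2_score(y, model.predict(X))  # score on intact data
    importances = {}
    for col in X.columns:
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # shuffle one column, breaking its link to y
            X_perm[col] = rng.permutation(X_perm[col].to_numpy())
            drops.append(baseline - r2_score(y, model.predict(X_perm)))
        importances[col] = float(np.mean(drops))
    return importances


# Same toy data as above (seeded here for reproducibility)
rng = np.random.default_rng(0)
x1 = rng.random(100)
x2 = rng.random(100)
y = 2 * x1 + rng.normal(scale=0.01, size=100)
X = pd.DataFrame({'x1': x1, 'x2': x2})
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Train once on unshuffled data; shuffling happens only when scoring
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
importances = manual_permutation_importance(model, X_val, y_val)
print(importances)
```

Note that the model is never refit: permutation importance asks how much a fixed model relies on each column, not how well a model could do without it.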

In [3]:

df_shuffled = df.copy()
df_shuffled['x1'] = np.random.permutation(df['x1'])
df_shuffled

Out[3]:

	x1	x2	y
35	0.678146	0.474519	-0.005942
8	0.676386	0.618751	0.005770
19	0.861932	0.424588	0.072574
67	0.202261	0.567468	0.065940
80	0.043896	0.593327	0.082671
...	...	...	...
42	0.547686	0.892642	1.871157
49	0.049073	0.335831	1.945883
73	0.175270	0.730861	1.948719
53	0.003929	0.498213	1.979160
41	0.222349	0.628390	1.996067

100 rows × 3 columns

Plotting the shuffled x1 against y, we no longer get a clean line, just a shapeless blob. That is expected, because shuffling destroyed the pairing between each x1 value and its y.

In [4]:

df_shuffled.plot(x='x1', y='y', kind='scatter')
plt.show()
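The blob can be quantified too. This sketch (an illustration using seeded, regenerated data, not the original notebook's values) checks two things: shuffling keeps exactly the same x1 values, so the column's distribution is unchanged, but the correlation between x1 and y collapses toward zero.

```python
import numpy as np
import pandas as pd

# Regenerate the toy data (the seed is an assumption for reproducibility)
rng = np.random.default_rng(0)
x1 = rng.random(100)
x2 = rng.random(100)
y = 2 * x1 + rng.normal(scale=0.01, size=100)
df = pd.DataFrame({'x1': x1, 'x2': x2, 'y': y})

# Shuffle only x1, leaving x2 and y untouched
df_shuffled = df.copy()
df_shuffled['x1'] = rng.permutation(df['x1'].to_numpy())

# The column keeps exactly the same values (same distribution)...
same_values = np.array_equal(np.sort(df['x1'].to_numpy()),
                             np.sort(df_shuffled['x1'].to_numpy()))
# ...but its link to y is gone
corr_before = df['x1'].corr(df['y'])
corr_after = df_shuffled['x1'].corr(df_shuffled['y'])
print(same_values, round(corr_before, 3), round(corr_after, 3))
```

Because only the pairing with y changes, any drop in model performance after shuffling can be attributed to the lost relationship rather than to a shift in the feature's values.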

Train a Model

To calculate Permutation Importance, we must first have a trained model (trained BEFORE we do any shuffling). Below, we see that our model has an R^2 of 99.9% on the validation set, which makes sense because the plot of x1 vs y shows a strong, linear relationship between the two.

(RandomForestRegressor is overkill in this particular case; a linear model would have worked just as well.)
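To back up the claim that a linear model would work just as well, here is a minimal sketch with scikit-learn's `LinearRegression` on seeded, regenerated data (the seed and `random_state` values are assumptions, so the numbers will differ from the output below). Besides a high R^2, the fitted coefficients should land near the true values of 2 for x1 and 0 for x2.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Regenerate the toy data (seeded for reproducibility)
rng = np.random.default_rng(0)
x1 = rng.random(100)
x2 = rng.random(100)
y = 2 * x1 + rng.normal(scale=0.01, size=100)
X = pd.DataFrame({'x1': x1, 'x2': x2})

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

lin = LinearRegression().fit(X_train, y_train)
r2_lin = r2_score(y_val, lin.predict(X_val))  # y_true first, then y_pred

# Coefficients should be close to the true slopes (2 for x1, 0 for x2)
print(f'R^2: {r2_lin:.4f}, coefficients: {lin.coef_.round(3)}')
```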

In [5]:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# construct training and validation datasets
X = df[['x1', 'x2']]
y = df['y']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

# train the model on the unshuffled training data
model = RandomForestRegressor()
model.fit(X_train, y_train)

# make predictions on the validation set
predictions = model.predict(X_val)

# evaluate R^2 (r2_score expects y_true first, then y_pred)
r2 = r2_score(y_val, predictions)
print(f'R^2: {r2}')

R^2: 0.9993164141204842
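For reference, the score printed above is the coefficient of determination. The standard definition, which is what scikit-learn documents `r2_score` as computing, is shown below, with y_i the true values, ŷ_i the predictions, and ȳ the mean of the true values:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

R^2 is at most 1 but has no lower bound: predictions that are worse than always guessing ȳ give a negative score. That is why a permutation importance measured as a drop in R^2 can exceed 1. And because ȳ is computed from the true values, the formula is not symmetric in y and ŷ, so the order of the arguments passed to `r2_score` matters.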

Identify Important Features

Now that we have a trained model, we can use eli5 to evaluate the Permutation Importance on the validation set.

In [6]:

import eli5
from eli5.sklearn import PermutationImportance

perm = PermutationImportance(model, scoring='r2').fit(X_val, y_val)
eli5.show_weights(perm, feature_names=X_val.columns.tolist())

Out[6]:

Weight	Feature
1.8616 ± 0.5000	x1
0.0001 ± 0.0001	x2

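As expected, x1 comes out as by far the most important feature: shuffling it lowers the validation R^2 by about 1.86 on average, while shuffling x2 changes it by only about 0.0001.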
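eli5 is not the only option. scikit-learn (version 0.22 and later) ships `sklearn.inspection.permutation_importance`, which applies the same shuffle-and-rescore idea. The sketch below is a swapped-in alternative, not what this post used; it runs on seeded, regenerated data, so its numbers will not match the eli5 table exactly, but the ranking should agree.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Regenerate the toy data (seeded for reproducibility)
rng = np.random.default_rng(0)
x1 = rng.random(100)
x2 = rng.random(100)
y = 2 * x1 + rng.normal(scale=0.01, size=100)
X = pd.DataFrame({'x1': x1, 'x2': x2})
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Score on the validation set with R^2, shuffling each column 5 times
result = permutation_importance(model, X_val, y_val, scoring='r2',
                                n_repeats=5, random_state=0)
for name, mean, std in zip(X.columns, result.importances_mean, result.importances_std):
    print(f'{name}: {mean:.4f} +/- {std:.4f}')
```

`result.importances` holds the raw per-repeat drops (one row per feature, one column per repeat), which is handy if you want to plot the spread rather than just the mean.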

Permutation Importance Example
