Constructing Efficient Metrics to Describe Customers | by Vladimir Zhyvov | Jan, 2025

Imagine you are an e-commerce platform aiming to personalize your email campaigns based on user activity from the past week. If a user has been less active compared to previous weeks, you plan to send them a discount offer.

You've gathered user statistics and noticed the following for a user named John:

  • John visited the platform for the first time 15 days ago.
  • During the first 7 days (days 1–7), he made 9 visits.
  • During the next 7 days (days 2–8), he made 8 visits.
  • In total, we have 9 values, one for each 7-day window from days 1–7 to days 9–15.
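The nine values can be read as sums over a 7-day window that slides one day at a time across John's 15 days on the platform. Here is a minimal sketch of that step. The `daily_visits` array is hypothetical: it is invented here so that the window sums match the numbers used in this article, and is not taken from real data.

```python
import numpy as np

# Hypothetical daily visit counts for John's 15 days (an assumption, not real data);
# chosen so that the 7-day window sums reproduce the article's values.
daily_visits = np.array([2, 2, 1, 0, 2, 0, 2, 1, 0, 0, 3, 0, 2, 1, 0])

window = 7
# Sum of each 7-day window: days 1-7, 2-8, ..., 9-15.
weekly_visits = np.convolve(daily_visits, np.ones(window, dtype=int), mode="valid")

print(len(weekly_visits))  # number of windows: 15 - 7 + 1
print(weekly_visits)
```

The first eight sums play the role of the history, and the last one is the most recent week that we want to evaluate.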

Now, you want to evaluate how extreme the latest value is compared to the previous ones.

import numpy as np
visits = np.array([9, 8, 6, 5, 8, 6, 8, 7])
num_visits_last_week = 6

Let's create a CDF (cumulative distribution function) of these values.

import numpy as np
import matplotlib.pyplot as plt

values = np.array(sorted(set(visits)))
# visits is a NumPy array, so count occurrences with a comparison (arrays have no .count)
counts = np.array([np.sum(visits == x) for x in values])
probabilities = counts / counts.sum()
cdf = np.cumsum(probabilities)

plt.scatter(values, cdf, color='black', linewidth=10)

CDF, image by Author
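For reference, the same empirical CDF can be computed in one step with `np.unique`. This is a small sketch using the same `visits` array, printing each distinct value next to its cumulative probability.

```python
import numpy as np

visits = np.array([9, 8, 6, 5, 8, 6, 8, 7])

# Distinct values (sorted ascending) and how many times each occurs.
values, counts = np.unique(visits, return_counts=True)
# Cumulative share of observations that are <= each distinct value.
cdf = np.cumsum(counts) / counts.sum()

for v, p in zip(values, cdf):
    print(f"P(visits <= {v}) = {p}")
```

For John's history the distinct values are 5, 6, 7, 8 and 9, with cumulative probabilities 0.125, 0.375, 0.5, 0.875 and 1.0.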

Now we need to reconstruct the function based on these values. We'll use spline interpolation.

from scipy.interpolate import make_interp_spline

x_new = np.linspace(values.min(), values.max(), 300)
spline = make_interp_spline(values, cdf, k=3)  # cubic spline through the CDF points
cdf_smooth = spline(x_new)

plt.plot(x_new, cdf_smooth, label='Spline CDF', color='black', linewidth=4)
plt.scatter(values, cdf, color='black', linewidth=10)
# Highlight the last two points in red
plt.scatter(values[-2:], cdf[-2:], color='#f95d5f', linewidth=10, zorder=5)
plt.show()

CDF with spline interpolation, image by Author

Not bad. But we observe a small problem between the red dots: the CDF must be monotonically increasing. Let's fix this with a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP).

from scipy.interpolate import PchipInterpolator

spline_monotonic = PchipInterpolator(values, cdf)
cdf_smooth = spline_monotonic(x_new)

plt.plot(x_new, cdf_smooth, color='black', linewidth=4)
plt.scatter(values, cdf, color='black', linewidth=10)
plt.show()

CDF with Piecewise Cubic Hermite Interpolating Polynomial, image by Author
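The monotonicity requirement can also be checked numerically instead of by eye. The sketch below evaluates both interpolants on the same grid and tests whether consecutive values ever decrease. The helper `is_non_decreasing` and its tolerance are assumptions introduced for this illustration.

```python
import numpy as np
from scipy.interpolate import make_interp_spline, PchipInterpolator

# CDF points from John's history (distinct visit counts and cumulative probabilities).
values = np.array([5, 6, 7, 8, 9])
cdf = np.array([0.125, 0.375, 0.5, 0.875, 1.0])
x_new = np.linspace(values.min(), values.max(), 300)

def is_non_decreasing(y, tol=1e-12):
    """Hypothetical helper: True if y never drops by more than tol between neighbors."""
    return bool(np.all(np.diff(y) >= -tol))

cubic = make_interp_spline(values, cdf, k=3)(x_new)
pchip = PchipInterpolator(values, cdf)(x_new)

print("cubic spline non-decreasing:", is_non_decreasing(cubic))
print("PCHIP non-decreasing:", is_non_decreasing(pchip))
print("PCHIP within [0, 1]:", bool(pchip.min() >= -1e-12 and pchip.max() <= 1 + 1e-12))
```

PCHIP preserves the monotonicity of the input points, which is why it suits a CDF; an unconstrained cubic spline gives no such guarantee.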

Alright, now it looks good.

To calculate the p-value for our current observation (6 visits during the last week), we need to calculate the size of the filled area.

Critical area, image by Author

To do so, let's create a simple function calculate_p_value:

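def calculate_p_value(x):
    # Below the smallest observed value the CDF is 0; above the largest it is 1.
    if x < values.min():
        return 0
    elif x > values.max():
        return 1
    else:
        # Inside the observed range, read the value off the monotonic interpolant.
        return spline_monotonic(x)

p_value = calculate_p_value(num_visits_last_week)
print(f"Probability of getting less than {num_visits_last_week} equals: {p_value}")

Probability of getting less than 6 equals: 0.375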

So the probability is quite high (we can compare it to a threshold of 0.1, for instance), and we decide not to send the discount to John. We need to do the same calculation for all the users.
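Repeating the calculation for every user could look roughly like the sketch below. It is only an illustration under stated assumptions: the function `activity_p_value`, the `users` dictionary, and the user "alice" with her numbers are hypothetical, and the 0.1 threshold is the example value mentioned above.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

THRESHOLD = 0.1  # example threshold from the text

def activity_p_value(history, latest):
    """Hypothetical helper: interpolated empirical CDF of `history` evaluated at `latest`."""
    values, counts = np.unique(history, return_counts=True)
    if latest < values.min():
        return 0.0
    if latest > values.max():
        return 1.0
    if len(values) < 2:
        # Only one distinct value in the history and latest equals it:
        # nothing to interpolate, so the whole mass sits at or below latest.
        return 1.0
    cdf = np.cumsum(counts) / counts.sum()
    return float(PchipInterpolator(values, cdf)(latest))

# user -> (previous weekly visit counts, visits during the last week)
users = {
    "john": ([9, 8, 6, 5, 8, 6, 8, 7], 6),            # values from this article
    "alice": ([12, 10, 11, 13, 12, 14, 11, 12], 4),   # hypothetical user
}

send_discount = {
    name: activity_p_value(history, latest) < THRESHOLD
    for name, (history, latest) in users.items()
}
print(send_discount)
```

With these inputs John is not flagged, since his value is about 0.375, while the hypothetical user falls below her whole history and is flagged for a discount.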