Method of Moments Estimation with Python Code | by Mahmoud Abdelaziz, PhD | Jan, 2025

How to understand and implement the estimator from scratch

Photo by Petr Macháček on Unsplash

Let’s say you work in a customer care center and you would like to know the probability distribution of the number of calls per minute. In other words, you want to answer the question: what is the probability of receiving zero, one, two, … etc., calls per minute? You need this distribution in order to predict the probability of receiving different numbers of calls, based on which you can plan how many employees are needed, whether or not an expansion is required, etc.

In order to make our decision ‘data informed’, we start by collecting data from which we try to infer this distribution. In other words, we want to generalize from the sample data to the unseen data, which is known as the population in statistical terms. This is the essence of statistical inference.

From the collected data we can compute the relative frequency of each value of calls per minute. For example, the collected data over time might look something like this: 2, 2, 3, 5, 4, 5, 5, 3, 6, 3, 4, … etc. This data is obtained by counting the number of calls received every minute. To compute the relative frequency of each value, count the number of occurrences of that value and divide by the total number of observations. This way you will end up with something like the gray curve in the figure below, which is equivalent to the histogram of the data in this example.
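As a minimal sketch of this computation (assuming the counts are collected in a NumPy array; the variable names here are illustrative, not from the original article), the relative frequencies can be obtained as follows.

import numpy as np

# Illustrative sample of calls-per-minute counts
counts = np.array([2, 2, 3, 5, 4, 5, 5, 3, 6, 3, 4])

# Relative frequency = occurrences of each value / total number of observations
values, occurrences = np.unique(counts, return_counts=True)
rel_freq = occurrences / counts.size
for v, f in zip(values, rel_freq):
    print(f"{v} calls/minute: relative frequency {f:.3f}")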

Image generated by the Author

Another option is to assume that each data point in our data is a realization of a random variable (X) that follows a certain probability distribution. This probability distribution represents all the possible values that would be generated if we were to collect this data long into the future; in other words, we can say that it represents the population from which our sample data was collected. Furthermore, we can assume that all the data points come from the same probability distribution, i.e., the data points are identically distributed. Moreover, we assume that the data points are independent, i.e., the value of one data point in the sample is not affected by the values of the other data points. This independence and identical distribution (iid) assumption allows us to proceed mathematically with our statistical inference problem in a systematic and straightforward way. In more formal terms, we assume that a generative probabilistic model is responsible for generating the iid data, as shown below.

Image generated by the Author

In this particular example, a Poisson distribution with mean value λ = 5 is assumed to have generated the data, as shown in the blue curve in the figure below. In other words, we assume here that we know the true value of λ, which is usually unknown and needs to be estimated from the data.

Image generated by the Author

As opposed to the previous method, in which we had to compute the relative frequency of each value of calls per minute (e.g., 12 values to be estimated in this example, as shown in the gray figure above), now we aim to find only one parameter, λ. Another advantage of this generative model approach is that it generalizes better from sample to population. The assumed probability distribution can be said to summarize the data in an elegant way that follows Occam’s razor principle.

Before proceeding further into how we aim to find this parameter λ, let’s first show the Python code that was used to generate the above figure.

# Import the Python libraries that we will need in this article
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import math
from scipy import stats

# Plot color constants (assumed hex values; the article's exact palette is not specified)
BLUE2 = '#1f77b4'
GRAY9 = '#e6e6e6'
ORANGE1 = '#ff7f0e'

# Poisson distribution example
lambda_ = 5
sample_size = 1000
data_poisson = stats.poisson.rvs(lambda_, size=sample_size)  # generate iid Poisson data

# Plot the data histogram vs the PMF
x1 = np.arange(data_poisson.min(), data_poisson.max() + 1)  # include the maximum observed value
fig1, ax = plt.subplots()
plt.bar(x1, stats.poisson.pmf(x1, lambda_),
        label="Poisson distribution (PMF)", color=BLUE2, linewidth=3.0, width=0.3, zorder=2)
ax.hist(data_poisson, bins=x1.size, density=True, label="Data histogram",
        color=GRAY9, width=1, zorder=1, align='left')

ax.set_title("Data histogram vs. Poisson true distribution", fontsize=14, loc='left')
ax.set_xlabel('Data value')
ax.set_ylabel('Probability')
ax.legend()
plt.savefig("Poisson_hist_PMF.png", format="png", dpi=800)

Our problem now is to estimate the value of the unknown parameter λ using the data we have collected. This is where we will use the method of moments (MoM) approach that appears in the title of this article.

First, we need to define what is meant by the moment of a random variable. Mathematically, the k-th moment of a discrete random variable (X) is defined as follows.
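For a discrete random variable taking value x with probability P(X = x), the standard definition is:

$$E\left[X^k\right] = \sum_{x} x^k \, P(X = x)$$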

Take the first moment E(X) as an example, which is also the mean μ of the random variable, and assume that we collect our data, modeled as N iid realizations of the random variable X. A reasonable estimate of μ is the sample mean, which is defined as follows.
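In standard notation:

$$\hat{\mu} = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$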

Thus, in order to obtain a MoM estimate of a model parameter that parametrizes the probability distribution of the random variable X, we first write the unknown parameter as a function of one or more of the k-th moments of the random variable, then we replace each moment with its sample estimate. The more unknown parameters we have in our model, the more moments we need.
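That is, the k-th moment is replaced by its sample counterpart:

$$\hat{m}_k = \frac{1}{N}\sum_{i=1}^{N} x_i^k$$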

In our Poisson model example, this is very simple, as shown below.
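Since the mean of a Poisson random variable equals its parameter, matching the first moment to the sample mean gives:

$$E[X] = \lambda \quad \Rightarrow \quad \hat{\lambda} = \frac{1}{N}\sum_{i=1}^{N} x_i$$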

In the next part, we test our MoM estimator on the simulated data we generated earlier. The Python code for obtaining the estimator and plotting the corresponding probability distribution using the estimated parameter is shown below.

# Method of moments estimator using the data (Poisson Dist)
lambda_hat = sum(data_poisson) / len(data_poisson)  # sample mean

# Plot the MoM estimated PMF vs the true PMF
x1 = np.arange(data_poisson.min(), data_poisson.max() + 1)
fig2, ax = plt.subplots()
plt.bar(x1, stats.poisson.pmf(x1, lambda_hat),
        label="Estimated PMF", color=ORANGE1, linewidth=3.0, width=0.3)
plt.bar(x1 + 0.3, stats.poisson.pmf(x1, lambda_),
        label="True PMF", color=BLUE2, linewidth=3.0, width=0.3)

ax.set_title("Estimated Poisson distribution vs. true distribution", fontsize=14, loc='left')
ax.set_xlabel('Data value')
ax.set_ylabel('Probability')
ax.legend()
plt.savefig("Poisson_true_vs_est.png", format="png", dpi=800)

The figure below shows the estimated distribution versus the true distribution. The two distributions are quite close, indicating that the MoM estimator is a reasonable estimator for our problem. In fact, replacing expectations with sample averages in the MoM estimator means that the estimator is consistent by the law of large numbers, which is a good justification for using such an estimator.
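As a quick sanity check of this consistency (an illustrative sketch building on the code above; the sample sizes and seed are arbitrary), we can watch the estimate approach the true λ = 5 as the sample size grows.

# Consistency sketch: the MoM estimate approaches the true lambda as N grows
rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
for n in [10, 100, 1000, 100000]:
    sample = stats.poisson.rvs(lambda_, size=n, random_state=rng)
    print(f"N = {n:6d}: lambda_hat = {sample.mean():.3f}")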

Image generated by the Author

Another MoM estimation example is shown below, assuming the iid data is generated by a normal distribution with mean μ and variance σ².

Image generated by the Author

In this particular example, a Gaussian (normal) distribution with mean value μ = 10 and standard deviation σ = 2 is assumed to have generated the data. The histogram of the generated data sample (sample size = 1000) is shown in gray in the figure below, while the true distribution is shown in the blue curve.

Image generated by the Author

The Python code that was used to generate the above figure is shown below.

# Normal distribution example
mu = 10
sigma = 2
sample_size = 1000
data_normal = stats.norm.rvs(loc=mu, scale=sigma, size=sample_size)  # generate iid Normal data

# Plot the data histogram vs the PDF
x2 = np.linspace(data_normal.min(), data_normal.max(), sample_size)
fig3, ax = plt.subplots()
ax.hist(data_normal, bins=50, density=True, label="Data histogram", color=GRAY9)
ax.plot(x2, stats.norm(loc=mu, scale=sigma).pdf(x2),
        label="Normal distribution (PDF)", color=BLUE2, linewidth=3.0)

ax.set_title("Data histogram vs. true distribution", fontsize=14, loc='left')
ax.set_xlabel('Data value')
ax.set_ylabel('Probability')
ax.legend()
ax.grid()

plt.savefig("Normal_hist_PDF.png", format="png", dpi=800)

Now, we would like to use the MoM estimator to find estimates of the model parameters, i.e., μ and σ², as shown below.
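Matching the first two moments to their sample counterparts gives the estimators:

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \hat{\mu}\right)^2$$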

In order to test this estimator using our sample data, we plot the distribution with the estimated parameters (orange) in the figure below versus the true distribution (blue). Again, it can be seen that the distributions are quite close. Of course, in order to quantify this estimator, we need to test it on multiple realizations of the data and observe properties such as bias, variance, etc. Such important aspects were discussed in an earlier article: Bias Variance Tradeoff in Parameter Estimation with Python Code | by Mahmoud Abdelaziz, PhD | Medium
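As a rough sketch of such a test (illustrative only, building on the variables defined above; the number of realizations and the seed are arbitrary), we can repeat the estimation over many simulated datasets and inspect the empirical bias and variance of the mean estimator.

# Monte Carlo sketch: empirical bias and variance of the MoM mean estimator
n_realizations = 2000  # arbitrary number of simulated datasets
rng_mc = np.random.default_rng(1)  # arbitrary seed
estimates = np.empty(n_realizations)
for i in range(n_realizations):
    sample = stats.norm.rvs(loc=mu, scale=sigma, size=sample_size, random_state=rng_mc)
    estimates[i] = sample.mean()
print(f"Empirical bias: {estimates.mean() - mu:.5f}")   # should be close to 0 (unbiased)
print(f"Empirical variance: {estimates.var():.5f}")     # should be close to sigma**2 / sample_size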

Image generated by the Author

The Python code that was used to estimate the model parameters using MoM, and to plot the above figure, is shown below.

# Method of moments estimator using the data (Normal Dist)
mu_hat = sum(data_normal) / len(data_normal)  # MoM mean estimator
var_hat = sum(pow(x - mu_hat, 2) for x in data_normal) / len(data_normal)  # MoM variance estimator
sigma_hat = math.sqrt(var_hat)  # MoM standard deviation estimator

# Plot the MoM estimated PDF vs the true PDF
x2 = np.linspace(data_normal.min(), data_normal.max(), sample_size)
fig4, ax = plt.subplots()
ax.plot(x2, stats.norm(loc=mu_hat, scale=sigma_hat).pdf(x2),
        label="Estimated PDF", color=ORANGE1, linewidth=3.0)
ax.plot(x2, stats.norm(loc=mu, scale=sigma).pdf(x2),
        label="True PDF", color=BLUE2, linewidth=3.0)

ax.set_title("Estimated Normal distribution vs. true distribution", fontsize=14, loc='left')
ax.set_xlabel('Data value')
ax.set_ylabel('Probability')
ax.legend()
ax.grid()
plt.savefig("Normal_true_vs_est.png", format="png", dpi=800)

Another useful probability distribution is the Gamma distribution. An example of the application of this distribution in real life was discussed in a previous article. Here, we derive the MoM estimator of the Gamma distribution parameters α and β, as shown below, assuming the data is iid.
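Using the shape–rate parametrization (rate β, so that the scale used in the code equals 1/β), the first two moments of the Gamma distribution are E[X] = α/β and Var(X) = α/β². Solving these two equations for the parameters and replacing the moments with their sample counterparts gives:

$$\hat{\alpha} = \frac{\bar{x}^2}{s^2}, \qquad \hat{\beta} = \frac{\bar{x}}{s^2}$$

where $\bar{x}$ is the sample mean and $s^2$ is the sample variance.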

Image generated by the Author

In this particular example, a Gamma distribution with α = 6 and β = 0.5 is assumed to have generated the data. The histogram of the generated data sample (sample size = 1000) is shown in gray in the figure below, while the true distribution is shown in the blue curve.

Image generated by the Author

The Python code that was used to generate the above figure is shown below.

# Gamma distribution example
alpha_ = 6  # shape parameter
scale_ = 2  # scale parameter = 1/beta, where beta is the rate parameter
sample_size = 1000
data_gamma = stats.gamma.rvs(alpha_, loc=0, scale=scale_, size=sample_size)  # generate iid Gamma data

# Plot the data histogram vs the PDF
x3 = np.linspace(data_gamma.min(), data_gamma.max(), sample_size)
fig5, ax = plt.subplots()
ax.hist(data_gamma, bins=50, density=True, label="Data histogram", color=GRAY9)
ax.plot(x3, stats.gamma(alpha_, loc=0, scale=scale_).pdf(x3),
        label="Gamma distribution (PDF)", color=BLUE2, linewidth=3.0)

ax.set_title("Data histogram vs. true distribution", fontsize=14, loc='left')
ax.set_xlabel('Data value')
ax.set_ylabel('Probability')
ax.legend()
ax.grid()
plt.savefig("Gamma_hist_PDF.png", format="png", dpi=800)

Now, we would like to use the MoM estimator to find estimates of the model parameters, i.e., α and β, as shown below.

In order to test this estimator using our sample data, we plot the distribution with the estimated parameters (orange) in the figure below versus the true distribution (blue). Again, it can be seen that the distributions are quite close.

Image generated by the Author

The Python code that was used to estimate the model parameters using MoM, and to plot the above figure, is shown below.

# Method of moments estimator using the data (Gamma Dist)
sample_mean = data_gamma.mean()
sample_var = data_gamma.var()
scale_hat = sample_var / sample_mean  # scale = 1/beta, so scale_hat = s^2 / x_bar
alpha_hat = sample_mean**2 / sample_var  # alpha_hat = x_bar^2 / s^2

# Plot the MoM estimated PDF vs the true PDF
x4 = np.linspace(data_gamma.min(), data_gamma.max(), sample_size)
fig6, ax = plt.subplots()

ax.plot(x4, stats.gamma(alpha_hat, loc=0, scale=scale_hat).pdf(x4),
        label="Estimated PDF", color=ORANGE1, linewidth=3.0)
ax.plot(x4, stats.gamma(alpha_, loc=0, scale=scale_).pdf(x4),
        label="True PDF", color=BLUE2, linewidth=3.0)

ax.set_title("Estimated Gamma distribution vs. true distribution", fontsize=14, loc='left')
ax.set_xlabel('Data value')
ax.set_ylabel('Probability')
ax.legend()
ax.grid()
plt.savefig("Gamma_true_vs_est.png", format="png", dpi=800)

Note that we used the following equivalent ways of writing the variance when deriving the estimators in the cases of the Gaussian and Gamma distributions.
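That is:

$$\operatorname{Var}(X) = E\left[(X - \mu)^2\right] = E\left[X^2\right] - \left(E[X]\right)^2$$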

In this article, we explored various examples of the method of moments estimator and its applications in different problems in data science. Moreover, we showed detailed Python code that implements the estimators from scratch and plots the different figures. I hope you find this article helpful.