Lasso and Elastic Net Regressions

REGRESSION ALGORITHM

Roping in key features with coordinate descent

Linear regression comes in several flavors: Least Squares methods form the foundation, from the basic Ordinary Least Squares (OLS) to Ridge regression, whose regularization helps prevent overfitting. Then there's Lasso regression, which takes a unique approach by automatically selecting important factors and ignoring the others. Elastic Net combines the best of both worlds, blending Lasso's feature selection with Ridge's ability to handle correlated features.

It's frustrating to see many articles treat these methods as if they're basically the same thing with minor tweaks. They make it seem like switching between them is as simple as changing a setting in your code, but each actually uses a different approach to solve its optimization problem!

While OLS and Ridge regression can be solved directly through matrix operations, Lasso and Elastic Net require a different approach: an iterative method called coordinate descent. Here, we'll explore how this algorithm works through clear visualizations. So, let's saddle up and lasso our way through the details!

All visuals: Author-created using Canva Pro. Optimized for mobile; may appear oversized on desktop.

Lasso Regression

LASSO (Least Absolute Shrinkage and Selection Operator) is a variation of Linear Regression that adds a penalty to the model. It uses a linear equation to predict numbers, just like Linear Regression. However, Lasso can also shrink the importance of certain features all the way to zero, which makes it useful for two main tasks: making predictions and identifying the most important features.

Elastic Net Regression

Elastic Net Regression is a mix of Ridge and Lasso Regression that combines their penalty terms. The name "Elastic Net" comes from physics: just like an elastic net can stretch and still keep its shape, this method adapts to the data while maintaining structure.

The model balances three goals: minimizing prediction errors, keeping the size of coefficients small (like Lasso), and preventing any coefficient from becoming too large (like Ridge). To use the model, you plug your data's feature values into the linear equation, just as in standard Linear Regression.

The main advantage of Elastic Net is that when features are correlated, it tends to keep or remove them as a group instead of randomly picking one feature from the group.

Linear models like Lasso and Elastic Net belong to the broader family of machine learning methods that predict outcomes using linear relationships between variables.

To illustrate our concepts, we'll use our standard dataset that predicts the number of golfers visiting on a given day, using features like weather outlook, temperature, humidity, and wind conditions.

For both Lasso and Elastic Net to work effectively, we need to standardize the numerical features (making their scales comparable) and apply one-hot encoding to categorical features, as both models' penalties are sensitive to feature scales.

Columns: 'Outlook' (one-hot encoded to sunny, overcast, rain), 'Temperature' (standardized), 'Humidity' (standardized), 'Wind' (Yes/No) and 'Number of Players' (numerical, target feature)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer

# Create dataset
data = {
    'Outlook': ['sunny', 'sunny', 'overcast', 'rain', 'rain', 'rain', 'overcast', 'sunny', 'sunny',
                'rain', 'sunny', 'overcast', 'overcast', 'rain', 'sunny', 'overcast', 'rain', 'sunny',
                'sunny', 'rain', 'overcast', 'rain', 'sunny', 'overcast', 'sunny', 'overcast', 'rain', 'overcast'],
    'Temperature': [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71, 81, 74, 76, 78, 82,
                    67, 85, 73, 88, 77, 79, 80, 66, 84],
    'Humidity': [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80, 88, 92, 85, 75, 92,
                 90, 85, 88, 65, 70, 60, 95, 70, 78],
    'Wind': [False, True, False, False, False, True, True, False, False, False, True, True, False,
             True, True, False, False, True, False, True, True, False, True, False, False, True, False, False],
    'Num_Players': [52, 39, 43, 37, 28, 19, 43, 47, 56, 33, 49, 23, 42, 13, 33, 29, 25, 51, 41,
                    14, 34, 29, 49, 36, 57, 21, 23, 41]
}

# Process data
df = pd.get_dummies(pd.DataFrame(data), columns=['Outlook'])
df['Wind'] = df['Wind'].astype(int)

# Split data
X, y = df.drop(columns='Num_Players'), df['Num_Players']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)

# Scale numerical features
numerical_cols = ['Temperature', 'Humidity']
ct = ColumnTransformer([('scaler', StandardScaler(), numerical_cols)], remainder='passthrough')

# Transform data
X_train_scaled = pd.DataFrame(
    ct.fit_transform(X_train),
    columns=numerical_cols + [col for col in X_train.columns if col not in numerical_cols],
    index=X_train.index
)

X_test_scaled = pd.DataFrame(
    ct.transform(X_test),
    columns=X_train_scaled.columns,
    index=X_test.index
)

Lasso and Elastic Net Regression predict numbers by fitting a straight line (or hyperplane) to the data, while controlling the size of the coefficients in different ways:

  1. Both models find the best line by balancing prediction accuracy with coefficient control. They work to make the gaps between real and predicted values small, while keeping coefficients in check through penalty terms.
  2. In Lasso, the penalty (controlled by λ) can shrink coefficients to exactly zero, removing features entirely. Elastic Net combines two kinds of penalties: one that can remove features (like Lasso) and another that shrinks groups of related features together. The mix between these penalties is controlled by the l1_ratio (α).
  3. To predict a new answer, both models multiply each input by its coefficient (if not zero) and add them up, plus a starting number (intercept/bias), as written out in the equation below. Elastic Net usually keeps more features than Lasso but with smaller coefficients, especially when features are correlated.
  4. The strength of the penalties affects how the models behave:
    – In Lasso, a larger λ means more coefficients become zero
    – In Elastic Net, λ controls the overall penalty strength, while α determines the balance between feature removal and coefficient shrinkage
    – When the penalties are very small, both models act more like standard Linear Regression
Lasso and Elastic Net make predictions by multiplying input features with their trained weights and adding them together with a bias term to produce a final output value.
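As a compact statement of point 3 above, the prediction both models produce can be written in the familiar linear form (standard notation, not specific to our dataset):

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$

where β₀ is the bias (intercept) and each βⱼ is a trained coefficient that may have been shrunk, possibly all the way to zero.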

Let's explore how Lasso and Elastic Net learn from data using the coordinate descent algorithm. While these models have complex mathematical foundations, we'll focus on understanding coordinate descent, an efficient optimization method that makes the computation more practical and intuitive.

Coordinate Descent for Lasso Regression

The optimization problem of Lasso Regression is as follows:
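The original equation figure is not reproduced here; in one common textbook form (the ½ factor on the squared error is an assumption that keeps the coefficient update clean and matches the code below), the Lasso objective reads:

$$\min_{\beta_0,\,\beta}\;\frac{1}{2}\sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2\;+\;\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert$$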

While the scikit-learn implementation includes an extra scaling factor (1/(2*n_samples)) for computational efficiency, we'll use the standard theoretical form for clarity in our explanation.

Here's how coordinate descent finds the optimal coefficients by updating one feature at a time:

1. Start by initializing the model with all coefficients at zero. Set a fixed value for the regularization parameter that will control the strength of the penalty.

Lasso regression starts with all feature weights set to zero and uses a penalty parameter (λ) to control how much it shrinks the weights during training.

2. Calculate the initial bias by taking the mean of all target values.

The initial bias value is set to 37.43, which is calculated by taking the average of all target values in the training data (mean of the player counts shown from index 0 to 13).

3. For updating the first coefficient (in our case, 'sunny'):
– Using the weighted sum, calculate what the model would predict without using this feature.

At the start of training, all feature weights are set to zero while the initial bias of 37.43 is used, so the model predicts the same average value (37.43) for all training examples regardless of their input features.

– Find the partial residual, that is, how far off these predictions are from the actual values. Using this value, calculate the temporary coefficient.

For the first feature, Lasso calculates a temporary coefficient of 11.17 by comparing the true labels with the predictions, considering only the rows where this feature equals 1, and applying the gradient formula.

– Apply the Lasso shrinkage (soft thresholding) to this temporary coefficient to get the final coefficient for this step.

Lasso applies its shrinkage formula to the temporary coefficient (11.17), where it subtracts the penalty term (λ/5 = 0.2) from the absolute value while preserving the sign, resulting in a final coefficient of 10.97.
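Putting numbers to this step (from the figures quoted above: λ = 1, five training rows with 'sunny' = 1 so the sum of squared feature values is 5, which implies a summed partial residual of about 55.86), the raw update and soft-thresholding work out to:

$$\tilde{\beta}_{\text{sunny}}=\frac{\sum_i x_{i,\text{sunny}}\,r_i}{\sum_i x_{i,\text{sunny}}^2}=\frac{55.86}{5}\approx 11.17,\qquad \beta_{\text{sunny}}=\operatorname{sign}(\tilde{\beta})\,\max\!\Big(\lvert\tilde{\beta}\rvert-\tfrac{\lambda}{\sum_i x_{i,\text{sunny}}^2},\,0\Big)=\max(11.17-0.2,\,0)\approx 10.97$$

where r_i is the partial residual for row i.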

4. Move through each remaining coefficient one by one, repeating the same update process. When calculating predictions during each update, use the most recently updated values for all other coefficients.

After updating the first coefficient to 10.97, Lasso uses these updated predictions to calculate the temporary coefficient (0.32) for the second feature, showing how the algorithm updates coefficients one at a time through coordinate descent.
import numpy as np

# Initialize bias as mean of target values and coefficients to 0
bias = np.mean(y_train)
beta = np.zeros(X_train_scaled.shape[1])
lambda_param = 1

# One cycle through all features
for j, feature in enumerate(X_train_scaled.columns):
    # Get current feature values
    x_j = X_train_scaled.iloc[:, j].values

    # Calculate prediction excluding the j-th feature
    y_pred_no_j = bias + X_train_scaled.values @ beta - x_j * beta[j]

    # Calculate partial residuals
    residual_no_j = y_train.values - y_pred_no_j

    # Calculate the dot product of x_j with itself (sum of squared feature values)
    sum_squared_x_j = np.dot(x_j, x_j)

    # Calculate temporary beta without regularization (raw update)
    beta_old = beta[j]
    beta_temp = beta_old + np.dot(x_j, residual_no_j) / sum_squared_x_j

    # Apply soft thresholding for the Lasso penalty
    beta[j] = np.sign(beta_temp) * max(abs(beta_temp) - lambda_param / sum_squared_x_j, 0)

# Print results
print("Coefficients after one cycle:")
for feature, coef in zip(X_train_scaled.columns, beta):
    print(f"{feature:11}: {coef:.2f}")

5. Go back and update the bias by calculating what the current model predicts using all features, then adjust the bias based on the average difference between these predictions and the actual values.

After updating all feature coefficients through coordinate descent, the model recalculates the bias (40.25) as the mean difference between the true labels and the predictions made using the current feature weights, ensuring the model's predictions are properly centered around the target values.
# Update bias (not penalized by lambda)
y_pred = X_train_scaled.values @ beta  # only using coefficients, no bias
residuals = y_train.values - y_pred
bias = np.mean(residuals)  # this replaces the old bias

6. Check whether the model has converged, either by reaching the maximum number of allowed iterations or by seeing that the coefficients aren't changing much anymore. If not converged, return to step 3 and repeat the process.
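A minimal sketch of what this looks like in code, wrapping the single cycle above in an outer loop; max_iter and tol here mirror scikit-learn's defaults of 1000 and 1e-4 but are otherwise illustrative choices, not part of the original walkthrough:

import numpy as np

# Sketch: full coordinate descent loop with a convergence check
max_iter, tol = 1000, 1e-4

bias = np.mean(y_train)
beta = np.zeros(X_train_scaled.shape[1])
lambda_param = 1

for iteration in range(max_iter):
    beta_prev = beta.copy()

    # Steps 3-4: update each coefficient in turn (same update as the single cycle above)
    for j in range(X_train_scaled.shape[1]):
        x_j = X_train_scaled.iloc[:, j].values
        y_pred_no_j = bias + X_train_scaled.values @ beta - x_j * beta[j]
        residual_no_j = y_train.values - y_pred_no_j
        sum_squared_x_j = np.dot(x_j, x_j)
        beta_temp = beta[j] + np.dot(x_j, residual_no_j) / sum_squared_x_j
        beta[j] = np.sign(beta_temp) * max(abs(beta_temp) - lambda_param / sum_squared_x_j, 0)

    # Step 5: refresh the bias (not penalized)
    bias = np.mean(y_train.values - X_train_scaled.values @ beta)

    # Step 6: stop when the coefficients barely change
    if np.max(np.abs(beta - beta_prev)) < tol:
        break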

After 1000 iterations of coordinate descent, Lasso produces the final model in which some coefficients have been shrunk exactly to zero (the 'rain' and 'Temperature' features), while others retain non-zero values, demonstrating Lasso's feature selection capability.
from sklearn.linear_model import Lasso

# Fit Lasso from scikit-learn
lasso = Lasso(alpha=1)  # max_iter defaults to 1000 cycles
lasso.fit(X_train_scaled, y_train)

# Print results
print("\nCoefficients after 1000 cycles:")
print(f"Bias term  : {lasso.intercept_:.2f}")
for feature, coef in zip(X_train_scaled.columns, lasso.coef_):
    print(f"{feature:11}: {coef:.2f}")

Coordinate Descent for Elastic Net Regression

The optimization problem of Elastic Net Regression is as follows:
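In the same textbook form used for Lasso above (again an assumption about notation, with λ as the overall strength and α as the penalty mix), the Elastic Net objective reads:

$$\min_{\beta_0,\,\beta}\;\frac{1}{2}\sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2\;+\;\lambda\Big(\alpha\sum_{j=1}^{p}\lvert\beta_j\rvert+\frac{1-\alpha}{2}\sum_{j=1}^{p}\beta_j^2\Big)$$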

While scikit-learn's implementation includes an extra scaling factor (1/(2*n_samples)) and uses alpha to control the overall regularization strength and l1_ratio to control the penalty mix, we'll use the standard theoretical form for clarity.

The coordinate descent algorithm for Elastic Net works similarly to Lasso's, but accounts for both penalties when updating coefficients. Here's how it works:

1. Start by initializing the model with all coefficients at zero. Set two fixed values: one controlling feature removal (like in Lasso) and another for general coefficient shrinkage (the key difference from Lasso).

Elastic Net regression starts like Lasso with zero weights for all features, but uses two parameters: λ (lambda) for the overall regularization strength and α (alpha) to balance between the Lasso and Ridge penalties.

2. Calculate the initial bias by taking the mean of all target values. (Same as Lasso)

Like Lasso, Elastic Net also initializes its bias term to 37.43 by calculating the mean of all target values in the training dataset.

3. For updating the first coefficient:
– Using the weighted sum, calculate what the model would predict without using this feature. (Same as Lasso)

Elastic Net starts its coordinate descent process just like Lasso, making initial predictions of 37.43 for all training examples since all feature weights are set to zero and only the bias term is active.

– Find the partial residual, that is, how far off these predictions are from the actual values. Using this value, calculate the temporary coefficient. (Same as Lasso)

Like Lasso, Elastic Net calculates a temporary coefficient of 11.17 for the first feature by comparing predictions with the true labels.

– For Elastic Net, apply both soft thresholding and coefficient shrinkage to this temporary coefficient to get the final coefficient for this step. This combined effect is the main difference from Lasso Regression.

Elastic Net applies its distinctive shrinkage formula that combines both the Lasso (L1) and Ridge (L2) penalties, where α controls their balance. The temporary coefficient of 11.17 is shrunk to 10.06 through this combined regularization approach.
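With the numbers quoted above (λ = 1, α = 0.5, and the same sum of squared feature values of 5 for 'sunny' as in the Lasso walkthrough), the combined update works out to:

$$\beta_{\text{sunny}}=\frac{\operatorname{sign}(\tilde{\beta})\,\max\!\big(\lvert\tilde{\beta}\rvert-\frac{\alpha\lambda}{\sum_i x_i^2},\,0\big)}{1+\frac{(1-\alpha)\lambda}{\sum_i x_i^2}}=\frac{\max(11.17-0.1,\,0)}{1+0.1}\approx 10.06$$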

4. Move through each remaining coefficient one by one, repeating the same update process. When calculating predictions during each update, use the most recently updated values for all other coefficients. (Same process as Lasso, but using the modified update formula)

After updating the first coefficient to 10.06, Elastic Net continues coordinate descent by calculating and updating the second coefficient, showing how it processes features one at a time while maintaining both L1 and L2 regularization effects.
import numpy as np

# Initialize bias as mean of target values and coefficients to 0
bias = np.mean(y_train)
beta = np.zeros(X_train_scaled.shape[1])
lambda_param = 1
alpha = 0.5  # mixing parameter (0 for Ridge, 1 for Lasso)

# One cycle through all features
for j, feature in enumerate(X_train_scaled.columns):
    # Get current feature values
    x_j = X_train_scaled.iloc[:, j].values

    # Calculate prediction excluding the j-th feature
    y_pred_no_j = bias + X_train_scaled.values @ beta - x_j * beta[j]

    # Calculate partial residuals
    residual_no_j = y_train.values - y_pred_no_j

    # Calculate the dot product of x_j with itself (sum of squared feature values)
    sum_squared_x_j = np.dot(x_j, x_j)

    # Calculate temporary beta without regularization (raw update)
    beta_old = beta[j]
    beta_temp = beta_old + np.dot(x_j, residual_no_j) / sum_squared_x_j

    # Compute the Elastic Net penalty terms
    l1_term = alpha * lambda_param / sum_squared_x_j        # L1 (Lasso) penalty term
    l2_term = (1 - alpha) * lambda_param / sum_squared_x_j  # L2 (Ridge) penalty term

    # First apply L1 soft thresholding, then L2 scaling
    beta[j] = (np.sign(beta_temp) * max(abs(beta_temp) - l1_term, 0)) / (1 + l2_term)

# Print results
print("Coefficients after one cycle:")
for feature, coef in zip(X_train_scaled.columns, beta):
    print(f"{feature:11}: {coef:.2f}")

5. Update the bias by calculating what the current model predicts using all features, then adjust the bias based on the average difference between these predictions and the actual values. (Same as Lasso)

After updating all feature coefficients using Elastic Net's combined L1 and L2 regularization, the model recalculates the bias to 40.01 by taking the mean difference between the true labels and the predictions, just like in Lasso regression.
# Update bias (not penalized by lambda)
y_pred_with_updated_beta = X_train_scaled.values @ beta  # only using coefficients, no bias
residuals_for_bias_update = y_train.values - y_pred_with_updated_beta
new_bias = np.mean(residuals_for_bias_update)  # this replaces the old bias

print(f"Bias term  : {new_bias:.2f}")

6. Check whether the model has converged, either by reaching the maximum number of allowed iterations or by seeing that the coefficients aren't changing much anymore. If not converged, return to step 3 and repeat the process.

The final Elastic Net model after 1000 iterations shows smaller coefficient values compared to Lasso and fewer coefficients shrunk exactly to zero.
from sklearn.linear_model import ElasticNet

# Fit ElasticNet from scikit-learn
elasticnet = ElasticNet(alpha=1)  # max_iter defaults to 1000 cycles
elasticnet.fit(X_train_scaled, y_train)

# Print results
print("\nCoefficients after 1000 cycles:")
print(f"Bias term  : {elasticnet.intercept_:.2f}")
for feature, coef in zip(X_train_scaled.columns, elasticnet.coef_):
    print(f"{feature:11}: {coef:.2f}")

The prediction process remains the same as in OLS: multiply new data points by the coefficients:

Lasso Regression

When applying the trained Lasso model to unseen data, it multiplies each feature value by its corresponding coefficient and adds the bias term (41.24), resulting in a final prediction of 40.2 players for this new data point.

Elastic Net Regression

The trained Elastic Net model predicts 40.83 players for the same unseen data point by multiplying the features with its more evenly distributed coefficients and adding the bias (38.59), showing a slightly different prediction from Lasso due to its balanced regularization approach.
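Here is a minimal sketch of that prediction step in code, using the Lasso model fitted earlier; the row below is an illustrative, already-preprocessed example rather than the exact data point shown in the figures:

import numpy as np

# Hypothetical, already-preprocessed row: scaled Temperature and Humidity first,
# then the passthrough columns (Wind and the one-hot Outlook flags),
# in the same column order as X_train_scaled
x_new = pd.DataFrame([[0.2, -0.5, 0, 0, 0, 1]], columns=X_train_scaled.columns)

# Manual prediction: weighted sum of features plus the bias term
manual_pred = np.dot(x_new.values[0], lasso.coef_) + lasso.intercept_

# The same result via scikit-learn's predict()
sklearn_pred = lasso.predict(x_new)[0]

print(f"Manual: {manual_pred:.2f}  |  sklearn: {sklearn_pred:.2f}")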

We can do the same process for all data points. For our dataset, here's the final result, with the RMSE as well:

Lasso Regression

Lasso's performance on several test cases shows a Root Mean Squared Error (RMSE) of 7.203, calculated by comparing its predictions with the actual player counts across 14 different test samples.

Elastic Net Regression

Elastic Net shows a slightly higher RMSE than Lasso, probably because its combined L1 and L2 penalties keep more features with small non-zero coefficients, which can introduce extra variance in the predictions.
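A short sketch of how this comparison can be computed for the two models fitted earlier (the exact RMSE values depend on the chosen λ and α):

from sklearn.metrics import root_mean_squared_error

# Compare test-set RMSE of the Lasso and Elastic Net models fitted above
for name, model in [('Lasso', lasso), ('Elastic Net', elasticnet)]:
    test_pred = model.predict(X_test_scaled)
    print(f"{name:12}: RMSE = {root_mean_squared_error(y_test, test_pred):.3f}")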

Lasso Regression

Lasso regression uses coordinate descent to solve the optimization problem. Here are the key parameters for it:

  • alpha (λ): Controls how strongly to penalize large coefficients. Higher values force more coefficients to become exactly zero. Default is 1.0.
  • max_iter: Sets the maximum number of cycles the algorithm will update its solution in search of the best result. Default is 1000.
  • tol: Sets how small the change in coefficients needs to be before the algorithm decides it has found a good enough solution. Default is 0.0001.

Elastic Net Regression

Elastic Net regression combines two kinds of penalties and also uses coordinate descent. Here are the key parameters for it:

  • alpha (λ): Controls the overall strength of both penalties together. Higher values mean stronger penalties. Default is 1.0.
  • l1_ratio (α): Sets how much of each kind of penalty to use. A value of 0 uses only the Ridge penalty, while 1 uses only the Lasso penalty. Values between 0 and 1 use both. Default is 0.5.
  • max_iter: Maximum number of iterations for the coordinate descent algorithm. Default is 1000 iterations.
  • tol: Tolerance for the optimization convergence, similar to Lasso. Default is 1e-4.

Note: Not to be confused, in scikit-learn's code the regularization parameter is called alpha, but in mathematical notation it's usually written as λ (lambda). Similarly, the mixing parameter is called l1_ratio in code but written as α (alpha) in mathematical notation. We use the mathematical symbols here to match standard textbook notation.

With Elastic Net, we can actually obtain different kinds of linear regression models by adjusting the parameters:

  • When alpha = 0, we get Ordinary Least Squares (OLS)
  • When alpha > 0 and l1_ratio = 0, we get Ridge regression
  • When alpha > 0 and l1_ratio = 1, we get Lasso regression
  • When alpha > 0 and 0 < l1_ratio < 1, we get Elastic Net regression

In practice, it's a good idea to explore a range of alpha values (like 0.0001, 0.001, 0.01, 0.1, 1, 10, 100) and l1_ratio values (like 0, 0.25, 0.5, 0.75, 1), ideally using cross-validation to find the best combination.
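One way to run that search is sketched below using scikit-learn's GridSearchCV; the grids are simply the values suggested above, while the 5-fold split and the raised max_iter are assumptions, not choices made in the original walkthrough:

from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Grid of overall strength (alpha) and penalty mix (l1_ratio) values to try
param_grid = {
    'alpha': [0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
    'l1_ratio': [0, 0.25, 0.5, 0.75, 1],
}

# Cross-validated search over the grid, scored by (negative) RMSE
search = GridSearchCV(
    ElasticNet(max_iter=10000),
    param_grid,
    scoring='neg_root_mean_squared_error',
    cv=5,
)
search.fit(X_train_scaled, y_train)

print("Best parameters:", search.best_params_)
print(f"Best CV RMSE   : {-search.best_score_:.3f}")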

Here, let's see how the model coefficients, bias terms, and test RMSE change with different regularization strengths (λ) and mixing parameters (l1_ratio).

The best model is Lasso (α = 1) with λ = 0.1, achieving an RMSE of 6.561, showing that pure L1 regularization works best for our dataset.
from sklearn.linear_model import ElasticNet
from sklearn.metrics import root_mean_squared_error

# Define parameters
l1_ratios = [0, 0.25, 0.5, 0.75, 1]
lambdas = [0, 0.01, 0.1, 1, 10]
feature_names = X_train_scaled.columns

# Create a dataframe for each lambda value
for lambda_val in lambdas:
    # Initialize list to store results
    results = []

    # Fit ElasticNet for each l1_ratio
    for l1_ratio in l1_ratios:
        # Fit model
        en = ElasticNet(alpha=lambda_val, l1_ratio=l1_ratio)
        en.fit(X_train_scaled, y_train)

        # Calculate RMSE
        y_pred = en.predict(X_test_scaled)
        rmse = root_mean_squared_error(y_test, y_pred)

        # Store coefficients and RMSE
        results.append(list(en.coef_.round(2)) + [round(en.intercept_, 2), round(rmse, 3)])

    # Create dataframe with RMSE column
    columns = list(feature_names) + ['Bias', 'RMSE']
    df = pd.DataFrame(results, index=l1_ratios, columns=columns)
    df.index.name = f'λ = {lambda_val}'

    print(df)

Note: Although Elastic Net can do what OLS, Ridge, and Lasso do by changing its parameters, it's better to use the specific estimator made for each kind of regression. In scikit-learn, use LinearRegression for OLS, Ridge for Ridge regression, and Lasso for Lasso regression. Only use ElasticNet when you want to combine both Lasso's and Ridge's special features together.

Let's break down when to use each method.

Start with Ordinary Least Squares (OLS) when you have more samples than features in your dataset, and when your features don't strongly predict one another.

Ridge Regression works well when you have the opposite situation: lots of features compared to your number of samples. It's also great when your features are strongly related to one another.

Lasso Regression is best when you want to discover which features actually matter for your predictions. It will automatically set unimportant features to zero, making your model simpler.

Elastic Net combines the strengths of both Ridge and Lasso. It's useful when you have groups of related features and want to either keep or remove them together. If you've tried Ridge and Lasso separately and weren't happy with the results, Elastic Net might give you better predictions.

A good strategy is to start with Ridge if you want to keep all your features. You can move on to Lasso if you want to identify the important ones. If neither gives you good results, then move on to Elastic Net.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.metrics import root_mean_squared_error
from sklearn.linear_model import Lasso #, ElasticNet

# Create dataset
data = {
    'Outlook': ['sunny', 'sunny', 'overcast', 'rain', 'rain', 'rain', 'overcast', 'sunny', 'sunny',
                'rain', 'sunny', 'overcast', 'overcast', 'rain', 'sunny', 'overcast', 'rain', 'sunny',
                'sunny', 'rain', 'overcast', 'rain', 'sunny', 'overcast', 'sunny', 'overcast', 'rain', 'overcast'],
    'Temperature': [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71, 81, 74, 76, 78, 82,
                    67, 85, 73, 88, 77, 79, 80, 66, 84],
    'Humidity': [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80, 88, 92, 85, 75, 92,
                 90, 85, 88, 65, 70, 60, 95, 70, 78],
    'Wind': [False, True, False, False, False, True, True, False, False, False, True, True, False,
             True, True, False, False, True, False, True, True, False, True, False, False, True, False, False],
    'Num_Players': [52, 39, 43, 37, 28, 19, 43, 47, 56, 33, 49, 23, 42, 13, 33, 29, 25, 51, 41,
                    14, 34, 29, 49, 36, 57, 21, 23, 41]
}

# Process data
df = pd.get_dummies(pd.DataFrame(data), columns=['Outlook'], prefix='', prefix_sep='', dtype=int)
df['Wind'] = df['Wind'].astype(int)
df = df[['sunny', 'overcast', 'rain', 'Temperature', 'Humidity', 'Wind', 'Num_Players']]

# Split data
X, y = df.drop(columns='Num_Players'), df['Num_Players']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)

# Scale numerical features
numerical_cols = ['Temperature', 'Humidity']
ct = ColumnTransformer([('scaler', StandardScaler(), numerical_cols)], remainder='passthrough')

# Transform data
X_train_scaled = pd.DataFrame(
    ct.fit_transform(X_train),
    columns=numerical_cols + [col for col in X_train.columns if col not in numerical_cols],
    index=X_train.index
)
X_test_scaled = pd.DataFrame(
    ct.transform(X_test),
    columns=X_train_scaled.columns,
    index=X_test.index
)

# Initialize and train the model
model = Lasso(alpha=0.1)  # Option 1: Lasso Regression (alpha is the regularization strength, equivalent to λ, uses coordinate descent)
#model = ElasticNet(alpha=0.1, l1_ratio=0.5)  # Option 2: Elastic Net Regression (alpha is the overall regularization strength, l1_ratio is the mix between L1 and L2, uses coordinate descent)

# Fit the model
model.fit(X_train_scaled, y_train)

# Make predictions
y_pred = model.predict(X_test_scaled)

# Calculate and print RMSE
rmse = root_mean_squared_error(y_test, y_pred)
print(f"RMSE: {rmse:.4f}")

# Additional information about the model
print("\nModel Coefficients:")
for feature, coef in zip(X_train_scaled.columns, model.coef_):
    print(f"{feature:13}: {coef:.2f}")
print(f"Intercept    : {model.intercept_:.2f}")