Predicted Probability | Towards Data Science

MODEL EVALUATION & OPTIMIZATION

7 fundamental classifiers reveal their prediction confidence math

Classification models don't just tell you what they think the answer is; they also tell you how sure they are about that answer. This certainty is shown as a probability score. A high score means the model is very confident, while a low score means it is uncertain about its prediction.

Every classification model calculates these probability scores differently. Simple models and complex ones each have their own specific methods for determining the likelihood of each possible outcome.

We're going to explore seven basic classification models and visually break down how each one figures out its probability scores. No need for a crystal ball; we'll make these probability calculations crystal clear!

All visuals: Author-created using Canva Pro. Optimized for mobile; may appear oversized on desktop.

Predicted probability (or "class probability") is a number from 0 to 1 (or 0% to 100%) that shows how confident a model is about its answer. If the number is 1, the model is completely sure about its answer. If it's 0.5, the model is basically guessing; it's like flipping a coin.

Components of a Probability Score

When a model has to choose between two classes (called binary classification), three main rules apply:

  1. The predicted probability must be between 0 and 1
  2. The probabilities of both options must add up to 1
  3. A higher probability means the model is more sure about its choice

For binary classification, when we talk about predicted probability, we usually mean the probability of the positive class. A higher probability means the model thinks the positive class is more likely, while a lower probability means it thinks the negative class is more likely.

To make sure these rules are followed, models use mathematical functions to convert their calculations into proper probabilities. Each type of model might use different functions, which affects how they express their confidence levels.
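For instance, two common conversion functions are the sigmoid (which squashes any real-valued score into the range 0 to 1) and simple normalization (which divides raw per-class scores by their total so they sum to 1). Here is a minimal standalone sketch using made-up scores that are not tied to any model in this article:

import numpy as np

# 1. Sigmoid: squashes any real-valued score into the range (0, 1)
raw_score = 2.0
p_positive = 1 / (1 + np.exp(-raw_score))
print(f"sigmoid({raw_score}) = {p_positive:.3f}")   # ~0.881
print(f"P(negative class) = {1 - p_positive:.3f}")  # the two probabilities sum to 1

# 2. Normalization: divide raw per-class scores by their total
raw_scores = np.array([3.0, 1.0])
probs = raw_scores / raw_scores.sum()
print("normalized:", probs)  # [0.75 0.25], sums to 1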

In classification, a model picks the class it thinks is most likely, the one with the highest probability score. But two different models might pick the same class while being more or less confident about it. Their predicted probability scores tell us how sure each model is, even when they make the same choice.

These different probability scores tell us something important: even when models pick the same class, they may understand the data differently.

One model might be very sure about its choice, while another might be less confident, even though they made the same prediction.

To understand how predicted probability is calculated, we'll continue with the same dataset used in my previous articles on Classification Algorithms. Our goal remains: predicting whether someone will play golf based on the weather.

Columns: 'Outlook' (one-hot encoded into 3 columns), 'Temperature' (in Fahrenheit), 'Humidity' (in %), 'Wind' (Yes/No) and 'Play' (Yes/No, target feature)
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Create and prepare dataset
dataset_dict = {
'Outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast',
'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy',
'sunny', 'overcast', 'rainy', 'sunny', 'sunny', 'rainy', 'overcast',
'rainy', 'sunny', 'overcast', 'sunny', 'overcast', 'rainy', 'overcast'],
'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0,
72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0,
88.0, 77.0, 79.0, 80.0, 66.0, 84.0],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0,
90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0,
65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'Wind': [False, True, False, False, False, True, True, False, False, False, True,
True, False, True, True, False, False, True, False, True, True, False,
True, False, False, True, False, False],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes',
'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes',
'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}

# Prepare data
df = pd.DataFrame(dataset_dict)

As some algorithms may need standardized values, we will also apply standard scaling to the numerical features and one-hot encoding to the categorical features, including the target feature:

from sklearn.preprocessing import StandardScaler
df = pd.get_dummies(df, columns=['Outlook'], prefix='', prefix_sep='', dtype=int)
df['Wind'] = df['Wind'].astype(int)
df['Play'] = (df['Play'] == 'Yes').astype(int)

# Rearrange columns
column_order = ['sunny', 'overcast', 'rainy', 'Temperature', 'Humidity', 'Wind', 'Play']
df = df[column_order]

# Prepare features and target
X, y = df.drop('Play', axis=1), df['Play']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)

# Scale numerical features
scaler = StandardScaler()
X_train[['Temperature', 'Humidity']] = scaler.fit_transform(X_train[['Temperature', 'Humidity']])
X_test[['Temperature', 'Humidity']] = scaler.transform(X_test[['Temperature', 'Humidity']])

Now, let's see how each of the following 7 classification algorithms calculates these probabilities:

A Dummy Classifier is a prediction model that doesn't learn patterns from data. Instead, it follows basic rules like: picking the most common outcome, making random predictions based on how often each outcome appeared in training, always picking one answer, or randomly choosing between options with equal chance. The Dummy Classifier ignores all input features and just follows these rules.

When this model finishes training, all it remembers is a few numbers showing either how often each outcome occurred or the constant values it was told to use. It doesn't learn anything about how features relate to outcomes.

For calculating predicted probability in binary classification, the Dummy Classifier uses the most basic approach possible. Since it only remembered how often each outcome appeared in the training data, it uses those same frequencies as the basis for every probability score. With the 'stratified' strategy used below, it randomly draws a class according to those frequencies, so each prediction's probability comes out as either 0 or 1.

These probability scores never depend on the new data, because the model doesn't look at or react to any features of the data it's trying to predict.

from sklearn.dummy import DummyClassifier
import pandas as pd
import numpy as np

# Train the model
dummy_clf = DummyClassifier(strategy='stratified', random_state=42)
dummy_clf.fit(X_train, y_train)

# Print the "model" - which is just the class probabilities
print("THE MODEL:")
print(f"Probability of not playing (class 0): {dummy_clf.class_prior_[0]:.3f}")
print(f"Probability of playing (class 1): {dummy_clf.class_prior_[1]:.3f}")
print("\nNOTE: These probabilities are used for ALL predictions, regardless of input features!")

# Make predictions and get probabilities
y_pred = dummy_clf.predict(X_test)
y_prob = dummy_clf.predict_proba(X_test)

# Create results dataframe
results_df = pd.DataFrame({
    'True Label': y_test,
    'Prediction': y_pred,
    'Probability of Play': y_prob[:, 1]
})

print("\nPrediction Results:")
print(results_df)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")

K-Nearest Neighbors (kNN) is a prediction model that takes a different approach: instead of learning rules, it keeps all training examples in memory. When it needs to make a prediction about new data, it measures how similar that data is to every stored example, finds the k most similar ones (where k is a number we choose), and makes its decision based on those neighbors.

When this model finishes training, all it has stored is the complete training dataset, the value of k we chose, and a method for measuring how similar two data points are (by default, Euclidean distance).

For calculating predicted probability, kNN looks at those k most similar examples and counts how many belong to each class. The probability score is simply the number of neighbors belonging to a class divided by k.

Since kNN calculates probability scores by division, it can only give certain specific values based on k (say, for k=5, the only possible probability scores are 0/5 (0%), 1/5 (20%), 2/5 (40%), 3/5 (60%), 4/5 (80%), and 5/5 (100%)). This means kNN can't give as many different confidence levels as other models.

from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np

# Train the model
k = 3  # number of neighbors
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)

# Print the "model"
print("THE MODEL:")
print(f"Number of neighbors (k): {k}")
print(f"Training data points stored: {len(X_train)}")

# Make predictions and get probabilities
y_pred = knn.predict(X_test)
y_prob = knn.predict_proba(X_test)

# Create results dataframe
results_df = pd.DataFrame({
    'True Label': y_test,
    'Prediction': y_pred,
    'Probability of Play': y_prob[:, 1]
})

print("\nPrediction Results:")
print(results_df)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")

Naive Bayes is a prediction model that uses probability math with a "naive" rule: it assumes each feature affects the outcome independently. There are different types of Naive Bayes: Gaussian Naive Bayes works with continuous values, while Bernoulli Naive Bayes works with binary features. As our dataset has many 0-1 features, we'll focus on the Bernoulli one here.

When this model finishes training, it remembers probability values: one value for how often each class occurs (the class prior), and for each feature, values showing how likely that feature is to appear within each class.

For calculating predicted probability, Naive Bayes multiplies several probabilities together: the chance of each class occurring, and the chance of seeing each feature value within that class. These multiplied probabilities are then normalized so they sum to 1, giving us the final probability scores.

Since Naive Bayes uses probability math, its probability scores naturally fall between 0 and 1. However, when certain features strongly point to one class over another, the model can give probability scores very close to 0 or 1, showing it is very confident about its prediction.

from sklearn.naive_bayes import BernoulliNB
import pandas as pd

# Train the model
nb = BernoulliNB()
nb.fit(X_train, y_train)

# Print the "model"
print("THE MODEL:")
model_df = pd.DataFrame(
    nb.feature_log_prob_.T,
    columns=['Log Prob (No Play)', 'Log Prob (Play)'],
    index=['sunny', 'overcast', 'rainy', 'Temperature', 'Humidity', 'Wind']
)
model_df = model_df.round(3)
print("\nFeature Log-Probabilities:")
print(model_df)

print("\nClass Priors:")
priors = pd.Series(nb.class_log_prior_, index=['No Play', 'Play']).round(3)
print(priors)

# Make predictions and get probabilities
y_pred = nb.predict(X_test)
y_prob = nb.predict_proba(X_test)

# Create results dataframe
results_df = pd.DataFrame({
    'True Label': y_test,
    'Prediction': y_pred,
    'Probability of Play': y_prob[:, 1]
})

print("\nPrediction Results:")
print(results_df)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")

A Decision Tree Classifier works by asking a series of yes/no questions about the input data. It builds these questions one at a time, always choosing the most useful question that best separates the data into groups. It keeps asking questions until it reaches a final answer at the end of a branch.

When this model finishes training, it has created a tree where each point represents a question about the data. Each branch shows which way to go based on the answer, and at the end of each branch is information about how often each class appeared in the training data.

For calculating predicted probability, the Decision Tree follows its questions for new data until it reaches the end of a branch. The probability score is based on how many training examples of each class ended up at that same branch during training.

Since Decision Tree probability scores come from counting training examples at each branch endpoint, they can only be certain values that were seen during training. This means the model can only give probability scores that match the patterns it found while learning, which limits how precise its confidence levels can be.

from sklearn.tree import DecisionTreeClassifier, plot_tree
import pandas as pd
import matplotlib.pyplot as plt

# Train the model
dt = DecisionTreeClassifier(random_state=42, max_depth=3)  # limiting depth for visibility
dt.fit(X_train, y_train)

# Print the "model" - visualize the decision tree
print("THE MODEL (DECISION TREE STRUCTURE):")
plt.figure(figsize=(20, 10))
plot_tree(dt, feature_names=['sunny', 'overcast', 'rainy', 'Temperature',
                             'Humidity', 'Wind'],
          class_names=['No Play', 'Play'],
          filled=True, rounded=True, fontsize=10)
plt.show()

# Make predictions and get probabilities
y_pred = dt.predict(X_test)
y_prob = dt.predict_proba(X_test)

# Create results dataframe
results_df = pd.DataFrame({
    'True Label': y_test,
    'Prediction': y_pred,
    'Probability of Play': y_prob[:, 1]
})

print("\nPrediction Results:")
print(results_df)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")

A Logistic Regression model, despite its name, predicts between two classes using a mathematical equation. For each feature in the input data, it learns how important that feature is by giving it a number (a weight). It also learns one extra number (the bias) that helps make better predictions. To turn these numbers into a predicted probability, it uses the sigmoid function, which keeps the final answer between 0 and 1.

When this model finishes training, all it remembers is these weights, one number for each feature, plus the bias number. These numbers are all it needs to make predictions.

For calculating predicted probability in binary classification, Logistic Regression first multiplies each feature value by its weight and adds them all together, plus the bias. This sum could be any number, so the model uses the sigmoid function to convert it into a probability between 0 and 1.

Unlike models that can only give certain specific probability scores, Logistic Regression can give any probability between 0 and 1. The further the input data is from the point where the model switches from one class to another (the decision boundary), the closer the probability gets to either 0 or 1. Data points near this switching point get probabilities closer to 0.5, showing the model is less confident about those predictions.

from sklearn.linear_model import LogisticRegression
import pandas as pd

# Train the model
lr = LogisticRegression(random_state=42)
lr.fit(X_train, y_train)

# Print the "model"
print("THE MODEL:")
model_df = pd.DataFrame({
    'Feature': ['sunny', 'overcast', 'rainy', 'Temperature', 'Humidity', 'Wind'],
    'Coefficient': lr.coef_[0]
})
model_df['Coefficient'] = model_df['Coefficient'].round(3)
print("Coefficients (weights):")
print(model_df)

print(f"\nIntercept (bias): {lr.intercept_[0]:.3f}")
print("\nPrediction = sigmoid(intercept + sum(coefficient * feature_value))")

# Make predictions and get probabilities
y_pred = lr.predict(X_test)
y_prob = lr.predict_proba(X_test)

# Create results dataframe
results_df = pd.DataFrame({
    'True Label': y_test,
    'Prediction': y_pred,
    'Probability of Play': y_prob[:, 1]
})

print("\nPrediction Results:")
print(results_df)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")

A Support Vector Machine (SVM) Classifier works by finding the best boundary line (or surface) that separates different classes. It focuses on the points closest to this boundary (called support vectors). While the basic SVM finds straight boundary lines, it can also create curved boundaries using mathematical functions called kernels.

When this model finishes training, it remembers three things: the data points near the boundary (the support vectors), how much each point matters (the weights), and any settings for curved boundaries (the kernel parameters). Together, these define where and how the boundary separates the classes.

For calculating predicted probability in binary classification, SVM needs an extra step because it wasn't designed to give probability scores. It uses a method called Platt Scaling, which adds a Logistic Regression layer to convert distances from the boundary into probabilities. These distances go through the sigmoid function to produce the final probability scores.

Since SVM calculates probabilities in this indirect way, the scores reflect how far points are from the boundary rather than true confidence levels. Points far from the boundary get probability scores closer to 0 or 1, while points near the boundary get scores closer to 0.5. This means the probability scores are more about location relative to the boundary than the model's actual confidence in its predictions.

from sklearn.svm import SVC
import pandas as pd
import numpy as np

# Train the model
svm = SVC(kernel='rbf', probability=True, random_state=42)
svm.fit(X_train, y_train)

# Print the "model"
print("THE MODEL:")
print(f"Kernel: {svm.kernel}")
print(f"Number of support vectors: {svm.n_support_}")
print("\nSupport Vectors (showing first 5 rows):")

# Create dataframe of support vectors
sv_df = pd.DataFrame(
    svm.support_vectors_,
    columns=['sunny', 'overcast', 'rainy', 'Temperature', 'Humidity', 'Wind']
)
print(sv_df.head().round(3))

# Show which classes these support vectors belong to
print("\nSupport vector classes:")
for i, count in enumerate(svm.n_support_):
    print(f"Class {i}: {count} support vectors")

# Make predictions and get probabilities
y_pred = svm.predict(X_test)
y_prob = svm.predict_proba(X_test)

# Create results dataframe
results_df = pd.DataFrame({
    'True Label': y_test,
    'Prediction': y_pred,
    'Probability of Play': y_prob[:, 1]
})

print("\nPrediction Results:")
print(results_df)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")

A Multi-Layer Perceptron (MLP) Classifier is a type of neural network that processes data through several layers of connected nodes (neurons). Each neuron calculates a weighted total of its inputs, transforms this number using a function (like ReLU), and sends the result to the next layer. For binary classification, the last layer uses the sigmoid function to give an output between 0 and 1.

When this model finishes training, it remembers two main things: the connection strengths (weights and biases) between neurons in neighboring layers, and how the network is structured (how many layers, and how many neurons in each).

For calculating predicted probability in binary classification, the MLP passes data through its layers, with each layer creating more complex combinations of information from the previous one. The final layer produces a number that the sigmoid function converts into a probability between 0 and 1.

The MLP can find more complex patterns in data than many other models because it combines features in sophisticated ways. The final probability score shows how confident the network is: scores close to 0 or 1 mean the network is very confident about its prediction, while scores near 0.5 indicate it is uncertain.

from sklearn.neural_network import MLPClassifier
import pandas as pd
import numpy as np

# Train the model with a simple architecture
mlp = MLPClassifier(hidden_layer_sizes=(4, 2), random_state=42)
mlp.fit(X_train, y_train)

# Print the "model"
print("THE MODEL:")
print("Network Architecture:")
print(f"Input Layer: {mlp.n_features_in_} neurons (features)")
for i, layer_size in enumerate(mlp.hidden_layer_sizes):
    print(f"Hidden Layer {i+1}: {layer_size} neurons")
print(f"Output Layer: {mlp.n_outputs_} neurons (classes)")

# Show weights from the input to the first hidden layer
print("\nWeights from Input to First Hidden Layer:")
weights_df = pd.DataFrame(
    mlp.coefs_[0],
    columns=[f'Hidden_{i+1}' for i in range(mlp.hidden_layer_sizes[0])],
    index=['sunny', 'overcast', 'rainy', 'Temperature', 'Humidity', 'Wind']
)
print(weights_df.round(3))

print("\nNote: Additional weights and biases exist between subsequent layers")

# Make predictions and get probabilities
y_pred = mlp.predict(X_test)
y_prob = mlp.predict_proba(X_test)

# Create results dataframe
results_df = pd.DataFrame({
    'True Label': y_test,
    'Prediction': y_pred,
    'Probability of Play': y_prob[:, 1]
})

print("\nPrediction Results:")
print(results_df)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")

To summarize, here's how each classifier calculates predicted probabilities:

  1. Dummy Classifier: Uses the same probability scores for all predictions, based only on how often each class appeared in training. Ignores all input features.
  2. K-Nearest Neighbors: The probability score is the fraction of similar neighbors belonging to each class. Can only give specific fractions based on k (like 3/5 or 7/10).
  3. Naive Bayes: Multiplies together the initial class probability and the probabilities of seeing each feature value, then adjusts the results to add up to 1. Probability scores show how likely features are to appear in each class.
  4. Decision Tree: Gives probability scores based on how often each class appeared in the final branches. Can only use probability values that it saw during training.
  5. Logistic Regression: Uses the sigmoid function to convert weighted feature combinations into probability scores. Can give any probability between 0 and 1, changing smoothly based on distance from the decision boundary.
  6. Support Vector Machine: Needs an extra step (Platt Scaling) to create probability scores, using the sigmoid function to convert distances from the boundary. These distances determine how confident the model is.
  7. Multi-Layer Perceptron: Processes data through multiple layers of transformations, ending with the sigmoid function. Creates probability scores from complex feature combinations, giving any value between 0 and 1.

Looking at how each model calculates its predicted probability shows us something important: each model has its own way of expressing how confident it is. Some models, like the Dummy Classifier and the Decision Tree, can only use certain probability scores based on their training data. Others, like Logistic Regression and Neural Networks, can give any probability between 0 and 1, letting them be more precise about their uncertainty.

Here's what's interesting: even though all these models give us numbers between 0 and 1, those numbers mean different things for each model. Some get their scores through simple counting, others by measuring distance from a boundary, and some through complex calculations with features. This means a 70% probability from one model tells us something completely different than a 70% from another model.

When choosing a model to use, look beyond just accuracy. Think about whether the way it calculates predicted probability makes sense for your specific needs.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# The models
from sklearn.dummy import DummyClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Load and prepare data
dataset_dict = {
'Outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast', 'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy', 'sunny', 'overcast', 'rainy', 'sunny', 'sunny', 'rainy', 'overcast', 'rainy', 'sunny', 'overcast', 'sunny', 'overcast', 'rainy', 'overcast'],
'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0, 72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0, 88.0, 77.0, 79.0, 80.0, 66.0, 84.0],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'Wind': [False, True, False, False, False, True, True, False, False, False, True, True, False, True, True, False, False, True, False, True, True, False, True, False, False, True, False, False],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(dataset_dict)
df = pd.get_dummies(df, columns=['Outlook'], prefix='', prefix_sep='', dtype=int)
df['Wind'] = df['Wind'].astype(int)
df['Play'] = (df['Play'] == 'Yes').astype(int)

# Prepare features and target
X,y = df.drop('Play', axis=1), df['Play']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)

# Scale numerical features
scaler = StandardScaler()
X_train[['Temperature', 'Humidity']] = scaler.fit_transform(X_train[['Temperature', 'Humidity']])
X_test[['Temperature', 'Humidity']] = scaler.transform(X_test[['Temperature', 'Humidity']])

# Choose the model to train (uncomment one)
clf = DummyClassifier(strategy='stratified', random_state=42)
# clf = KNeighborsClassifier(n_neighbors=3)
# clf = BernoulliNB()
# clf = DecisionTreeClassifier(random_state=42, max_depth=3)
# clf = LogisticRegression(random_state=42)
# clf = SVC(kernel='rbf', probability=True, random_state=42)
# clf = MLPClassifier(hidden_layer_sizes=(4,2), random_state=42)

# Fit and predict
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)

# Create results dataframe
results_df = pd.DataFrame({
    'True Label': y_test,
    'Prediction': y_pred,
    'Probability of Play': y_prob[:, 1]
})

print("nPrediction Outcomes:")
print(results_df)

# Print accuracy
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")