Statistical Evaluation on Scoring Bias | by Alexander Barriga | Oct, 2024

Reading the column names from left to right that represent the judges' names between Jimena Hoffner and Noelia Barsel, you'll see that:

  • The 1st-5th and 11th-15th judges belong to what we will denote as panel 1.
  • The 6th-10th and 16th-20th judges belong to what we will denote as panel 2.

Notice anything? Notice how dancers that were judged by panel 2 show up in a much larger proportion than dancers that were judged by panel 1. If you scroll through the PDF of this data table you'll see that this proportional difference holds up throughout the competitors that scored well enough to advance to the semi-final round.

Note: The dancers shaded in GREEN advanced to the semi-final round, while dancers NOT shaded in green did not advance.

So this raises the question: is this proportional difference real, or is it due to random sampling, that is, the random assignment of dancers to one panel over the other? Well, there's a statistical test we can use to answer this question.

Two-Tailed Test for Equality between Two Population Proportions

We're going to use the two-tailed z-test to test whether there is a significant difference between the two proportions in either direction. We're interested in whether one proportion is significantly different from the other, regardless of whether it's larger or smaller.
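In symbols, the test can be sketched as follows. The notation is an assumption introduced here for illustration: $x_1, x_2$ are the numbers of couples advancing from each panel, $n_1, n_2$ are the numbers of couples each panel scored, and $\Phi$ is the standard normal CDF. The pooled standard error matches the computation in the Python code further down.

```latex
\hat{p}_1 = \frac{x_1}{n_1}, \qquad
\hat{p}_2 = \frac{x_2}{n_2}, \qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}

z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},
\qquad
\text{p-value} = 2\,\bigl(1 - \Phi(|z|)\bigr)
```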

Statistical Test Assumptions

  1. Random Sampling: The samples must be independently and randomly drawn from their respective populations.
  2. Large Sample Size: The sample sizes must be large enough for the sampling distribution of the difference in sample proportions to be approximately normal. This approximation comes from the Central Limit Theorem.
  3. Expected Number of Successes and Failures: To ensure the normal approximation holds, the number of expected successes and failures in each group should be at least 5.

Our dataset meets all these assumptions.
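A check like the one below is one way to verify the third assumption before running the test. It is a minimal sketch: the function name `check_success_failure` and the counts passed to it are hypothetical, not taken from the competition data, and it assumes the expected counts are computed from the pooled proportion (the estimate used under the null hypothesis).

```python
def check_success_failure(successes1, total1, successes2, total2, minimum=5):
    """Return True if every expected success/failure count is at least `minimum`."""
    # Pooled proportion: the common success rate assumed under the null hypothesis
    p_pooled = (successes1 + successes2) / (total1 + total2)
    expected_counts = [
        total1 * p_pooled,        # expected successes, group 1
        total1 * (1 - p_pooled),  # expected failures, group 1
        total2 * p_pooled,        # expected successes, group 2
        total2 * (1 - p_pooled),  # expected failures, group 2
    ]
    return all(count >= minimum for count in expected_counts)


# Hypothetical counts, for illustration only
print(check_success_failure(17, 100, 42, 100))  # True
print(check_success_failure(1, 8, 2, 9))        # False
```

With very small groups, as in the second call, the normal approximation is doubtful and an exact test would be a safer choice.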

Conduct the Test

  1. Define our Hypotheses

Null Hypothesis: The proportions from each distribution are the same.

Alt. Hypothesis: The proportions from each distribution are NOT the same.

  2. Pick a Statistical Significance Level

The default value for alpha is 0.05 (5%). We don't have a reason to relax this value (e.g. to 10%) or to make it more stringent (e.g. 1%), so we'll use the default. Alpha represents our tolerance for falsely rejecting the Null Hyp. in favor of the Alt. Hyp. due to random sampling (i.e. a Type 1 Error).
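For a two-tailed test, alpha is split evenly between the two tails, so the rejection cutoff is the z-score that leaves alpha/2 of the probability above it. Here is a minimal sketch using only Python's standard library (`statistics.NormalDist`, available from Python 3.8); the helper name `two_tailed_critical_value` is hypothetical.

```python
from statistics import NormalDist


def two_tailed_critical_value(alpha):
    """Return the positive z cutoff for a two-tailed test at significance level alpha."""
    # Each tail holds alpha / 2, so the cutoff is the (1 - alpha / 2) quantile
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    cutoff = two_tailed_critical_value(alpha)
    print(f"alpha = {alpha:.2f}: reject the null when |z| > {cutoff:.3f}")
```

A stricter alpha pushes the cutoff further into the tails, which is why 1% is harder to clear than 5%.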

Next, we carry out the test using the Python code provided below.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def plot_two_tailed_test(z_value):
    # Generate a range of x values
    x = np.linspace(-4, 4, 1000)
    # Get the standard normal density for these x values
    y = stats.norm.pdf(x)

    # Create the plot
    plt.figure(figsize=(10, 6))
    plt.plot(x, y, label='Standard Normal Distribution', color='black')

    # Shade the areas in both tails with red.
    # Use |z| so the shading is still correct when z_value is negative.
    z_abs = abs(z_value)
    plt.fill_between(x, y, where=(x >= z_abs), color='red', alpha=0.5, label='Right Tail Area')
    plt.fill_between(x, y, where=(x <= -z_abs), color='red', alpha=0.5, label='Left Tail Area')

    # Define critical values for alpha = 0.05
    alpha = 0.05
    critical_value = stats.norm.ppf(1 - alpha / 2)

    # Add vertical dashed blue lines for critical values
    plt.axvline(critical_value, color='blue', linestyle='dashed', linewidth=1, label=f'Critical Value: {critical_value:.2f}')
    plt.axvline(-critical_value, color='blue', linestyle='dashed', linewidth=1, label=f'Critical Value: {-critical_value:.2f}')

    # Mark the z-value
    plt.axvline(z_value, color='red', linestyle='dashed', linewidth=1, label=f'Z-Value: {z_value:.2f}')

    # Add labels and title
    plt.title('Two-Tailed Z-Test Visualization')
    plt.xlabel('Z-Score')
    plt.ylabel('Probability Density')
    plt.legend()
    plt.grid(True)

    # Save and show the plot
    plt.savefig('../images/p-value_location_in_z_dist_z_test_proportionality.png')
    plt.show()

def two_proportion_z_test(successes1, total1, successes2, total2):
    """
    Perform a two-proportion z-test to check if two population proportions are significantly different.

    Parameters:
    - successes1: Number of successes in the first sample
    - total1: Total number of observations in the first sample
    - successes2: Number of successes in the second sample
    - total2: Total number of observations in the second sample

    Returns:
    - z_value: The z-statistic
    - p_value: The p-value of the test
    """
    # Calculate sample proportions
    p1 = successes1 / total1
    p2 = successes2 / total2

    # Combined (pooled) proportion
    p_combined = (successes1 + successes2) / (total1 + total2)

    # Standard error
    se = np.sqrt(p_combined * (1 - p_combined) * (1/total1 + 1/total2))

    # Z-value
    z_value = (p1 - p2) / se

    # P-value for two-tailed test
    p_value = 2 * (1 - stats.norm.cdf(np.abs(z_value)))

    return z_value, p_value

# df is the scores table; panel_1 and panel_2 are assumed to be the lists of
# judge columns for each panel, defined earlier in the analysis.
min_score_for_semi_finals = 7.040
is_semi_finalist = df.PROMEDIO >= min_score_for_semi_finals

# Number of couples scored by panel 1 advancing to semi-finals
successes_1 = df[is_semi_finalist][panel_1].dropna(axis=0).shape[0]
# Number of couples scored by panel 2 advancing to semi-finals
successes_2 = df[is_semi_finalist][panel_2].dropna(axis=0).shape[0]

# Total number of couples that were scored by panel 1
n1 = df[panel_1].dropna(axis=0).shape[0]
# Total number of couples that were scored by panel 2
n2 = df[panel_2].dropna(axis=0).shape[0]

# Perform the test
z_value, p_value = two_proportion_z_test(successes_1, n1, successes_2, n2)

# Print the results
print(f"Z-Value: {z_value:.4f}")
print(f"P-Value: {p_value:.4f}")
# P-Value: 0.0000

# Check significance at alpha = 0.05
alpha = 0.05
if p_value < alpha:
    print("The difference between the two proportions is statistically significant.")
else:
    print("The difference between the two proportions is not statistically significant.")

# Generate the plot
plot_two_tailed_test(z_value)
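To see the mechanics without the competition DataFrame, the same pooled two-proportion z-test can be run on made-up counts. This is a sketch only: the counts (17 of 100 versus 42 of 100) are hypothetical and are not the real sample sizes, the function name `pooled_z_test` is illustrative, and the standard library's `statistics.NormalDist` stands in for `scipy.stats`.

```python
from math import sqrt
from statistics import NormalDist


def pooled_z_test(successes1, total1, successes2, total2):
    """Two-tailed, pooled two-proportion z-test. Returns (z, p_value)."""
    p1 = successes1 / total1
    p2 = successes2 / total2
    # Pooled proportion under the null hypothesis that both rates are equal
    p_pooled = (successes1 + successes2) / (total1 + total2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / total1 + 1 / total2))
    z = (p1 - p2) / se
    # Two-tailed p-value: probability of a statistic at least as extreme as |z|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, for illustration only
z, p = pooled_z_test(17, 100, 42, 100)
print(f"z = {z:.3f}, p = {p:.6f}")
```

The sign of z only reflects which group is listed first; the two-tailed p-value depends on |z| alone.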

The Z-value is the statistical point value we calculated. Notice that it lies far out in the tail of the standard normal distribution.

The plot shows that the calculated Z-value lies far outside the range of z-values that we'd expect to see if the null hypothesis were true. This results in a p-value that rounds to 0.0, indicating that we must reject the null hypothesis in favor of the alternative.

This means the difference in proportions is real, in the sense that it is very unlikely to be explained by random sampling alone.

  • 17% of dance couples judged by panel 1 advanced to the semi-finals
  • 42% of dance couples judged by panel 2 advanced to the semi-finals

Our first statistical test for bias has provided evidence that there is a positive bias in scores for dancers judged by panel 2: their advancement rate is roughly 2.5 times that of dancers judged by panel 1 (42% vs. 17%).
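The size of the gap can be checked directly from the two reported percentages. The snippet below is a quick arithmetic sketch; the variable names are illustrative, and because the inputs are rounded percentages the ratio is only approximate.

```python
# Advancement rates as reported in the article (rounded percentages)
rate_panel_1 = 0.17
rate_panel_2 = 0.42

ratio = rate_panel_2 / rate_panel_1
difference_pts = (rate_panel_2 - rate_panel_1) * 100

print(f"Advancement ratio (panel 2 / panel 1): {ratio:.2f}")
print(f"Difference: {difference_pts:.0f} percentage points")
```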

Next, we dive into the scoring distributions of each individual judge and see how their individual biases affect their panel's overall bias.