Disability Attitudes

Distribution

The data comes from the Harvard Project Implicit and includes a sample of scores on the Disability Implicit Association Test (IAT) for 124265 participants. I should give a decent starting point for some power calculations. I focus on this outcome because it is the most risky. For other outcomes (games in experiments), effect sizes are generally quite large.

First, I plot the histograms for the IAT score of participants below 18. The red line indicates the mean. A good chunk of the distribution lies above the severe bias threshold.

df %<>%
  mutate(
    iat_all = D_biep.Abled_Good_all,
    age = year - birthyear
    ) %>%
  filter(!is.na(age))

# Cutoff points to classify the degree of bias
thresholds <- tibble(points = c(-0.15, 0.15, 0.33))
text <- tibble(
  points = c(-0.5, -0.15, 0.15, 0.5),
  text = c("Pro bias", "No bias", "Some bias", "Severe bias")
  )

mean_u18 <- df %>% filter(age <= 18) %$% mean(iat_all, na.rm = T)

df %>% filter(age <= 18) %>%
  ggplot(aes(iat_all)) + 
  geom_histogram(alpha = 0.2, position="identity") +
  geom_vline(data = thresholds, aes(xintercept = points)) +
  geom_text(
    data = text,
    mapping = aes(x = points + 0.1, label = text, y = 0), 
    angle = 90, hjust = 0) + 
  geom_vline(
    xintercept = mean_u18, linetype="dotted", 
    color = "red", size=1.5) +
  scale_x_continuous(breaks = seq.int(-1.95, 1.95, 0.3))

The mean and standard deviation of the IAT score by age group (below and above 18) are reported in the table bellow.

df_sum <- df %>% mutate(`Age under 18` = ifelse(age <= 18, 1, 0)) %>%
  group_by(`Age under 18`) %>%
  summarise(
    Mean = mean(iat_all, na.rm = T),
    SD = sd(iat_all, na.rm = T)
  )

# Mean and SD for the under 18 groups
mean <- df_sum[['Mean']][[2]]
sd <- df_sum[['SD']][[2]]

 
df_sum %>% mutate_all(format, digits = 3,nsmall = 0) %>%
  kable(format = "html", 
        caption = "Implicit Association Score Summary statistics by group") %>%
  kable_styling()

Implicit Association Score Summary statistics by group
Age under 18	Mean	SD
0	0.538	0.458
1	0.421	0.468

The bias increases slightly with age. Would it be more interesting to focus on older children (higher bias), or younger? For short term effects, I guess older would be preferable. The variance instead appears to be decreasing with age.

df %>% filter(age < 50) %>% 
  ggplot(aes(age, iat_all)) +
  geom_point(alpha = 0.01) +
  geom_smooth()

Power Calculations

The table below reports the minimum detectable effect (MDE), and the MDE size (MDES) for sample sizes from 100 to 200 (per treatment group), for two-tail hypothesis test with significance level 5% and 80% power.

n <- seq(100, 200, 25)

# MDES for different sample sizes
MDES <- lapply(n, pwr.t.test, sig.level = 0.05, power = 0.8, type = c("two.sample")) %>%
  sapply(., `[[`, "d")

# MDE 
MDE <- MDES * sd

# Share of the mean
sh_mean <- MDE/mean

# Create table
tibble(n, MDE, MDES, sh_mean) %>% 
  rename(`N per group` = n, `Share of mean` = sh_mean) %>%
  mutate_all(format, digits = 3,nsmall = 0) %>% 
  kable() %>% kable_styling()

N per group	MDE	MDES	Share of mean
100	0.186	0.398	0.443
125	0.167	0.356	0.395
150	0.152	0.325	0.361
175	0.141	0.300	0.334
200	0.131	0.281	0.312

One-tail test

# MDES for different sample sizes
MDES <- lapply(n, pwr.t.test, sig.level = 0.05, power = 0.8, type = c("two.sample"), alternative = "less") %>%
  sapply(., `[[`, "d")

# MDE 
MDE <- MDES * sd

# Share of the mean
sh_mean <- MDE/mean

# Create table
tibble(n, MDE, MDES, sh_mean) %>% 
  rename(`N per group` = n, `Share of mean` = sh_mean) %>%
  mutate_all(format, digits = 3,nsmall = 0) %>% 
  kable() %>% kable_styling()

N per group	MDE	MDES	Share of mean
100	-0.165	-0.353	-0.392
125	-0.148	-0.315	-0.351
150	-0.135	-0.288	-0.320
175	-0.125	-0.266	-0.296
200	-0.117	-0.249	-0.277