Developing a Bayesian Approach to Linear Models

Brian Wood

Linear regression models in frequency-land

m1 <- lm(d2$height ~ d2$weight)
summary(m1)

Call:
lm(formula = d2$height ~ d2$weight)

Residuals:
     Min       1Q   Median       3Q      Max 
-19.7464  -2.8835   0.0222   3.1424  14.7744 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 113.87939    1.91107   59.59   <2e-16 ***
d2$weight     0.90503    0.04205   21.52   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.086 on 350 degrees of freedom
Multiple R-squared:  0.5696,    Adjusted R-squared:  0.5684 
F-statistic: 463.3 on 1 and 350 DF,  p-value: < 2.2e-16

Linear regression models in frequency-land

slope <- m1$coefficients[[2]]
intercept <- m1$coefficients[[1]]
plot(height~weight, data=d2, pch=16, cex=.5)
abline(a = intercept, b = slope)

plot of chunk unnamed-chunk-3

Bayesian linear models using rethinking::map

m4.3 <- map(
    alist(
        height ~ dnorm( a + b*weight , sigma ) ,
        a ~ dnorm( 178 , 100 ) ,
        b ~ dnorm( 0 , 10 ) ,
        sigma ~ dunif( 0 , 50 )
    ) ,
    data=d2 )

Bayesian linear models using rethinking::map

m4.3 <- map(
    alist(
        height ~ dnorm( a + b*weight , sigma ) ,
        a ~ dnorm( 156 , 100 ) ,
        b ~ dnorm( 0 , 10 ) ,
        sigma ~ dunif( 0 , 50 )
    ) ,
    data=d2 )

How to build a linear model (see page 77 of Rethinking)

  1. Identify the outcome variable or variables.
  2. For each outcome variable, define a likelihood distribution that defines the plausibility of individual observations. In linear regression, this distribution is always Gaussian.
  3. Identify the predictor variables
  4. Relate the parameters of the likelihood distributions to the predictor variables. This forces us to name and define all the parameters of the model.
  5. Choose priors for all of the parameters in the model.
  6. Summarize the model.
  7. Implement the model.
  8. Plot the model.

Linear model of !Kung height and weight (see page 92)

  1. Outcome variable: height in cm
  2. Likelihood distribution for outcome: gaussian/normal
    • “In linear models, the likelihood distribution for the outcome variable is always Gaussian”.
  3. Predictor variable: weight in kg

Step 4 of The Linear Model Strategy

  • 4. Relation between parameters of likelihood distn. and predictor variables:
    • “The strategy is to make the parameter for the mean of a Gaussian distribution, \( \mu \), into a linear function of the predictor variable and other, new parameters that we invent.”
    • \( \mu_i = \alpha + \beta h_i \)

Step 5: Define Priors

  • \( \alpha \sim Normal(156,100) \)
    • “The value of the intercept is frequently uninterpretable without also studying any \( \beta \) parameters. This is why we need very weak priors for intercepts, in many cases”
  • \( \beta \sim Normal(0,10) \)
  • \( \sigma \sim Uniform(0,50) \)

Step 6: Summarize the Model

  • \( height_i \sim Normal(\mu_i, \sigma) \) [likelihood]
  • \( \mu_i = \alpha + \beta * weight_i \) [linear model]
    • “The mean is no longer a parameter to be estimated. Rather, \( u_i \) is constructed from other parameters, \( \alpha \) and \( \beta \), and the predictor variable weight
  • \( \alpha \sim Normal(156,100) \) [\( \alpha prior \)]
  • \( \beta \sim Normal(0,10) \) [\( \beta prior \)]
  • \( \sigma \sim Uniform(0,50) \) [\( \sigma prior \)]

Step 7: Implement the Model

m4.3 <- map(
    alist(
        height ~ dnorm( mu , sigma ) ,
        mu <- a + b*weight ,
        a ~ dnorm( 156 , 100 ) ,
        b ~ dnorm( 0 , 10 ) ,
        sigma ~ dunif( 0 , 50 )
    ) ,
    data=d2 )

Quick Review of MAP

  • MAP performs quadradic approximation to find the peak/mode of the posterior distribution.
  • This is the 'maximum a posteriori' estimate for the post distr.

Centering Predictor Variables

  • Why do it?
    • helps remove co-variance between predictor variables.
    • helps make intercept parameter more intelligible: \( \alpha \) now means “the expected value of the outcome, when the predictor is at its average value”
 d2$weight.c <- d2$weight - mean(d2$weight)

Step 7: Implement the Model

start_list <- list(a=178, b=0)#start values help map!
m4.4 <- map(
    alist(
        height ~ dnorm( mu , sigma ) ,
        mu <- a + b*weight.c ,
        a ~ dnorm( 178 , 100 ) ,
        b ~ dnorm( 0 , 10 ) ,
        sigma ~ dunif( 0 , 50 )
    ) ,start = start_list,
    data=d2 )

Step 8: Plot the Model

plot( height ~ weight , data=d2 )
abline( a=coef(m4.3)["a"] , b=coef(m4.3)["b"] )

plot of chunk unnamed-chunk-9

Extracting Samples from a MAP model to Predict Height, Given Weight

post <- extract.samples(m4.3, n = 1000)
mu_at_50_kg <- post$a + post$b*50 
HPDI( mu_at_50_kg, prob=0.89 )
   |0.89    0.89| 
158.5364 159.6636 

Hard Problems of Chapter 4

4H1. The weights listed below were recorded in the !Kung census, but heights were not recorded for these individuals. Provide predicted heights and 89% intervals (either HPDI or PI) for each of these individuals. That is, fill in the table below, using model-based predictions.

Individual weight expected height 89% interval
1 46.95
2 43.72
3 64.78
4 32.59
5 54.63

4H1

Strategy – Fit a model of height ~ weight and sample from it

m.4H1 <- map(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- alpha + b*weight,
    alpha <- dnorm(156,100),
    b <- dnorm(0,10),
    sigma <- dunif(0,50)
  ), data=d
) 

sv <- data.frame(weight=c(46.95, 43.72, 64.78, 32.59, 54.63))
sd <- sim(fit = m.4H1, data = sv, refresh=0)
sd.m <- round(apply(sd, 2, mean),0)
sd.pi <- round(apply(sd, 2, PI),0)

4H1

Possible Solution

Individual weight expected height 89% interval
1 46.95 156 148, 165
2 43.72 153 145, 161
3 64.78 173 165, 181
4 32.59 143 136, 151
5 54.63 163 155, 172

4H2

  1. Select out all the rows in the Howell data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.

  2. Fit a linear regression to these data, using map. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?

  3. Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Super-impose the MAP regression line and 89% HPDI for the mean. Also superimpose the 89% HPDI for predicted heights.

  4. What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don't have write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.

4H2

4H2. Select out all the rows in the Howell data with ages below 18 years > of age. If you do it right, you should end up with a new data frame with 192 > rows in it.

d<-Howell1
d<- d[d$age<18,]
nrow(d)
[1] 192

4H2

Fit a linear regression to these data, using map. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?

4H2

Fit a linear regression to these data, using map. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?

mH42 <- map(
  alist(
    height ~ dnorm(mu, sigma),
    mu<- alpha + b*weight,
    alpha <- dnorm(108,25),
    b <- dnorm(0,1),
    sigma <- dunif(0,50)
  ), data=d
)

coef(mH42)
    alpha         b     sigma 
58.616966  2.700698  8.438917 

For every 10 units of increase in weight, the model predicts that children will increase in height by 27 centimeters.

4H2

Plot the raw data, with height on the vertical axis and weight on thehorizontal axis. Super-impose the MAP regression line and 89% HPDI for the mean. Also superimpose the 89% HPDI for predicted heights.

4H2

xvals <- seq(from=min(d$weight), to=max(d$weight), length.out = 30)
samples <- link(mH42, data=data.frame(weight=xvals), refresh=0)
samples.hpdi <- apply(samples, 2, HPDI)
pred.heights <- sim(mH42, data=data.frame(weight=xvals), refresh=0)
pred.heights.hpdi <- apply(pred.heights, 2, HPDI)

plot(height~weight, data=d)
abline(a=coef(mH42)["alpha"], b=coef(mH42)["b"])
mtext("Weight and height among !kung aged less than 18\n")
shade(pred.heights.hpdi, xvals)
shade(samples.hpdi, xvals)