class: center, middle, inverse, title-slide

# SR-5101 Advanced Research Skills
## (3/3) Linear models
### Dr. Haziq Jamil
### Semester 1, 2020/21

---

### Linear associations

Relationship between the high school graduation rate in all 50 US states and DC, and the % of residents who live below the poverty line.

<img src="lecture3_files/figure-html/unnamed-chunk-1-1.png" width="800px" />

--

Key terms: Response, explanatory, relationship, correlation.

---
layout: true

## Quantifying the relationship

---

- Correlation describes the strength of the linear association between two variables.
- It takes values between -1 (perfect negative) and 1 (perfect positive).
- A value of 0 indicates no linear association.
- Usually denoted by the symbol `\(\rho\)` (or `\(r\)`)

$$
\rho = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}
$$

```r
cor(povdat$poverty, povdat$hs_grad)
```

```
## [1] -0.8463959
```

---
layout: false

## Eyeballing the line

Which line passes "the best" through all data points?

<img src="lecture3_files/figure-html/unnamed-chunk-3-1.png" width="800px" />

---

## Equation of a line

The equation of a line in two dimensions is

$$
y = \beta_0 + \beta_1x
$$

It is "parameterised" by

- an intercept `\(\beta_0\)`; and
- a slope `\(\beta_1\)`.

--

In the previous example, the `\(y\)` variable (response) is the '% in poverty', whereas the `\(x\)` variable (explanatory) is the '% high school graduates'.

--

Clearly, no single line passes through all the data points. The gaps between the points and the line are known as errors. So perhaps the "best line" to fit is the one that makes these errors as small as possible.
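--

To make "as small an error as possible" concrete, here is a minimal sketch that compares two candidate lines by adding up their squared errors. The data and the helper `sse()` are made up for illustration; they are not the poverty data.

```r
# Toy data (made up for illustration; not the poverty data set)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Hypothetical helper: sum of squared errors for the line y = b0 + b1 * x
sse <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

sse(b0 = 1, b1 = 1.5)  # a line drawn "by eye"
sse(b0 = 0, b1 = 2)    # a second candidate with a smaller total error
```

Under this criterion the second line is the better of the two; least squares searches over all possible intercepts and slopes in the same way.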
---
layout: true

## Least squares regression

---

Define the "residuals" or "errors" as the difference between the observed data points and the fitted line:

$$
\epsilon_i = y_i - (\beta_0 + \beta_1 x_i)
$$

--

Then, the method of least squares aims to find the values of `\(\beta_0\)` and `\(\beta_1\)` that minimise

`\begin{gather}
\sum_{i=1}^n \epsilon_i^2 = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2
\end{gather}`

---

GOAL: "Adjust" the blue line (regression slope and intercept) such that the red lines (errors) are as small as possible.

<img src="lecture3_files/figure-html/unnamed-chunk-4-1.png" width="800px" />

---

```r
# povdat <- readr::read_csv("poverty.csv")  # load data set first
povdat
```

```
## # A tibble: 51 x 6
##    state         poverty hs_grad home_own median_income party_maj
##    <chr>           <dbl>   <dbl>    <dbl>         <dbl> <chr>
##  1 Alabama         19.9     76.8     73.4        36963. Rep
##  2 Alaska          12.3     86.6     63.2        59351. Rep
##  3 Arizona         19.0     81.8     69.7        42418. Rep
##  4 Arkansas        20.0     78.9     71.2        34983. Rep
##  5 California      14.2     82.5     63.3        55266. Dem
##  6 Colorado        12.9     88.1     72.1        50136. Dem
##  7 Connecticut      8.31    89.1     71.7        68935. Dem
##  8 Delaware        11.5     86.2     74.7        55568. Dem
##  9 District of …   18.5     86.5     43.5        58526  Dem
## 10 Florida         16.0     82.2     74.6        44269. Rep
## # … with 41 more rows
```

---

```r
mod <- lm(formula = poverty ~ hs_grad, data = povdat)
summary(mod)
```

```
##
## Call:
## lm(formula = poverty ~ hs_grad, data = povdat)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -3.741 -1.290 -0.139  1.107  5.359
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 76.85835    5.62054   13.68  < 2e-16 ***
## hs_grad     -0.73662    0.06621  -11.12 5.19e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.068 on 49 degrees of freedom
## Multiple R-squared:  0.7164, Adjusted R-squared:  0.7106
## F-statistic: 123.8 on 1 and 49 DF,  p-value: 5.195e-15
```

---

```r
fitted(mod)
```

```
##         1         2         3         4         5         6
## 20.299078 13.072082 16.588049 18.765499 16.117627 11.933297
##         7         8         9        10        11        12
## 11.225450 13.386204 13.140664 16.288343 20.011157 12.462973
##        13        14        15        16        17        18
## 13.713219 13.255490 14.276020 11.299856 12.194282 21.069158
##        19        20        21        22        23        24
## 20.045332 11.312924 12.704830 10.825570 12.410256 11.299959
##        25        26        27        28        29        30
## 20.908419 16.099957 11.696361 10.803279 14.405918 10.540393
##        31        32        33        34        35        36
## 12.456659 16.723319 12.874530 17.720971 13.517313 13.607748
##        37        38        39        40        41        42
## 15.915906 12.340612 13.031820 12.374578 18.630089 13.338212
##        43        44        45        46        47        48
## 19.823752 19.894083 10.758077 10.804524 17.180533 12.523036
##        49        50        51
## 18.106826 11.506798  9.716979
```

---

```r
qplot(x = povdat$hs_grad, y = fitted(mod), geom = "line") +
  geom_point(data = povdat, aes(x = hs_grad, y = poverty))
```

<img src="lecture3_files/figure-html/unnamed-chunk-8-1.png" width="800px" />

---
layout: false
class: inverse, middle, center

## The technical stuff

---

### Key assumptions in linear regression model

The (simple) linear regression model is

`\begin{gather}
y_i = \beta_0 + \beta_1x_i + \epsilon_i \\
\epsilon_i \sim N(0,\sigma^2)
\end{gather}`

In words: "each observed value can be explained by a regression line, plus random noise".

--

In addition to this, several assumptions are made:

- The explanatory variables are fixed (non-random).
- The errors are independent and identically (normally) distributed.
- Different values of the response variable have the same variance in their errors (homoscedasticity).

https://towardsdatascience.com/assumptions-of-linear-regression-algorithm-ed9ea32224e1

---

## The Modelling Process

- **MODEL**: Which explanatory variable(s) should I include? What relationship do I hypothesise between them?
- **FIT**: Run this model in R (or other software packages).
- **DIAGNOSE**: Any assumptions violated? Any variables not significant?

It is important to have a solid model for inference. **Remember: Garbage In Garbage Out**.

---
layout: true

## Checking assumptions

---

1) Look at error patterns by plotting the residuals against the fitted values, response, explanatory variables, etc.

*"Different values of the response variable have the same variance in their errors (homoscedasticity)"*

---

```r
plot(mod, which = 1)
```

<img src="lecture3_files/figure-html/unnamed-chunk-9-1.png" width="800px" />

---

```r
ggplot(mod, aes(.fitted, .resid)) +
  geom_point()
```

<img src="lecture3_files/figure-html/unnamed-chunk-10-1.png" width="800px" />

---

```r
ggplot(mod, aes(.fitted, .resid)) +
  geom_point() +
  geom_smooth(se = FALSE)
```

<img src="lecture3_files/figure-html/unnamed-chunk-11-1.png" width="800px" />

---

2) Check normality assumptions: histogram and Q-Q plot of residuals

```r
ggplot(mod, aes(.resid)) +
  geom_histogram(binwidth = 1)
```

<img src="lecture3_files/figure-html/unnamed-chunk-12-1.png" width="800px" />

---

```r
ggplot(mod, aes(sample = .resid)) +
  geom_qq() +
  geom_qq_line()
```

<img src="lecture3_files/figure-html/unnamed-chunk-13-1.png" width="800px" />

---

```r
plot(mod, which = 2)
```

<img src="lecture3_files/figure-html/unnamed-chunk-14-1.png" width="800px" />

---
layout: true

# Tests of significance

---

For the estimated values of `\(\beta_0\)` and `\(\beta_1\)`, how do we know the values obtained are "significant"? I.e., how do we know the actual regression line is not:

- `\(y = \beta_0\)` only (slope is zero)?
- `\(y = \beta_1x\)` only (intercept is zero)?
- `\(y = 0\)` (both slope and intercept are zero)?
--

Luckily, each estimate `\(\hat\beta_k\)` of `\(\beta_k\)` has an (asymptotically) normal distribution:

$$
\hat\beta_k \sim N(\beta_k, SE(\hat\beta_k)^2)
$$

So it is possible to perform statistical tests based on this distribution.

---

*Example: Testing the significance of the slope parameter `\(\beta_1\)`.*

`\begin{align*}
H_0:& \beta_1 = 0 \\
H_1:& \beta_1 \neq 0
\end{align*}`

--

The `\(p\)`-value is calculated as `\(P(|Z| > |\hat\beta_1/SE(\hat\beta_1)|)\)`, where `\(Z\sim N(0,1)\)`. Reject `\(H_0\)` for small values of `\(p\)` (usually less than `\(\alpha=0.05\)`).

--

```r
summary(mod)
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 76.85835    5.62054   13.68  < 2e-16 ***
## hs_grad     -0.73662    0.06621  -11.12 5.19e-15 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

--

Note: it is actually a `\(t\)`-test (here with `\(n-2\)` degrees of freedom), but for large samples this doesn't matter too much as `\(t_{n-2}\to N(0,1)\)`.

---
layout: false

# Coefficient of determination

The `\(R^2\)` value is calculated as

`\begin{align}
R^2 = 1 - \frac{\text{Residual S.S.}}{\text{Total S.S.}} = 1- \frac{\sum_{i=1}^n (y_i - \hat y_i)^2}{\sum_{i=1}^n ( y_i - \bar y)^2} \in [0,1]
\end{align}`

It is the proportion of the variation in the response that is explained by the model. Values closer to 1 indicate "a better model agreement".

--

Note that the "Adjusted `\(R^2\)`" may also be reported or used. It penalises `\(R^2\)` for the number of explanatory variables; with a single explanatory variable the two are nearly identical (0.7164 vs 0.7106 here).

--

```r
summary(mod)
## Residual standard error: 2.068 on 49 degrees of freedom
## Multiple R-squared:  0.7164, Adjusted R-squared:  0.7106
## F-statistic: 123.8 on 1 and 49 DF,  p-value: 5.195e-15
```

---
layout: true

# Binary variables

---

Consider again the `mpg` data set. Is there any difference in fuel consumption between high and low displacement engines?
```r
mpg
```

```
## # A tibble: 234 x 11
##    manufacturer model displ  year   cyl trans drv     cty   hwy
##    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int>
##  1 audi         a4      1.8  1999     4 auto… f        18    29
##  2 audi         a4      1.8  1999     4 manu… f        21    29
##  3 audi         a4      2    2008     4 manu… f        20    31
##  4 audi         a4      2    2008     4 auto… f        21    30
##  5 audi         a4      2.8  1999     6 auto… f        16    26
##  6 audi         a4      2.8  1999     6 manu… f        18    26
##  7 audi         a4      3.1  2008     6 auto… f        18    27
##  8 audi         a4 q…   1.8  1999     4 manu… 4        18    26
##  9 audi         a4 q…   1.8  1999     4 auto… 4        16    25
## 10 audi         a4 q…   2    2008     4 manu… 4        20    28
## # … with 224 more rows, and 2 more variables: fl <chr>,
## #   class <chr>
```

---

Create a new binary variable to distinguish between vehicles that are low displacement (`\(\leq 2\)` litre engines) and high displacement (`\(>2\)` litre engines). Note that `displ` is measured in litres.

```r
(mpg$cc <- as.numeric(mpg$displ > 2))
```

```
##   [1] 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [31] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [61] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [91] 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 1
## [121] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [151] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [181] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0
## [211] 0 1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 0 0 0 0 1 1 1
```

The variable `cc` equals 0 when the vehicle is low displacement, and 1 when it is high displacement.

---

Think about the relationship between city mileage (`cty`) and this new variable `cc` from a linear regression standpoint.

`\begin{align*}
y_i = \beta_0 + \beta_1x_i + \epsilon_i
\end{align*}`

Technically

`\begin{align}
y_i =
\begin{cases}
\beta_0 + \epsilon_i &\text{if } x_i = 0 \\
\beta_0 + \beta_1 + \epsilon_i &\text{if } x_i = 1
\end{cases}
\end{align}`

So we are using a regression framework to estimate two group means (`\(\beta_0\)` for low displacement and `\(\beta_0+\beta_1\)` for high displacement).
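--

As a small illustration of this two-means interpretation, the sketch below fits a regression on a binary variable and compares the coefficients with the group means computed directly. The vectors `grp` and `y` are made up for illustration; they are not the `mpg` data.

```r
# Toy data (made up for illustration; not the mpg data set)
grp <- c(0, 0, 0, 1, 1, 1)        # binary explanatory variable
y   <- c(20, 22, 24, 14, 15, 16)  # response

fit <- lm(y ~ grp)
coef(fit)             # intercept and slope

# Group means computed directly, for comparison
tapply(y, grp, mean)
```

The intercept matches the mean of group 0, and the intercept plus the slope matches the mean of group 1.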
---

```r
summary(lm(cty ~ cc, mpg))
```

```
##
## Call:
## lm(formula = cty ~ cc, data = mpg)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -6.623 -2.623 -0.623  2.377 12.651
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  22.3488     0.5137   43.51   <2e-16 ***
## cc           -6.7258     0.5686  -11.83   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.369 on 232 degrees of freedom
## Multiple R-squared:  0.3762, Adjusted R-squared:  0.3735
## F-statistic: 139.9 on 1 and 232 DF,  p-value: < 2.2e-16
```

---

The regression fitted is

`\begin{align}
\hat y_i =
\begin{cases}
22.35 &\text{if } x_i = 0 \\
22.35 - 6.73 = 15.62 &\text{if } x_i = 1
\end{cases}
\end{align}`

--

In other words, the average city mileage is 22.3 for low displacement cars, and 15.6 for high displacement cars. Makes sense: larger engines are less efficient!

--

Note that the `\(p\)`-value for `\(\beta_1\)` is `\(<0.05\)`, so it is significant. This means that there is a significant difference between the two group means.

---

Compare this to a `\(t\)`-test

```r
t.test(cty ~ cc, mpg, var.equal = TRUE)
```

```
##
##  Two Sample t-test
##
## data:  cty by cc
## t = 11.829, df = 232, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  5.605519 7.846082
## sample estimates:
## mean in group 0 mean in group 1
##        22.34884        15.62304
```

--

This gives the same test statistic up to sign (`\(|T|=11.83\)`), and the same `\(p\)`-value and estimated means!

---
layout: false
class: inverse, middle, center

# Multiple regression

---

## Adding more explanatory variables

Back to the poverty data set again.

```r
povdat
```

```
## # A tibble: 51 x 6
##    state         poverty hs_grad home_own median_income party_maj
##    <chr>           <dbl>   <dbl>    <dbl>         <dbl> <chr>
##  1 Alabama         19.9     76.8     73.4        36963. Rep
##  2 Alaska          12.3     86.6     63.2        59351. Rep
##  3 Arizona         19.0     81.8     69.7        42418. Rep
##  4 Arkansas        20.0     78.9     71.2        34983. Rep
##  5 California      14.2     82.5     63.3        55266. Dem
##  6 Colorado        12.9     88.1     72.1        50136. Dem
##  7 Connecticut      8.31    89.1     71.7        68935. Dem
##  8 Delaware        11.5     86.2     74.7        55568. Dem
##  9 District of …   18.5     86.5     43.5        58526  Dem
## 10 Florida         16.0     82.2     74.6        44269. Rep
## # … with 41 more rows
```

We can build a model to explain poverty using several explanatory variables.

---

```r
mod1 <- lm(poverty ~ hs_grad + home_own + median_income + party_maj, povdat)
summary(mod1)
```

```
##
## Call:
## lm(formula = poverty ~ hs_grad + home_own + median_income + party_maj,
##     data = povdat)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.8794 -0.8104 -0.0762  0.8412  3.2827
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    7.607e+01  4.582e+00  16.602  < 2e-16 ***
## hs_grad       -4.534e-01  5.599e-02  -8.098 2.12e-10 ***
## home_own      -1.461e-01  3.538e-02  -4.130 0.000151 ***
## median_income -2.610e-04  3.427e-05  -7.616 1.10e-09 ***
## party_majRep  -5.110e-01  5.135e-01  -0.995 0.324857
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.354 on 46 degrees of freedom
## Multiple R-squared:  0.8859, Adjusted R-squared:  0.876
## F-statistic: 89.28 on 4 and 46 DF,  p-value: < 2.2e-16
```

---

## Model selection

Notice that the coefficient associated with `party_maj` is not significant (`\(p=0.325\)`). This suggests that a state's party affiliation adds little to explaining the level of poverty once the other variables are in the model.

--

More precisely, holding the other variables constant, the average level of poverty is practically the same regardless of state party affiliation.

--

Let's remove this variable and refit the model.

---

```r
mod2 <- lm(poverty ~ hs_grad + home_own + median_income, povdat)
summary(mod2)
```

```
##
## Call:
## lm(formula = poverty ~ hs_grad + home_own + median_income, data = povdat)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.8675 -0.8648 -0.0188  0.7602  3.2727
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    7.509e+01  4.473e+00  16.787  < 2e-16 ***
## hs_grad       -4.509e-01  5.593e-02  -8.062 2.06e-10 ***
## home_own      -1.506e-01  3.510e-02  -4.291 8.82e-05 ***
## median_income -2.444e-04  2.991e-05  -8.170 1.42e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.354 on 47 degrees of freedom
## Multiple R-squared:  0.8834, Adjusted R-squared:  0.876
## F-statistic: 118.7 on 3 and 47 DF,  p-value: < 2.2e-16
```

Now all variables are significant.

---
layout: true

## Check model diagnostics

---

```r
plot(mod2, which = 1)
```

<img src="lecture3_files/figure-html/unnamed-chunk-24-1.png" width="800px" />

---

```r
plot(mod2, which = 2)
```

<img src="lecture3_files/figure-html/unnamed-chunk-25-1.png" width="800px" />

---

The diagnostic plots look OK, but it seems there might be an outlier. The model does not seem to be explaining data point 12 very well.

```r
apply(povdat[, -c(1, 6)], 2, mean)  # mean values
```

```
##       poverty       hs_grad      home_own median_income
##      14.41173      84.77447      71.70830   47664.89805
```

```r
povdat[12, ]
```

```
## # A tibble: 1 x 6
##   state  poverty hs_grad home_own median_income party_maj
##   <chr>    <dbl>   <dbl>    <dbl>         <dbl> <chr>
## 1 Hawaii       9    87.4     49.5         58683 Dem
```

--

Two options: 1) EXPLAIN or 2) REMOVE.
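--

A minimal sketch of what the REMOVE option could look like in R, using a made-up data frame `d` (not the poverty data) in which the last observation is deliberately placed far from the trend. Whether dropping a point is justified is a judgement call that the code cannot make.

```r
# Toy data (made up for illustration): the last point sits far from the trend
d <- data.frame(x = 1:8, y = c(1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 6.9, 15))

fit_all  <- lm(y ~ x, data = d)
fit_drop <- lm(y ~ x, data = d[-8, ])  # refit without observation 8

which.max(abs(resid(fit_all)))         # observation with the largest residual
rbind(all = coef(fit_all), dropped = coef(fit_drop))
```

Comparing the two rows of coefficients shows how much a single point can pull the fitted line.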
Further reading: https://towardsdatascience.com/ways-to-detect-and-remove-the-outliers-404d16608dba --- layout: true ## Back to model selection --- Here's a comparison of `mod`, `mod1` and `mod2` <table class="mtable" style="border-collapse: collapse; border-style: none; margin: 2ex auto;"> <tr style="border-style: none;"><td colspan="1" style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; border-top: 1px solid;"></td><td colspan="3" style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; border-top: 1px solid;">mod</td><td colspan="3" style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; border-top: 1px solid;">mod1</td><td colspan="3" style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; border-top: 1px solid;">mod2</td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; border-top: 1px solid;">(Intercept)</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em; border-top: 1px solid;">76</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 
0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px; border-top: 1px solid;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em; border-top: 1px solid;">858<span class="signif.symbol">***</span></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em; border-top: 1px solid;">76</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px; border-top: 1px solid;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em; border-top: 1px solid;">073<span class="signif.symbol">***</span></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em; border-top: 1px solid;">75</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px; border-top: 1px solid;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 
0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em; border-top: 1px solid;">086<span class="signif.symbol">***</span></td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">(5</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">621)</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">(4</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 
0px; padding-left: 0px; padding-right: 0.3em;">582)</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">(4</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">473)</td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left;">hs_grad</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">−0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">737<span class="signif.symbol">***</span></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; 
padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">−0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">453<span class="signif.symbol">***</span></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">−0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">451<span class="signif.symbol">***</span></td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; 
text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">(0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">066)</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">(0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">056)</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">(0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; 
padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">056)</td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left;">home_own</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">−0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; 
padding-left: 0px; padding-right: 0.3em;">146<span class="signif.symbol">***</span></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">−0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">151<span class="signif.symbol">***</span></td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;"></td><td style="padding-top: 1px; padding-bottom: 1px; 
padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">(0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">035)</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">(0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">035)</td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left;">median_income</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; 
padding-right: 0px; padding-left: 0.3em;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">−0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">000<span class="signif.symbol">***</span></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">−0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 
1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">000<span class="signif.symbol">***</span></td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">(0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; 
margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">000)</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">(0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">000)</td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left;">party_maj: Rep/Dem</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; 
margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">−0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">511</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;"></td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;"></td><td style="padding-top: 
1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">(0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">513)</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;"></td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; 
border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;"></td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; border-top: 1px solid;">sigma</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em; border-top: 1px solid;">2</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px; border-top: 1px solid;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em; border-top: 1px solid;">068</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em; border-top: 1px solid;">1</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px; border-top: 1px solid;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; 
margin-left: 0px; padding-left: 0px; padding-right: 0.3em; border-top: 1px solid;">354</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em; border-top: 1px solid;">1</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px; border-top: 1px solid;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em; border-top: 1px solid;">354</td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left;">R-squared</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">716</td><td 
style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">886</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">0</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">883</td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left;">AIC</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: 
right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">222</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">797</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">182</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">363</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em;">181</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px;">.</td><td style="padding-top: 1px; padding-bottom: 
1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em;">450</td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.3em; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; border-bottom: 1px solid;">BIC</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em; border-bottom: 1px solid;">228</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px; border-bottom: 1px solid;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em; border-bottom: 1px solid;">592</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em; border-bottom: 1px solid;">193</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px; border-bottom: 1px solid;">.</td><td style="padding-top: 1px; padding-bottom: 1px; 
padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em; border-bottom: 1px solid;">954</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: right; margin-right: 0px; padding-right: 0px; padding-left: 0.3em; border-bottom: 1px solid;">191</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: center; margin-left: 0px; margin-right: 0px; padding-right: 0px; padding-left: 0px; width: 1px; border-bottom: 1px solid;">.</td><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px; text-align: left; margin-left: 0px; padding-left: 0px; padding-right: 0.3em; border-bottom: 1px solid;">109</td></tr> <tr style="border-style: none;"><td style="padding-top: 1px; padding-bottom: 1px; padding-left: 0.5ex; padding-right: 0.5ex; margin-top: 0px; margin-bottom: 0px; border-style: none; border-width: 0px;" colspan="10"><p style="width: inherit;">Significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05</p> </td></tr> </table> --- There are several criteria for selecting the "best" model - All explanatory variables significant - Residuals as small as possible (i.e. errors/sigma) - `\(R^2\)` as high as possible - Akaike's information criterion (AIC) small as possible - Bayesian information criterion (BIC) small as possible -- Model comparison (or variable selection) is such a vast topic to cover, many methods exist. Simplest one for now is to perform **Stepwise Regression**. In a nutshell, this automatically adds and removes variables until it reaches a model with minimal AIC. 
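The stepwise idea and the AIC/BIC criteria above can be tried out on any linear model. The sketch below is only illustrative: it uses the built-in `mtcars` data as a stand-in (an assumption, not the lecture's `povdat`), and the object names `full` and `best` are made up for this example. It also shows why the AIC printed by `step()` is on a different scale from the AIC reported in a model table: `step()` relies on `extractAIC()`, which drops constant terms that `AIC()` keeps.

```r
# Stand-in data: built-in mtcars (an assumption; not the lecture's povdat)
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# With no scope given, step() performs backward selection by AIC;
# trace = 0 suppresses the step-by-step log
best <- step(full, trace = 0)
formula(best)

# extractAIC() (used by step) and AIC() differ by a constant that depends
# only on the sample size, so they rank models fitted to the same data identically
n <- nrow(mtcars)
offset <- n * (log(2 * pi) + 1) + 2
all.equal(AIC(full), extractAIC(full)[2] + offset)

# Side-by-side criteria for the full and the selected model
data.frame(
  model = c("full", "best"),
  AIC   = c(AIC(full), AIC(best)),
  BIC   = c(BIC(full), BIC(best))
)
```

Because the offset is the same for every model fitted to the same observations, comparing models by either version of the AIC leads to the same choice.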
--- ```r step(mod1) ``` ``` ## Start: AIC=35.63 ## poverty ~ hs_grad + home_own + median_income + party_maj ## ## Df Sum of Sq RSS AIC ## - party_maj 1 1.815 86.117 34.718 ## <none> 84.302 35.631 ## - home_own 1 31.260 115.562 49.717 ## - median_income 1 106.292 190.594 75.234 ## - hs_grad 1 120.183 204.485 78.822 ## ## Step: AIC=34.72 ## poverty ~ hs_grad + home_own + median_income ## ## Df Sum of Sq RSS AIC ## <none> 86.117 34.718 ## - home_own 1 33.73 119.846 49.574 ## - hs_grad 1 119.10 205.218 77.005 ## - median_income 1 122.31 208.422 77.795 ``` ``` ## ## Call: ## lm(formula = poverty ~ hs_grad + home_own + median_income, data = povdat) ## ## Coefficients: ## (Intercept) hs_grad home_own median_income ## 75.0862390 -0.4509464 -0.1505851 -0.0002444 ``` --- layout: true ## Interpreting the model --- Two main things to describe: 1) Which variables contribute towards explaining the response variable, and 2) what is the sign and magnitude of each effect? --- ### The intercept For this regression model: ```html poverty = intercept + hs_grad + home_own + median_income + error ``` Suppose we have a state that has `hs_grad=0`, `home_own=0` and `median_income=0`. Then the regression becomes simply `poverty = intercept + error`. -- In particular, ```html E(poverty) = intercept ``` -- .center[**The intercept is the expected value of the response variable given that all explanatory variables are zero.**] --- ### The slopes Consider the regression function `\(y = \beta_0 + \beta_1x_1 + \cdots + \beta_px_p + \epsilon\)`. 
Now suppose the value of the explanatory variable `\(x_1\)` increases by one unit. Then we have `$$y' = \beta_0 + \beta_1(x_1+1) + \cdots + \beta_px_p + \epsilon$$` -- The difference between the two values is `\begin{align} y' - y &= \beta_0 + \beta_1(x_1+1) + \cdots + \beta_px_p + \epsilon \\ &\hspace{2em} - (\beta_0 + \beta_1x_1 + \cdots + \beta_px_p + \epsilon) \\ &= \beta_1 \end{align}` -- **The `\(\beta_k\)` values give the average change in the response variable for a unit increase in `\(x_k\)`, holding the other variables fixed** --- ```r round(coef(mod2), 3) ``` ``` ## (Intercept) hs_grad home_own median_income ## 75.086 -0.451 -0.151 0.000 ``` Example statements: - The expected poverty rate in a state with zero high school graduates, home ownership and income is 75%, which is not helpful! - The average values for `hs_grad` and `home_own` are 84.7 and 71.7 respectively. The expected poverty rate in a state with average high school graduate rate and home ownership (ignoring income) is therefore $$ 75.1 - 0.451 \times 84.7 - 0.151 \times 71.7 = 26.1 $$ - For every percentage point increase in high school graduates, poverty rates decline by 0.451 points. - For every percentage point increase in home ownership, poverty rates decline by 0.151 points. --- layout: true ## Scale --- What's wrong with the coefficient for income? It's 0.000... is it actually zero? No. Look at the scale of the variables: ```r summary(povdat) ``` ``` ## state poverty hs_grad ## Length:51 Min. : 8.312 Min. :75.74 ## Class :character 1st Qu.:11.596 1st Qu.:81.73 ## Mode :character Median :13.637 Median :86.34 ## Mean :14.412 Mean :84.77 ## 3rd Qu.:17.407 3rd Qu.:87.96 ## Max. :24.389 Max. :91.15 ## home_own median_income party_maj ## Min. :43.50 Min. :33633 Length:51 ## 1st Qu.:70.94 1st Qu.:40923 Class :character ## Median :73.25 Median :46040 Mode :character ## Mean :71.71 Mean :47665 ## 3rd Qu.:74.63 3rd Qu.:52701 ## Max. :79.56 Max. 
:70630 ``` Everything else is on a percentage scale, while income is measured annually, so it runs to the tens of thousands. --- One way to make things better is to scale the variables appropriately. For example, we can report income in thousands. Either we fix this at the data frame level ```r povdat$median_income <- povdat$median_income / 1000 mod3 <- lm(poverty ~ hs_grad + home_own + median_income, povdat) ``` or fit a new model directly ```r mod3 <- lm(poverty ~ hs_grad + home_own + I(median_income/1000), povdat) ``` -- Note that scaling **does not** affect the significance of the coefficients. --- ```r summary(mod3) ``` ``` ## ## Call: ## lm(formula = poverty ~ hs_grad + home_own + I(median_income/1000), ## data = povdat) ## ## Residuals: ## Min 1Q Median 3Q Max ## -4.8675 -0.8648 -0.0188 0.7602 3.2727 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 75.08624 4.47299 16.787 < 2e-16 *** ## hs_grad -0.45095 0.05593 -8.062 2.06e-10 *** ## home_own -0.15059 0.03510 -4.291 8.82e-05 *** ## I(median_income/1000) -0.24436 0.02991 -8.170 1.42e-10 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 1.354 on 47 degrees of freedom ## Multiple R-squared: 0.8834, Adjusted R-squared: 0.876 ## F-statistic: 118.7 on 3 and 47 DF, p-value: < 2.2e-16 ``` --- ```r coef(mod3) ``` ``` ## (Intercept) hs_grad ## 75.0862390 -0.4509464 ## home_own I(median_income/1000) ## -0.1505851 -0.2443638 ``` "For every increase of one thousand in median income, the poverty rate declines by 0.244 points" --- layout: true ## ANOVA revisited --- ```r data("PlantGrowth") summary(PlantGrowth) ``` ``` ## weight group ## Min. :3.590 ctrl:10 ## 1st Qu.:4.550 trt1:10 ## Median :5.155 trt2:10 ## Mean :5.073 ## 3rd Qu.:5.530 ## Max. :6.310 ``` Study of the growth of 30 plants, measured by weight. 10 subjected to Treatment A (`trt1`), 10 to Treatment B (`trt2`), and 10 control (`ctrl`). Are the means the same in each group? --- Clearly an ANOVA problem. 
`\begin{align} H_0:& \ \mu_{ctrl} = \mu_A = \mu_B \\ H_1:& \text{ means are not equal} \end{align}` -- ```r aggregate(weight ~ group, PlantGrowth, mean) ``` ``` ## group weight ## 1 ctrl 5.032 ## 2 trt1 4.661 ## 3 trt2 5.526 ``` -- ```r summary(aov(weight ~ group, PlantGrowth)) ``` ``` ## Df Sum Sq Mean Sq F value Pr(>F) ## group 2 3.766 1.8832 4.846 0.0159 * ## Residuals 27 10.492 0.3886 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- A different way of thinking about the problem: Define "dummy" variables `\(x_1\)` and `\(x_2\)`, whereby `\begin{align*} x_1 = \begin{cases} 1 &\text{if treatment 1} \\ 0 &\text{otherwise} \end{cases} \end{align*}` `\begin{align*} x_2 = \begin{cases} 1 &\text{if treatment 2} \\ 0 &\text{otherwise} \end{cases} \end{align*}` Now build the regression model `\begin{align*} y = \beta_0 + \beta_1 x_{1} + \beta_2 x_{2} + \epsilon \end{align*}` --- In other words, we have that .center[] The model breaks down the means of the three groups in terms of three parameters `\(\beta_0\)`, `\(\beta_1\)` and `\(\beta_2\)`. -- - `\(\beta_0\)` represents the control group mean - `\(\beta_1\)` represents the *additional* effect of being in treatment group A - `\(\beta_2\)` represents the *additional* effect of being in treatment group B --- ```r summary(lm(weight ~ group, PlantGrowth)) ``` ``` ## ## Call: ## lm(formula = weight ~ group, data = PlantGrowth) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1.0710 -0.4180 -0.0060 0.2627 1.3690 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 5.0320 0.1971 25.527 <2e-16 *** ## grouptrt1 -0.3710 0.2788 -1.331 0.1944 ## grouptrt2 0.4940 0.2788 1.772 0.0877 . ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Residual standard error: 0.6234 on 27 degrees of freedom ## Multiple R-squared: 0.2641, Adjusted R-squared: 0.2096 ## F-statistic: 4.846 on 2 and 27 DF, p-value: 0.01591 ``` --- Interpretation - `\(\hat\beta_0 = 5.03\)` is the control group mean. - `\(\beta_1\)` is not deemed significant, so there is no evidence that treatment 1 differs from control. - `\(\beta_2\)` is weakly significant (`\(p=0.09\)`). Treatment 2 is found to increase the yield of plants by 0.494 grams. Note that the `\(F\)` statistic is identical to the ANOVA test statistic. It tests whether all the `\(\beta_k\)` values other than the intercept are equal to zero. --- layout:false class: inverse, middle, center # Logistic regression --- layout: true ### Normality assumption --- Revisit the Titanic data set ```r (titanic <- readr::read_csv("titanic.csv")) ``` ``` ## # A tibble: 891 x 12 ## PassengerId Survived Pclass Name Sex Age SibSp Parch ## <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> ## 1 1 0 3 Brau… male 22 1 0 ## 2 2 1 1 Cumi… fema… 38 1 0 ## 3 3 1 3 Heik… fema… 26 0 0 ## 4 4 1 1 Futr… fema… 35 1 0 ## 5 5 0 3 Alle… male 35 0 0 ## 6 6 0 3 Mora… male NA 0 0 ## 7 7 0 1 McCa… male 54 0 0 ## 8 8 0 3 Pals… male 2 3 1 ## 9 9 1 3 John… fema… 27 0 2 ## 10 10 1 2 Nass… fema… 14 1 0 ## # … with 881 more rows, and 4 more variables: Ticket <chr>, ## # Fare <dbl>, Cabin <chr>, Embarked <chr> ``` Let's build a model to predict survival of passengers by age. --- Plotting the best fit line... does this make sense? ```r ggplot(titanic, aes(x = Age, y = Survived)) + geom_point() + geom_smooth(method = "lm", se = FALSE) + theme_bw() ``` <img src="lecture3_files/figure-html/unnamed-chunk-42-1.png" width="800px" /> --- Besides the weird looking graph, the **normality assumption** is violated as well. -- PROBLEM 1: Since `\(y_i\)` can only take two values (survived vs not-survived), it in fact follows a Bernoulli distribution, not a normal one. 
--

PROBLEM 2: You might get predicted values that are greater than 1 or less than 0.

--

Therefore the normal linear model is inadequate for this purpose.

---
layout: true

## Logistic regression

---

Let

`\begin{align*}
y_i = \begin{cases} 1 &\text{if survived} \\ 0 &\text{otherwise} \end{cases}
\end{align*}`

and let `\(\pi_i=P(y_i=1) = P(\text{survived})\)`.

--

Model instead the log odds by a linear function

`\begin{align*}
\log \left( \frac{\pi_i}{1-\pi_i} \right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}
\end{align*}`

This is called logistic regression.

---

```r
mod <- glm(Survived ~ Age, titanic, family = "binomial")
summary(mod)
```

```
##
## Call:
## glm(formula = Survived ~ Age, family = "binomial", data = titanic)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.1488  -1.0361  -0.9544   1.3159   1.5908
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.05672    0.17358  -0.327   0.7438
## Age         -0.01096    0.00533  -2.057   0.0397 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 964.52  on 713  degrees of freedom
## Residual deviance: 960.23  on 712  degrees of freedom
##   (177 observations deleted due to missingness)
## AIC: 964.23
##
## Number of Fisher Scoring iterations: 4
```

---
layout: true

## Interpretation

---

Suppose all `\(x_{ik} =0\)`, so that only the intercept remains. This means that

`\begin{align*}
\log \left( \frac{\pi_i}{1-\pi_i} \right) &= \beta_0 \\
\Rightarrow \frac{\pi_i}{1-\pi_i} &= e^{\beta_0}
\end{align*}`

--

The term on the left is called the **ODDS**: it expresses how likely an event is to occur relative to not occurring.

--

In this example, the odds of surviving at age zero are `\(e^{-0.056}=0.94\)`. This is slightly below 1, so such a passenger is more likely to not survive than to survive.

The odds of surviving for someone who is 30 years of age (mean age) are therefore `\(e^{-0.056-30\times0.011}=0.68\)` (it's worse!)
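---

The odds above can be reproduced directly from the coefficient estimates. This is only a minimal sketch: it assumes the estimates are copied by hand from the `summary(mod)` output, and the names `b0`, `b1` and `odds_at()` are illustrative, not part of the original analysis.

```r
# Coefficient estimates copied from the summary(mod) output (assumed values)
b0 <- -0.05672  # intercept
b1 <- -0.01096  # slope for Age

# Odds of survival at a given age: exp() of the linear predictor
odds_at <- function(age) exp(b0 + b1 * age)

odds_age0  <- odds_at(0)   # approximately 0.94
odds_age30 <- odds_at(30)  # approximately 0.68

# Convert odds to a probability: pi = odds / (1 + odds)
prob_age30 <- odds_age30 / (1 + odds_age30)

# plogis() is the inverse logit, so it should give the same probability
all.equal(prob_age30, plogis(b0 + b1 * 30))

round(c(odds_age0 = odds_age0, odds_age30 = odds_age30,
        prob_age30 = prob_age30), 2)
```

With the fitted model at hand, `exp(coef(mod))` returns `\(e^{\hat\beta_0}\)` and `\(e^{\hat\beta_1}\)` on the odds scale without retyping the numbers.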
---

<!-- Note that the logistic regression can be expressed like this: -->
<!-- \begin{align*} -->
<!-- \pi_i = \frac{e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}}{1 + e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}} -->
<!-- \end{align*} -->
<!-- -- -->

For one unit increase in `\(x_1\)`, the odds ratio is

`\begin{align*}
OR(x_1+1,x_1) &= \frac{e^{\beta_0 + \beta_1 (x_{i1}+1) + \cdots + \beta_p x_{ip}}}{e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}} \\
&= e^{\beta_1}
\end{align*}`

--

**This suggests that the odds of a "successful" event are multiplied by `\(e^{\beta_1}\)` for every unit increase of `\(x_1\)`.**

--

For a passenger on the Titanic, each additional year of age multiplies the odds of survival by `\(e^{-0.011}=0.99\)`, i.e. a decrease of about 1%.

---

```r
ggplot(titanic, aes(x = Age, y = Survived)) +
  geom_point() +
  geom_smooth(method = "glm", se = FALSE,
              method.args = list(family = "binomial")) +
  theme_bw()
```

<img src="lecture3_files/figure-html/unnamed-chunk-44-1.png" width="800px" />

---
layout: false

# Other notes

- For binary response variables, it's more appropriate to use a logistic regression model (although in practice you might find a normal linear model being used, especially for prediction purposes).

- Logistic regression is part of a larger family of "generalised linear models", which includes Poisson count models, multinomial response models, etc.

- It's of course possible to add more explanatory variables, including categorical/dummy variables.

- Unfortunately, the diagnostics that we saw in the normal linear case do not typically apply, and there is no easy way to diagnose the model. At the very least, it's possible to inspect the pattern of residuals, but this is not always reliable.

---
layout:false
class: inverse, middle, center

# END