Key Terms
General form
y-hat = b0 + b1*x1 + b2*x2 + ... + bk*xk
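A minimal sketch of estimating b0 through bk by ordinary least squares, assuming NumPy is available. The helper names fit_multiple_regression and predict are hypothetical. The data are made up and generated without noise from y = 1 + 2*x1 - 3*x2, so the recovered coefficients can be checked.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

def predict(coefs, X):
    """y-hat = b0 + b1*x1 + ... + bk*xk for each row of X."""
    X = np.asarray(X, dtype=float)
    return coefs[0] + X @ coefs[1:]

# Illustrative, noise-free data: y = 1 + 2*x1 - 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1 + 2 * a - 3 * b for a, b in X]
coefs = fit_multiple_regression(X, y)
print(np.round(coefs, 6))  # close to [1, 2, -3]
```

With real (noisy) data the fit will not be exact; the same two functions still give the least-squares estimates and the fitted values.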
Standard format
1 = condition present, 0 = condition absent.
Example
Cond_new = 1 if item is new, 0 if used.
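A sketch of how an indicator like Cond_new enters a regression, assuming NumPy and hypothetical prices. With a single 0/1 predictor, least squares sets b0 to the mean response of the 0 group (used) and b1 to the difference in group means (new minus used), which is why the indicator's coefficient reads as the estimated shift associated with the condition being present.

```python
import numpy as np

# Hypothetical item conditions and sale prices, for illustration only.
conditions = ["new", "used", "new", "used", "used"]
prices = np.array([50.0, 40.0, 54.0, 38.0, 42.0])

# Indicator coding: 1 = condition present (new), 0 = condition absent (used).
cond_new = np.array([1 if c == "new" else 0 for c in conditions], dtype=float)

# Design matrix: intercept column plus the indicator.
design = np.column_stack([np.ones(len(cond_new)), cond_new])
(b0, b1), *_ = np.linalg.lstsq(design, prices, rcond=None)

# b0 is the mean price of used items; b1 is mean(new) - mean(used).
print(b0, b1)
```

Here the used items average 40 and the new items average 52, so b0 comes out as 40 and b1 as 12 (up to floating-point rounding).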
Where
n = number of observations
k = number of predictor variables
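The symbols n and k appear in the usual adjusted R-squared formula, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1). A small sketch (the function name adjusted_r_squared is hypothetical) shows how the penalty for extra predictors works: adding a predictor that raises R-squared only slightly can lower adjusted R-squared.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    n = number of observations, k = number of predictor variables.
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Same 11 observations; a third predictor nudges R^2 from 0.50 to 0.51.
print(adjusted_r_squared(0.50, n=11, k=2))  # 0.375
print(adjusted_r_squared(0.51, n=11, k=3))  # about 0.30, lower than before
```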
Use adjusted R-squared when
The goal is maximizing prediction accuracy (common in machine learning applications).
Use p-value approach when
The goal is identifying statistically significant predictors or building the simplest defensible model.
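A minimal sketch of backward elimination with the p-value approach. It assumes a caller-supplied function, here called fit_pvalues (hypothetical), that refits the model on a given set of predictors and returns each predictor's p-value; the 0.05 cutoff is also an assumption. The usage example uses a fixed, made-up p-value table in place of real refits.

```python
from typing import Callable, Dict, List

def backward_elimination(
    predictors: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Drop the least significant predictor until all remaining p-values < alpha."""
    current = list(predictors)
    while current:
        pvals = fit_pvalues(current)  # refit on the current predictor set
        worst = max(current, key=lambda name: pvals[name])
        if pvals[worst] < alpha:
            break  # every remaining predictor is significant
        current.remove(worst)  # remove the highest p-value, then refit
    return current

# Made-up p-values; a real fit_pvalues would refit and p-values would shift.
fake = {"x1": 0.001, "x2": 0.40, "x3": 0.03, "x4": 0.72}
print(backward_elimination(list(fake), lambda cols: {c: fake[c] for c in cols}))
# ['x1', 'x3']
```

Refitting after every removal matters in practice, because dropping one predictor changes the p-values of the others.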
Same formula as simple regression
b_i +/- t*_df * SE(b_i), where t*_df is the critical t value with df = n - k - 1
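A sketch of the interval arithmetic, with the critical value t* passed in rather than computed, since that lookup normally comes from a t table or software (for example SciPy's scipy.stats.t.ppf(0.975, df) for 95% confidence). The function names and the example estimate and standard error are hypothetical.

```python
from typing import Tuple

def residual_df(n: int, k: int) -> int:
    """Degrees of freedom for t* in multiple regression: n - k - 1."""
    return n - k - 1

def coefficient_ci(b_i: float, se_i: float, t_star: float) -> Tuple[float, float]:
    """Return (lower, upper) for b_i +/- t* * SE(b_i)."""
    margin = t_star * se_i
    return (b_i - margin, b_i + margin)

# Hypothetical fit: n = 30 observations, k = 3 predictors -> df = 26.
df = residual_df(30, 3)
t_star = 2.056  # approximate 95% critical value for df = 26, from a t table
print(df, coefficient_ci(1.8, 0.4, t_star))  # 26, roughly (0.98, 2.62)
```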
Examples of binary outcomes
Spam or not spam, pass or fail, defaulted or did not default, disease present or absent.
Positive coefficient
That predictor is positively associated with the probability of the outcome = 1 category. Increases in that variable raise that probability, holding the other predictors constant.
Negative coefficient
That predictor is negatively associated with the probability of the outcome = 1 category. Increases in that variable lower that probability, holding the other predictors constant.
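These sign interpretations follow from the standard logistic regression model, p-hat = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))), which is not written out in these notes and is supplied here as the usual form. Because the logistic function is increasing, raising a predictor with a positive coefficient (others fixed) raises p-hat, and raising one with a negative coefficient lowers it. The coefficients below are hypothetical.

```python
import math

def predicted_probability(coefs, x):
    """p-hat = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    b0, *bs = coefs
    eta = b0 + sum(b * xi for b, xi in zip(bs, x))  # linear predictor
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients: b1 > 0 (positive), b2 < 0 (negative).
coefs = [-1.0, 0.8, -0.5]
base = predicted_probability(coefs, [1.0, 1.0])
up_x1 = predicted_probability(coefs, [2.0, 1.0])  # raise x1, hold x2 fixed
up_x2 = predicted_probability(coefs, [1.0, 2.0])  # raise x2, hold x1 fixed
print(up_x1 > base, up_x2 < base)  # True True
```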
Same approaches apply
Backward elimination and forward selection using p-values (or other criteria). In the spam example, backward elimination
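Forward selection runs in the opposite direction: start with no predictors and add, one at a time, the candidate with the smallest p-value, stopping when none would be significant. This sketch assumes a caller-supplied fit_pvalues function (hypothetical) that would refit a logistic model on the given predictors and return their p-values; the 0.05 cutoff and the fixed p-value table in the usage example are made up.

```python
from typing import Callable, Dict, List

def forward_selection(
    candidates: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Add the most significant candidate until none would enter below alpha."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining:
        # For each candidate, fit selected + [candidate] and record its p-value.
        trial = {c: fit_pvalues(selected + [c])[c] for c in remaining}
        best = min(trial, key=trial.get)
        if trial[best] >= alpha:
            break  # no remaining candidate is significant
        selected.append(best)
        remaining.remove(best)
    return selected

# Made-up p-values standing in for real logistic refits.
fake = {"x1": 0.001, "x2": 0.40, "x3": 0.03, "x4": 0.72}
print(forward_selection(list(fake), lambda cols: {c: fake[c] for c in cols}))
# ['x1', 'x3']
```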
Calculated the same way as in multiple regression
e_i = y_i - p-hat_i, where y_i is the observed outcome (0 or 1) and p-hat_i is the predicted probability that y_i = 1
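A small sketch of the residual calculation (the function name is hypothetical). Since y_i is 0 or 1 and p-hat_i lies between 0 and 1, each residual falls between -1 and 1: positive when the outcome is 1, negative when it is 0.

```python
def logistic_residuals(y, p_hat):
    """e_i = y_i - p-hat_i for each observation."""
    if len(y) != len(p_hat):
        raise ValueError("y and p_hat must have the same length")
    return [yi - pi for yi, pi in zip(y, p_hat)]

# Hypothetical outcomes and predicted probabilities.
print(logistic_residuals([1, 0, 1], [0.9, 0.2, 0.25]))  # about [0.1, -0.2, 0.75]
```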