- Connecting multiple independent variables
- How to build statistical (regression) models that allow us to predict things?
- What is a regression primer?
- How to fit a model?
- How to assess the model fit?
- What can you do with the model?
- What can you NOT do with the model?
- Linear regression
- Logistic regression
Building a regression model
- Which model?
- Linear model
- Logistic model
- Categorical variables
Linear model
- fit a straight line through the data
- y = b0 + b1*score + e
- b0 = c (intercept)
- b1 = m (slope)
- e = error (residual)
- example: time taken = b0 + b1*IQ + error
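As a rough illustration of fitting a straight line, the sketch below estimates b0 and b1 by ordinary least squares for a single predictor. The function name `fit_line` and the data are made up for this example; a real analysis would normally use a statistics package rather than hand-written formulas.

```python
def fit_line(xs, ys):
    """Return (b0, b1) that minimize the sum of squared errors e."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b1 = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept b0: the fitted line passes through (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up data that lies exactly on the line 1 + 2*score
scores = [1.0, 2.0, 3.0, 4.0]
times = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_line(scores, times)
print(b0, b1)
```

With noisy data the points no longer lie on the line, and the leftover differences are the error term e.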
Categorical data
- fit a mean to each category
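A minimal sketch of "fit a mean to each category", assuming one categorical predictor and a numeric outcome. The helper `category_means` and the data are hypothetical. With dummy coding in a linear model, b0 would be the mean of the reference category and each further coefficient the difference from that mean.

```python
from collections import defaultdict


def category_means(categories, values):
    """Return the mean outcome for each category."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    return {c: sum(vs) / len(vs) for c, vs in groups.items()}


# Made-up data: two categories with two observations each
labels = ["a", "a", "b", "b"]
outcomes = [1.0, 3.0, 10.0, 14.0]
means = category_means(labels, outcomes)
print(means)
```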
Linear regression conditions
How to select features?
- step() in R
- add feature to list
- cross-validate → remove the feature again if the result is bad
- list of features = best hypothesis
- manual
- plot features
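The add-then-check loop in the bullets can be sketched as a greedy forward selection. This is only an illustration of the idea, not a reimplementation of R's `step()`, which by default compares models by AIC rather than by cross validation. The function `forward_select`, the scoring function `toy_cv_error`, and the feature names are all hypothetical; in practice the scoring function would fit a model and return its cross-validated error.

```python
def forward_select(candidates, cv_error):
    """Greedily keep each feature only if it lowers the validation error."""
    selected = []
    best = cv_error(selected)
    for feature in candidates:
        trial = selected + [feature]      # add feature to list
        error = cv_error(trial)           # cross-validate
        if error < best:
            selected, best = trial, error
        # otherwise the feature is not kept ("remove if bad")
    return selected, best                 # list of features = best hypothesis


def toy_cv_error(features):
    """Stand-in for a real cross-validated error (made-up numbers)."""
    useful = {"iq": 4.0, "age": 1.0}
    error = 10.0 - sum(useful.get(f, 0.0) for f in features)
    return error + 0.5 * len(features)    # small cost for every extra feature


selected, error = forward_select(["iq", "shoe_size", "age"], toy_cv_error)
print(selected, error)
```

Because the loop is greedy, the result can depend on the order in which candidates are tried.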
What's the problem with overfitting?
How to choose the right model?
- enough data → hold-out cross validation
- train on 70% of data
- validate with other 30%
- retrain with 100% of data
- little data → k-fold cross validation
- train on k-1 pieces
- validate with remaining one
- repeat k times
- take average of all rounds
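Both validation schemes above can be sketched for a one-predictor linear model as follows. The helper names (`fit_line`, `mse`, `holdout_error`, `k_fold_error`) and the data are made up for this example, and the rows are assumed to be in random order already; real data should be shuffled before splitting.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted (b0, b1) model."""
    b0, b1 = model
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def holdout_error(xs, ys, train_fraction=0.7):
    """Train on the first 70% of rows, validate on the remaining 30%."""
    cut = round(len(xs) * train_fraction)
    model = fit_line(xs[:cut], ys[:cut])
    return mse(model, xs[cut:], ys[cut:])


def k_fold_error(xs, ys, k):
    """Train on k-1 pieces, validate on the remaining one, average k rounds."""
    n = len(xs)
    errors = []
    for i in range(k):
        test_idx = set(range(i, n, k))    # every k-th row forms fold i
        train_x = [x for j, x in enumerate(xs) if j not in test_idx]
        train_y = [y for j, y in enumerate(ys) if j not in test_idx]
        test_x = [x for j, x in enumerate(xs) if j in test_idx]
        test_y = [y for j, y in enumerate(ys) if j in test_idx]
        errors.append(mse(fit_line(train_x, train_y), test_x, test_y))
    return sum(errors) / k


# Made-up data: the line 1 + 2*x with a small alternating disturbance
xs = [float(x) for x in range(10)]
ys = [1.0 + 2.0 * x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]

holdout = holdout_error(xs, ys)
kfold = k_fold_error(xs, ys, k=5)
final_model = fit_line(xs, ys)            # retrain with 100% of the data
print(holdout, kfold, final_model)
```

The validation error is used only to compare candidate models; the model that is finally kept is refitted on all rows.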
Boolean values to predict → logistic model
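A small sketch of how a logistic model turns the linear part b0 + b1*x into a yes/no prediction: the logistic (sigmoid) function maps it to a probability between 0 and 1, which is then compared with a threshold. The coefficients below are hypothetical and not fitted to any data, and the 0.5 threshold is an assumption.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(b0, b1, x):
    """Probability of the 'True' outcome under a logistic model."""
    return sigmoid(b0 + b1 * x)


def predict_label(b0, b1, x, threshold=0.5):
    """Boolean prediction: True if the probability reaches the threshold."""
    return predict_probability(b0, b1, x) >= threshold


# Hypothetical coefficients, chosen only for illustration
b0, b1 = -3.0, 1.0
print(predict_probability(b0, b1, 3.0))
print(predict_label(b0, b1, 5.0), predict_label(b0, b1, 1.0))
```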
What's an ANOVA test?
- t test for multiple groups
- compare means of multiple groups
- = special case of linear regression
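To make "compare means of multiple groups" concrete, here is a sketch of the one-way ANOVA F statistic: the variation between group means divided by the variation within groups. The function name and the data are made up. Fitting a linear regression on the dummy-coded group variable yields the same F statistic, which is the sense in which ANOVA is a special case of linear regression.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)                   # total number of observations
    k = len(groups)                       # number of groups
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up data: three groups with means 2, 3 and 6
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(f_stat)
```

A large F means the group means differ more than the spread within the groups would suggest; turning F into a p-value requires the F distribution with (k-1, n-k) degrees of freedom.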
What are the conditions for ANOVA?