1. Connecting multiple independent variables
  2. How to build statistical (regression) models that allow us to predict things?
  3. What is a regression primer?
  4. How to fit a model?
  5. How to assess the model fit?
  6. What can you do with the model?
  7. What can you NOT do with the model?
  8. Linear regression
  9. Logistic regression

Building a regression model

  1. Which model?
    • linear model
    • logistic model
    • categorical variables

Linear model

  1. fit a straight line through the data
    1. y = b0 + b1*score + e
      1. b0 = c (intercept)
      2. b1 = m (slope)
      3. e = error (residual)
    2. e.g. time taken = b0 + b1*IQ + error
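
A minimal sketch of fitting the straight line by ordinary least squares, assuming a single predictor. The function name `fit_line` and the data values are made up for illustration; they are not from the notes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x. Returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data: score (predictor) vs. time taken (response)
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
times = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_line(scores, times)
# e = what the line does not explain for each observation
residuals = [y - (b0 + b1 * x) for x, y in zip(scores, times)]
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

With an intercept in the model, the residuals of a least-squares fit sum to (numerically) zero, which is a quick sanity check on the fit.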

Categorical data

  1. fit a mean to each category
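
A small sketch of "fit a mean to each category": the prediction for an observation is simply the mean of its group. The helper name `fit_category_means` and the data are hypothetical.

```python
from collections import defaultdict


def fit_category_means(categories, values):
    """Return {category: mean of the values observed in that category}."""
    grouped = defaultdict(list)
    for category, value in zip(categories, values):
        grouped[category].append(value)
    return {c: sum(vs) / len(vs) for c, vs in grouped.items()}


# Hypothetical data: group label and time taken
labels = ["a", "a", "b", "b", "b"]
times = [2.0, 4.0, 5.0, 6.0, 7.0]

means = fit_category_means(labels, times)
print(means)  # {'a': 3.0, 'b': 6.0}
```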

Linear regression conditions

How to select features?

  • step() in R
    1. add feature to list
    2. cross-validation → remove if bad
    3. list of features = best hypothesis
  • manual
    • plot features
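
The add / validate / remove loop above can be sketched as a greedy forward selection. Note that R's `step()` compares models by AIC by default; the version below follows the notes and uses a validation error instead. The scoring function `toy_cv_error` is a made-up stand-in (it is not real cross-validation), and the feature names are hypothetical.

```python
def forward_select(candidates, cv_error):
    """Greedy forward selection.

    candidates: feature names to try, in order.
    cv_error:   callable mapping a feature list to a validation error
                (lower is better).
    """
    selected = []
    best = cv_error(selected)
    for feature in candidates:
        trial = selected + [feature]      # 1. add feature to list
        err = cv_error(trial)             # 2. validate
        if err < best:                    #    keep only if it helps
            selected, best = trial, err
    return selected, best                 # 3. surviving features


# Stand-in for a real cross-validated error (assumption for illustration):
# useful features lower the error, every feature costs a small penalty.
GAIN = {"iq": 4.0, "age": 1.0}


def toy_cv_error(features):
    return 10.0 - sum(GAIN.get(f, 0.0) for f in features) + 0.5 * len(features)


features, error = forward_select(["iq", "shoe_size", "age"], toy_cv_error)
print(features, error)  # ['iq', 'age'] 6.0
```

In practice `cv_error` would refit the model on the training folds for each candidate feature set, which is what makes this procedure expensive for many features.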

What's the problem with overfitting?

How to choose the right model?

  • enough data → hold-out cross-validation
    • train on 70% of data
    • validate with other 30%
    • retrain with 100% of data
  • little data → k-fold cross-validation
    • train on k-1 pieces
    • validate with remaining one
    • repeat k times
    • take average of all rounds
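
Both validation schemes above can be sketched with plain index splitting. This is an illustration under assumptions: the "model" is just the mean of the training values, the loss is squared error, and all function names and data are hypothetical.

```python
import random


def holdout_split(n, train_fraction=0.7, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, validation)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]


def kfold_splits(n, k, seed=0):
    """Yield (train, validation) index lists; each index validates once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def kfold_score(ys, k, fit, loss):
    """Train on k-1 pieces, validate on the rest, average over k rounds."""
    round_scores = []
    for train, val in kfold_splits(len(ys), k):
        model = fit([ys[i] for i in train])
        round_scores.append(sum(loss(model, ys[i]) for i in val) / len(val))
    return sum(round_scores) / k


def fit_mean(train_values):
    return sum(train_values) / len(train_values)


def squared_error(model, y):
    return (model - y) ** 2


# Hypothetical response values
times = [3.1, 4.9, 7.2, 8.8, 11.0, 5.5, 6.1, 9.4, 4.2, 7.7]

train_idx, val_idx = holdout_split(len(times))   # 70% / 30%
cv_mse = kfold_score(times, 5, fit_mean, squared_error)
final_model = fit_mean(times)                    # retrain on 100% of the data
```

The validation score is only used to compare candidate models; the model that is finally kept is refit on all available data.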

Boolean values to predict → logistic model
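
A minimal sketch of a logistic model with one predictor: the linear part b0 + b1*x is passed through a sigmoid so the output is a probability between 0 and 1. Fitting by plain gradient descent is an assumption made for brevity (statistical packages typically use other optimizers), and the data and names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient descent on log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y   # gradient of log-loss w.r.t. z
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical data: hours studied vs. passed (1) or failed (0)
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
p_low = sigmoid(b0 + b1 * 0.5)    # predicted probability of passing at 0.5 h
p_high = sigmoid(b0 + b1 * 4.0)   # predicted probability of passing at 4 h
```

To turn the probability into a yes/no prediction, a threshold (commonly 0.5) is applied to the model output.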

What's an ANOVA test?

  • generalizes the t-test to more than two groups
  • compares the means of multiple groups
  • = special case of linear regression (categorical predictor)
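
A sketch of the one-way ANOVA F statistic, computed by hand. It compares the variation between the group means with the variation within the groups. The link to regression: fitting a mean to each category gives exactly these group means as fitted values, so the between-group sum of squares is the part of the variation that model explains. The function name and data are hypothetical.

```python
def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical measurements for three groups
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 6))  # 13.0
```

A large F means the group means differ more than the within-group scatter would suggest; the p-value then comes from the F distribution with (df_between, df_within) degrees of freedom.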

What are the conditions for ANOVA?