🔮 Word Prediction

How?

  • Estimate the 🎲 Probability of each word from corpus counts via MLE (Maximum Likelihood Estimation)
  • Use a [[Language Model#N-Grams 2|📙 Language Model#N-Grams 2]] to obtain the most likely follow-up word (see the sketch below)
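
A minimal sketch of both steps, assuming a tiny whitespace-tokenised toy corpus (the corpus and function names are illustrative, not part of the note):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate the fish".split()

unigram_counts = Counter(corpus)                  # c(w)
bigram_counts = Counter(zip(corpus, corpus[1:]))  # c(w1, w2)

def mle_bigram_prob(prev, word):
    """MLE estimate: P(word | prev) = c(prev, word) / c(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

def predict_next(prev):
    """Most likely follow-up word after `prev` under the bigram model."""
    candidates = {w: mle_bigram_prob(prev, w) for (p, w) in bigram_counts if p == prev}
    return max(candidates, key=candidates.get)

print(predict_next("the"))  # -> 'cat': c(the, cat) = 2 beats 'mat' and 'fish'
```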

Problems

  • Sparse data:
    • Most long word sequences occur rarely or never, even in large corpora, so their counts give unreliable estimates
  • Zeroes:
    • Some words or n-grams never appear in the training data → MLE estimates P(w) = 0, which zeroes out the whole sequence probability
      • Smoothing: assign small non-zero probabilities to unseen events (first sketch below)
      • Back-off: use lower-order n-grams when higher-order ones aren’t available (second sketch below)
  • Underflow:
    • Multiplying many small probabilities can underflow the floating-point range → loss of precision
      • → Do all calculations in log space: add log-probabilities instead of multiplying probabilities (last sketch below)
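
Smoothing in its simplest add-one (Laplace) form — the note doesn’t name a scheme, so this choice is an assumption for illustration:

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate the fish".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)  # vocabulary size

def laplace_prob(prev, word):
    """Add-one smoothing: P(word | prev) = (c(prev, word) + 1) / (c(prev) + V).
    Unseen bigrams get a small non-zero probability instead of 0."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(laplace_prob("the", "sat"))  # unseen bigram -> 1/11 instead of 0
```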
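
Back-off, here in the simple “stupid backoff” variant with a fixed discount factor α (an assumed simplification — unlike Katz back-off it yields a relative score, not a normalised probability):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate the fish".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
N = len(corpus)

def backoff_score(w1, w2, w3, alpha=0.4):
    """Use the trigram if seen, otherwise back off to the bigram,
    then the unigram, discounting by alpha at each step."""
    if trigrams[(w1, w2, w3)] > 0:
        return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]
    if bigrams[(w2, w3)] > 0:
        return alpha * bigrams[(w2, w3)] / unigrams[w2]
    return alpha * alpha * unigrams[w3] / N  # last resort: unigram frequency

print(backoff_score("the", "cat", "sat"))  # seen trigram  -> 0.5
print(backoff_score("on", "the", "cat"))   # unseen trigram -> backed-off 0.2
```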
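
Why log space helps, with hypothetical per-word probabilities of realistic magnitude:

```python
import math

# Assumed per-word probabilities for a 150-word text, each ~0.001
probs = [0.001] * 150

naive = 1.0
for p in probs:
    naive *= p  # 10^-450 is below the smallest positive float -> underflows

log_prob = sum(math.log(p) for p in probs)  # add logs instead of multiplying

print(naive)     # 0.0      -- all information lost
print(log_prob)  # -1036.16 -- still comparable between candidate sentences
```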