🔮 Word Prediction

  • Sparse data:
    • Most long word sequences occur rarely or never in any corpus, so their counts are unreliable
  • Zeroes:
    • Some input words or n-grams never appear in the training set → MLE assigns P(w) = 0
      • → Smoothing: shift a small amount of probability mass to unseen events so P(w) > 0
      • → Back-off: fall back to lower-order n-grams when higher-order counts aren't available
  • Underflow:
    • Multiplying many small probabilities can underflow to 0 → loss of precision
      • → Do all calculations in log space: replace products of probabilities with sums of log probabilities
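
The points above can be sketched together in a small example. This is a minimal illustration, not a full language model: the toy corpus and function names (`p_laplace`, `log_prob`) are hypothetical, and add-one (Laplace) smoothing stands in for the smoothing step.

```python
import math
from collections import Counter

# Toy corpus (hypothetical) just to illustrate the issues above.
corpus = "the cat sat on the mat the cat ate".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)  # vocabulary size

def p_laplace(w2, w1):
    """Add-one (Laplace) smoothed bigram probability P(w2 | w1).

    Unseen bigrams get a small non-zero probability instead of 0.
    """
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

def log_prob(sentence):
    """Score a sentence as a sum of log probabilities rather than a
    product of probabilities, avoiding numeric underflow on long input."""
    words = sentence.split()
    return sum(math.log(p_laplace(w2, w1))
               for w1, w2 in zip(words, words[1:]))

# Smoothing: the unseen bigram ("ate", "mat") still gets P > 0.
print(p_laplace("mat", "ate"))
# Log space: a finite score where a raw product would shrink toward 0.
print(log_prob("the cat sat on the mat"))
```

A raw MLE estimate would give the unseen bigram probability 0 and make the whole sentence score 0 (or log 0 = −∞); smoothing and log-space arithmetic avoid both failure modes.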