🔮 Word Prediction §
How? §
- Determine the 🎲 Probability of each word via Maximum Likelihood Estimation (MLE), i.e. relative frequencies from counts
- Use a [[Language Model#N-Grams 2|📙 Language Model#N-Grams 2]] to obtain the most likely follow-up word (see the sketch below)
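A minimal sketch of the idea, assuming a toy bigram model trained on a tiny hand-made corpus (the corpus, function names, and data are illustrative, not from the source):

```python
from collections import Counter, defaultdict

# Toy corpus; a real model would be trained on a large text collection
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigrams: bigram_counts[prev][word] = how often `word` followed `prev`
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def mle_prob(prev, word):
    """MLE estimate: P(word | prev) = count(prev, word) / count(prev)."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][word] / total if total else 0.0

def predict(prev):
    """Most likely follow-up word for a given context word."""
    if not bigram_counts[prev]:
        return None  # context never seen in training data
    return bigram_counts[prev].most_common(1)[0][0]

print(predict("the"))          # -> 'cat' (most frequent follow-up of 'the')
print(mle_prob("the", "cat"))  # -> 0.5
```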
Problems §
- Sparse data:
- Most long word sequences occur rarely or never, even in a large training corpus
- Zeroes:
- Some input words/n-grams never occurred in the training data → MLE assigns them P(w) = 0
- → Smoothing: assign small non-zero probabilities to unseen events (sketch after this list)
- → Back-off: use lower-order n-grams when higher-order ones aren't available
- Underflow:
- Multiplying many small probabilities can underflow to 0 → loss of precision
- → Do all calculations in log space, where products become sums (see the log-space sketch below)
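A minimal sketch of smoothing and back-off, assuming add-one (Laplace) smoothing and a stupid-backoff-style fallback to unigrams; the corpus, vocabulary, and the 0.4 back-off weight are illustrative assumptions, not a full Katz back-off:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()
vocab = set(corpus)
V = len(vocab)  # vocabulary size, needed for add-one smoothing

unigram_counts = Counter(corpus)
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def laplace_prob(prev, word):
    """Add-one smoothing: every bigram gets a pseudo-count of 1,
    so unseen bigrams receive a small non-zero probability."""
    return (bigram_counts[prev][word] + 1) / (sum(bigram_counts[prev].values()) + V)

def backoff_prob(prev, word):
    """Stupid-backoff-style estimate: use the bigram MLE if the
    bigram was seen, otherwise fall back to the scaled unigram."""
    if bigram_counts[prev][word] > 0:
        return bigram_counts[prev][word] / sum(bigram_counts[prev].values())
    return 0.4 * unigram_counts[word] / len(corpus)  # 0.4: conventional weight

print(laplace_prob("the", "sat"))  # unseen bigram, but probability > 0
print(backoff_prob("the", "sat"))  # falls back to the unigram estimate
```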
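And a minimal sketch of the underflow problem and the log-space fix (the probability values are made up for illustration):

```python
import math

# 1000 word probabilities of ~1e-5 each: the direct product underflows
probs = [1e-5] * 1000
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0 -> underflow, the true value is lost

# In log space the product becomes a sum, which stays representable
log_product = sum(math.log(p) for p in probs)
print(log_product)  # ~ -11512.9; exp() would underflow, but comparing models still works
```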