🔪 Tokenization
= the process of splitting up text into an array of tokens (e.g. words/symbols)
How?
- 🔡 Regular Expressions
- e.g. s/\s+/\n/
  or s/(\w[\w\d]+)\./\1\. /
📖 Example:
- “I like cookies” → “I” “like” “cookies”
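The whitespace rule above can be sketched in Python. This is a minimal illustration, not a full tokenizer: it assumes the substitution is applied to every whitespace run (as with a `g` flag), and the function name `tokenize` is made up for this example.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into tokens on runs of whitespace."""
    # Mirrors s/\s+/\n/ applied globally: one token per line.
    one_per_line = re.sub(r"\s+", "\n", text.strip())
    # Splitting the lines yields the array of tokens.
    return one_per_line.split("\n") if one_per_line else []


tokens = tokenize("I like cookies")
print(tokens)  # ['I', 'like', 'cookies']
```

Punctuation stays attached to its word here (e.g. "cookies." remains one token), which is why a second rule for periods is usually needed.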
Jul 26, 2025, 1 min read