🍪 Tokenization
= the process of splitting up text into an array of tokens (e.g. words/symbols)
How?
- 💡 Regular Expressions
- e.g.
  s/\s+/\n/
  or s/(\w[\w\d]+)\./\1\. /
  (a Python sketch of these follows below)
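A minimal sketch of those two substitutions, under the assumption that we run them in Python with `re` (the note itself only gives sed-style `s///` rules, so the sample text is made up here):

```python
import re

text = "I like cookies.They are tasty."

# s/(\w[\w\d]+)\./\1\. /  -- ensure a space follows a sentence-final period
text = re.sub(r"(\w[\w\d]+)\.", r"\1. ", text)

# s/\s+/\n/  -- one token per line (re.sub replaces every match, like sed's /g)
one_per_line = re.sub(r"\s+", "\n", text.strip())

print(one_per_line.split("\n"))
# ['I', 'like', 'cookies.', 'They', 'are', 'tasty.']
```

Note that with these rules punctuation stays attached to its token ("cookies.").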
Example:
- "I like cookies" → "I" "like" "cookies"
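The example as a runnable check (again a Python sketch; splitting on whitespace is enough for this sentence):

```python
import re

sentence = "I like cookies"

# Split on runs of whitespace, the same idea as the s/\s+/\n/ rule above
tokens = re.split(r"\s+", sentence.strip())

print(tokens)  # ['I', 'like', 'cookies']
```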