🔪 Tokenization

= the process of splitting text into a sequence of tokens (e.g. words or symbols)

How?

📖 Example:

  • “I like cookies” → “I” “like” “cookies”
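
The example above can be sketched with the simplest possible approach, splitting on whitespace (a minimal illustration; real tokenizers also handle punctuation, contractions, and subwords):

```python
def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer: split the text wherever
    # there is one or more whitespace characters.
    return text.split()

print(tokenize("I like cookies"))  # → ['I', 'like', 'cookies']
```

Note that this sketch keeps punctuation attached to words (`"cookies!"` stays one token), which is why practical tokenizers use rules or learned subword vocabularies instead.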