🧮 tf-idf
===
A statistical measure that evaluates how relevant a word is to a document in a collection of documents.

Formula
====
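- $i$: term
- $j$: document
- $\mathrm{tf}_{i,j}$: log-transformed term frequency, e.g. $\mathrm{tf}_{i,j} = \log(1 + \mathrm{count}(i,j))$
- → greater when the term is frequent in the document
- $\mathrm{idf}_i$: inverse document frequency, $\mathrm{idf}_i = \log(N / \mathrm{df}_i)$
- $N$ = collection size, $\mathrm{df}_i$ = number of documents containing word $i$
- → greater when the term is rare in the collection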
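
A minimal Python sketch of the weighting described above, multiplying a log-transformed term frequency by an inverse document frequency. The base-10 logarithm, the `1 + count` form of the term frequency, and the function name `tf_idf` are assumptions made for illustration; these notes do not fix a particular variant.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = log10(1 + count) * log10(N / df).
    """
    n_docs = len(docs)  # collection size
    # df[term]: number of documents containing the term at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # raw term counts within this document
        weights.append({
            term: math.log10(1 + count) * math.log10(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy collection (made up for illustration)
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cats and dogs".split(),
]
w = tf_idf(docs)
```

In this toy collection "the" occurs in every document, so its idf is log(3/3) = 0 and its weight vanishes, while "cat" occurs in only one document and keeps a positive weight.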
To compare 2 documents instead of 2 words: