Natural language engineering 1

⚠️ Important

  • Assessment: in-person exam (22.02.2022)
  • Textbook: Jurafsky & Martin, 3rd edition

Lecture

📑 VL01 NLE Overview

📑 VL02 Basic NLE Pipeline

  1. What is ⚙ Natural Language Processing?
  2. What is 🧐 Linguistic analysis?
  3. Which 🪜 Linguistic analysis levels exist?
  4. How does the 🚪 GATE NLP pipeline work?

📑 VL03

→ public holiday

📑 VL04 RegEx & FSA

  1. What are 🔡 Regular Expressions?
  2. What’s a 📕 Formal Language and how does it relate to a 📖 Formal grammar?
  3. How to classify formal grammars?
  4. What are 🎰 Automata and which classes are there?
  5. How to define 🏁 Finite state automata and which types are there?
  6. 1️⃣ Deterministic finite automata vs. 🔢 Non-deterministic finite automata
  7. How to implement RegEx using an FSA?

📑 VL05 Preprocessing

  1. What is 🧽 Preprocessing (NLP) and why is it important?
  2. 🦾 State machines in detail

📑 VL06 Word Prediction

  1. What’s 🎲 Probability?
  2. How do 🧮 Frequentist Probability and 👨‍🦱 Subjective Probability differ from each other?
  3. What’s the difference between 🧍‍♀️ Prior Probability and 👫 Conditional Probability?
  4. What is a 🪙 Trial and what does it consist of?
  5. What does the ⛓ Chain Rule (Probability) state?
  6. What does the 📜 Bayes Theorem state?
  7. How to do 🔮 Word Prediction?
  8. What’s 🍔 Maximum likelihood estimation?
  9. What is a 📙 Language Model?
  10. What are some problems during word prediction?
  11. What does the 💭 Markov Assumption (Language) state?

📑 VL07 Text classification

  1. What’s a 🏷 Classifier?
  2. How does 🤦‍♂️ Naive Bayes work?
  3. 🎯 Accuracy vs. 🏹 Precision vs. 🛒 Recall
  4. What’s the ⚖ Balanced F measure?

📑 VL08 POS Tagging

  1. Which categories of POS tags are there? → 🧩 Parts of Speech
  2. Which POS-tagging methods are there?
  3. How to do ✍ Hand coded POS-Tagging?
  4. What’s the 🕶 Brill tagger algorithm?
  5. What is a ⏩ Markov model?
  6. How can the 🥷 Hidden Markov model be used for 🏷 POS-Tagging?

📑 VL09 Logistic Regression

  1. What are some 🏷 ML Classifiers?
  2. Which types of 🚶 Logistic Regression are there?
  3. How does 1️⃣ Binary logistic regression work?
  4. What’s the Cross-entropy loss function?
  5. What’s Stochastic gradient descent?
  6. How to calculate the Z-Score?
  7. What’s the Logistic Sigmoid Function?

📑 VL10 Text embeddings 1

  1. What is a 🆖 Lemma and what is ❓ Lexical semantics?
  2. What’s a 〰️ Word embedding?
  3. 🦒 Sparse vector vs. 🐀 Dense vector
  4. What’s 🧮 tf-idf? (WTF)
  5. What are Pointwise Mutual Information (PMI) and Positive PMI (PPMI)?
  6. How to calculate the 📏 Vector length?
  7. How to calculate the ⚫️ Dot-product and 📐 Cosine similarity?

📑 VL11 Word2Vec

  1. How does 📠 word2vec work?

📑 VL12 Formal grammars

  1. What is 🧑 BERT?
  2. 👀 Contextual embedding vs. 🤷‍♂️ Non-Contextual embedding
  3. What’s the 👔 Chomsky normal form?
  4. What’s a 🏗 Formal generative grammar?
  5. What are some 🗣 Natural language phenomena?

📑 VL13 Syntax & semantic analysis

  1. What’s 🌳 Parsing?
  2. What’s 🎲 Probabilistic parsing?
  3. What’s the 1️⃣ Predicate Calculus?
  4. What are the units of a formal grammar?
  5. Define 🟩 Syntax
  6. What’s ➕ Syntax-driven semantic analysis?
  7. What are the time complexities of different 🎰 Automata?

📑 VL14 NLE Applications

  1. 🗣 Natural Language Engineering: 🧩 Applications

📑 VL15 Revision

  1. 🚪 GATE NLP pipeline
  2. 🔡 Regular Expressions, 🎰 Automata
  3. 1️⃣ Type 1 error, 2️⃣ Type 2 error

ℹ️ Course topics

  • Motivation
  • Regular expressions
  • Basic statistical natural language processing
  • Part-of-speech tagging
  • Text classification
  • Lexical semantics (embeddings)
  • Context-free grammars
  • Parsing principles + Complexity
  • Applications: IE, IR, QA

📑 Extra resources