Natural Language Processing
These are notes for the POLIMI Natural Language Processing course. Natural Language Processing spans: Words / Syntax / Meaning / Discourse.
Linguistic Topics
Lexical and Compositional Semantics
Representing Meanings: Linking and Roles
Every language has some subject-verb-complements structure, e.g. "someone wanting + want + something wanted".
To represent meanings, we need a representation language able to express the following:
Meaning of Utterances [the meaning of whole expressions]: Semantic Networks / Logics / Frame-based representations / …
Meaning of Words [the meaning of each word]: Lexical databases …
Semantic Analysis
- Syntax-driven semantic analysis [domain independent]: input: parse trees / feature structures / lexical dependency diagrams. The raw input first undergoes syntactic analysis, which produces the syntactic structures listed above; these are then fed into semantic analysis to obtain the final meaning representations.
- Semantic grammar [domain dependent]: semantic grammars can solve the mismatch problem of syntax-driven semantic analysis, but reusing these grammars is really hard.
- Information extraction: Detect the relationships and events, and then FILL THE TEMPLATE.
Lexical Semantics
A LEXEME is the smallest unit pairing a written form, a spoken form, and a sense. A LEXICON is a collection of lexemes.
WordNet: contains about 150,000 terms (nouns/verbs/adjectives/…). Each term is represented by one or more synsets (e.g. beautiful and gorgeous belong to the same synset), and synsets are linked by predefined relations (e.g. the synset of beautiful is linked to the synset of ugly by an antonymy relation).
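As a concrete illustration, WordNet can be queried through NLTK's interface (a minimal sketch, assuming nltk and its wordnet data are installed via nltk.download('wordnet')):

```python
from nltk.corpus import wordnet as wn

# Every sense of "beautiful" is a synset: a set of synonymous lemmas.
for syn in wn.synsets('beautiful'):
    print(syn.name(), '->', [l.name() for l in syn.lemmas()])

# Synsets/lemmas are linked by predefined relations, e.g. antonymy:
lemma = wn.synsets('beautiful')[0].lemmas()[0]
print(lemma.antonyms())  # e.g. [Lemma('ugly.a.01.ugly')]
```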
Internal Structure of Words:
Thematic roles (roles associated with verb meaning)
Selectional restrictions (constraints a word imposes on its subject/object): e.g. for the verb eat, a table clearly cannot be the object of the eating action.
Word Sense Disambiguation: supervised machine learning / Naive Bayes
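A minimal sketch of the Naive Bayes approach on made-up toy data (the contexts, the two senses, and the ambiguous word bass are all illustrative): the sense is predicted from the bag of context words.

```python
from collections import Counter, defaultdict
import math

train = [  # (context words, sense of the ambiguous word "bass")
    (['caught', 'huge', 'fish'], 'fish'),
    (['fishing', 'lake', 'boat'], 'fish'),
    (['guitar', 'played', 'loud'], 'music'),
    (['band', 'played', 'guitar'], 'music'),
]

sense_counts = Counter(sense for _, sense in train)
word_counts = defaultdict(Counter)
for context, sense in train:
    word_counts[sense].update(context)

def classify(context):
    # argmax over senses of log P(sense) + sum of log P(word | sense),
    # with add-one smoothing over the training vocabulary
    vocab = {w for counter in word_counts.values() for w in counter}
    best, best_score = None, float('-inf')
    for sense, n in sense_counts.items():
        score = math.log(n / len(train))
        total = sum(word_counts[sense].values())
        for w in context:
            score += math.log((word_counts[sense][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = sense, score
    return best

print(classify(['played', 'fish']))  # -> 'music' on this toy data
```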
Word Level Processing
Morphology
English Morphology (the study of how English words are built) [stems + affixes]: two classes, inflectional (regular) and derivational (irregular).
Finite State Transducers: map a lexical-level form to a correctly spelled surface form (cat+N+PL is transformed into cats).
Multi-Tape Machines: handle more complex cases, e.g. the plural of fox takes -es rather than -s; effectively the translation is carried out in several steps.
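For illustration, here is the same lexical-to-surface mapping written as plain Python rather than as a transducer (a sketch of the e-insertion rule only; a real FST encodes such spelling rules declaratively and composes many of them):

```python
def surface_plural(stem: str) -> str:
    # e-insertion: after s, z, x, ch, sh the plural suffix is -es, not -s
    if stem.endswith(('s', 'z', 'x', 'ch', 'sh')):
        return stem + 'es'
    return stem + 's'

for lexical in ['cat', 'fox', 'church']:
    print(f"{lexical}+N+PL -> {surface_plural(lexical)}")
# cat+N+PL -> cats, fox+N+PL -> foxes, church+N+PL -> churches
```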
Error Correction/Prediction/N-grams
Bayes (noisy channel) model for ERROR CORRECTION: corrects a single word; given an observed misspelling x, choose ŵ = argmax_w P(x|w)·P(w) over candidate words w.
N-grams: predict the next word from the preceding words.
Markov Assumption
Shannon's Method: use the trained model to generate random sentences whose form resembles the sentences of the training set.
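A minimal sketch of a bigram model with Shannon-style generation, trained on a made-up toy corpus; the Markov assumption above reduces P(w_n | w_1..w_{n-1}) to P(w_n | w_{n-1}).

```python
import random
from collections import defaultdict, Counter

corpus = [
    ['<s>', 'I', 'want', 'chinese', 'food', '</s>'],
    ['<s>', 'I', 'want', 'english', 'food', '</s>'],
    ['<s>', 'I', 'like', 'chinese', 'food', '</s>'],
]

# Count bigrams: how often each word follows each previous word
bigrams = defaultdict(Counter)
for sent in corpus:
    for prev, cur in zip(sent, sent[1:]):
        bigrams[prev][cur] += 1

def shannon_sentence():
    # Repeatedly sample the next word from P(w | previous word)
    word, out = '<s>', []
    while word != '</s>':
        counts = bigrams[word]
        word = random.choices(list(counts), weights=counts.values())[0]
        if word != '</s>':
            out.append(word)
    return ' '.join(out)

print(shannon_sentence())  # e.g. "I want chinese food"
```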
Evaluation: intrinsic / extrinsic
Syntactic Processing
POS Tagging: the process of assigning a part of speech or lexical class marker to each word in a collection
Closed/open classes: whether new words of that class can be created.
POS:
prepositions(on/over/…)
determiners(a/an/the/…)
pronouns(she/who/I/…)
… …
POS Tagging steps:
- Choosing a standard tagset to work with (a coarse tagset? the Penn TreeBank tagset? …)
- Use the tagset to assign each word one tag.
THE MOST IMPORTANT PROBLEM IN POS TAGGING [words often have more than one POS]: 2 methods - RULE-BASED TAGGING (write each rule by hand) / STOCHASTIC (HMMs/MEMMs)
HMM [a special case of Bayesian inference]: given an observation sequence, find the most probable tag sequence.
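A minimal Viterbi decoding sketch for HMM tagging; the tags, transition probabilities P(tag | previous tag), and emission probabilities P(word | tag) below are made up for illustration.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    # V[i][t] = (best prob of any tag path ending in tag t at word i, that path)
    V = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), [t]) for t in tags}]
    for word in words[1:]:
        col = {}
        for t in tags:
            # choose the best previous tag for reaching tag t at this word
            prob, path = max(
                (V[-1][prev][0] * trans_p[prev][t] * emit_p[t].get(word, 0.0),
                 V[-1][prev][1] + [t])
                for prev in tags)
            col[t] = (prob, path)
        V.append(col)
    return max(V[-1].values())  # (probability, best tag path)

tags = ['NN', 'VB']
start_p = {'NN': 0.7, 'VB': 0.3}
trans_p = {'NN': {'NN': 0.3, 'VB': 0.7}, 'VB': {'NN': 0.6, 'VB': 0.4}}
emit_p = {'NN': {'dogs': 0.6, 'bark': 0.1}, 'VB': {'dogs': 0.1, 'bark': 0.7}}
print(viterbi(['dogs', 'bark'], tags, start_p, trans_p, emit_p))
# -> (~0.206, ['NN', 'VB'])
```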
Formal Grammars
Constituency [internal structure / external behavior]: a Noun Phrase can be apple or the reason he comes to University.
Context Free Grammars[Rules/Terminals/Non-terminals(the constituents in a language)]
Head: the word that determines the properties of the whole constituent.
Dependency Parsing [better performance, and directly captures the syntactic relations between words]
Parsing with Bottom-Up / Top-Down
Parsing means assigning proper trees to input strings [find all admissible trees and select the correct one].
This alone does not perform well: parsing a sentence from its syntactic structure only may yield several admissible parse trees, so semantic information is needed to choose among them; redundant re-parsing of the same substructures is a further problem.
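The classic case of this ambiguity, sketched with NLTK's chart parser (assuming nltk is installed): prepositional-phrase attachment yields two admissible trees for one sentence.

```python
import nltk

grammar = nltk.CFG.fromstring("""
  S  -> NP VP
  NP -> Det N | NP PP
  VP -> V NP | VP PP
  PP -> P NP
  Det -> 'the' | 'a'
  N  -> 'man' | 'dog' | 'telescope'
  V  -> 'saw'
  P  -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = 'the man saw a dog with a telescope'.split()
for tree in parser.parse(sentence):
    print(tree)  # two trees: instrument reading vs. dog-with-telescope reading
```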
Shallow Parsing/Probabilistic Parsing/Dependency Parsing
Shallow Parsing (Finite State Parsing): use a subset of CFG without recursion (rules like NP -> NP VP are not allowed) / L-shape sliding window.
POS tagging output: My/PRP$ dog/NN likes/VBZ his/PRP$ food/NN ./.
Chunking output: [NP My dog] [VP likes] [NP his food]
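The same chunking, sketched with NLTK's RegexpParser (assuming nltk is installed); note the chunk rules form a non-recursive grammar, as required above.

```python
import nltk

tagged = [('My', 'PRP$'), ('dog', 'NN'), ('likes', 'VBZ'),
          ('his', 'PRP$'), ('food', 'NN'), ('.', '.')]

# An NP chunk is an optional possessive pronoun followed by a noun;
# the VP chunk here is just the verb.
chunker = nltk.RegexpParser(r"""
  NP: {<PRP\$>?<NN>}
  VP: {<VBZ>}
""")
print(chunker.parse(tagged))
# (S (NP My/PRP$ dog/NN) (VP likes/VBZ) (NP his/PRP$ food/NN) ./.)
```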
Probabilistic CFG: assign probabilities to parse trees. (Different rules, different probabilities)
Lexicalized PCFG: PCFG augmented with lexical (head-word) information.
Dependency Parsing: instead of constituent units, concentrate on the links from word to word.
VOICE:
vowels / consonants / silence
Prosody: the study of the rhythm and melody of speech (intonation / pauses / rhythm / stress / duration of sounds).
INFORMATION = PROSODY + LINGUISTIC INFORMATION
Perception: human's ability to have what you know interact with what you see and hear. [1-Sensing / 2-Organizing / 3-Identifying & Recognizing]
General Info of Automatic Speech Recognition & Text-To-Speech
Phonetics: study of phones (the units composing words)
Phonology: study of phonemes (an abstraction of a set of phones, the smallest contrastive linguistic unit)
Phonetics studies the sounds humans can produce, while phonology studies what those sounds mean in different languages; hence, for the study of any particular language, phonology is the more important of the two.
Formants: the spectral peaks of the sound frequency spectrum of the voice.
TTS: transforming a text string into a waveform
- Text analysis: text string -> phonetic representation
- Waveform synthesis: phonetic representation -> waveform [a phonetic representation is analogous to the phonetic alphabet for English or pinyin for Chinese: simply a way of writing down the sounds]
Text Analysis: Text Normalization -> Phonetics Analysis -> Prosodic Analysis
ASR: starting from a recorded vocal signal, recognize the corresponding sequence of words.
Given a vocal signal composed of M MFCC vectors (generated by windowing the vocal signal), compute the right sequence of M subphones.
Alternatively we can use an HMM for ASR: the input observations are MFCC vectors, and the output is the subphone sequence.
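A sketch of this MFCC front end using librosa (assuming it is installed; speech.wav is a hypothetical input file):

```python
import librosa

# Load a recorded signal at 16 kHz, then window it into MFCC vectors:
# 400 samples = 25 ms window, 160 samples = 10 ms hop.
signal, sr = librosa.load('speech.wav', sr=16000)
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)
print(mfcc.shape)  # (13, M): one 13-dimensional MFCC vector per frame
```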
Pragmatics [the study of language in concrete use: taking turns in conversations / text organization / presupposition] - Discourse Processing / Discourse Structure
Dialog and conversational agents PART 1
ELIZA: search and substitute using memory (some human-designed rules).
A: Men are all alike
ELIZA: ['all' detected, REPLY IN A CERTAIN WAY] IN WHAT WAY
A: My boyfriend made me come here.
ELIZA: ['my' and 'me' detected, SUBSTITUTE] YOUR BOYFRIEND MADE YOU COME HERE.
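A toy sketch of ELIZA-style pattern matching and pronoun substitution; the two rules below are made up to mirror the exchange above, whereas the real ELIZA used a large hand-designed script.

```python
import re

def eliza(utterance: str) -> str:
    # Rule 1: the keyword 'all' triggers a canned reply
    if re.search(r'\ball\b', utterance, re.I):
        return 'IN WHAT WAY'
    # Rule 2: 'my X made me Y' is echoed back with substituted pronouns
    m = re.match(r'my (.*) made me (.*)', utterance, re.I)
    if m:
        return f'YOUR {m.group(1)} MADE YOU {m.group(2)}'.upper()
    return 'PLEASE GO ON'

print(eliza('Men are all alike'))               # IN WHAT WAY
print(eliza('My boyfriend made me come here'))  # YOUR BOYFRIEND MADE YOU COME HERE
```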
HUMAN CONVERSATION
- Turn-taking: how does the speaker know it is their turn to speak? [A general rule: when the current speaker A selects a next speaker B, B speaks next; if no next speaker is selected, anyone may take the next turn; if nobody takes it, the current speaker A takes the next turn again.]
- Speech acts: [each utterance performs 3 acts] LOCUTIONARY ACT, ILLOCUTIONARY ACT, PERLOCUTIONARY ACT. [A general example: "CAN I HAVE THE REST OF YOUR SANDWICH?" Locutionary: question, Illocutionary: request, Perlocutionary: "you give me the sandwich"; "I WANT THE REST OF YOUR SANDWICH!" Locutionary: declarative, Illocutionary: request, Perlocutionary: "you give me the sandwich".] ILLOCUTIONARY ACTS fall into 5 classes: Assertives (swearing, concluding, suggesting, …) / Directives (asking, ordering, …) / Commissives (promising, betting, …) / Expressives (thanking, welcoming, …) / Declarations.
- Grounding: both speakers should achieve common ground (things mutually believed by both speakers). But how does a speaker confirm that the other party shares this ground? Continued attention, relevant next contribution, acknowledgement (mm-hmm), demonstration, display. [-"I need to travel in May." -"What day in May did you want to travel?" Here display is used to show that the wish to travel in May has been understood.]
- Conversational structure: a dialogue is divided into several parts; e.g. the structure of an everyday conversation might be greetings - small talk - saying goodbye.
- Implicature: [Example: -"OK, which day would you like the flight?" -"I have a meeting on the 19th…"] The 4 GRICEAN MAXIMS, the principles behind such inference in conversation: (1) relevance - while we are discussing flights, my mentioning a meeting on the 19th is assumed to be relevant to the flight date; (2) quantity - if the other party says there are 5 flights then, although logically the statement would also hold if there were more, we assume they give exactly the needed number, no more and no less; (3) quality - what the other party says is true; we assume neither side asserts things for which they lack evidence; (4) manner - we assume both sides are brief and orderly, avoiding excessive repetition.
- *Coherence: (this topic actually belongs to a later part, but it fits well here) in conversation we often use words such as Next, Thanks, OK, Last of all to 'lubricate' the dialogue, and this matters a great deal in NLP too.
Speech Recognition->Natural Language Understanding->Dialogue Manager/Task Manager->Natural Language Generation->Text-to-speech Synthesis
Speech Recognition: ASR-Input waveform, output strings of words
Natural Language Understanding: NLU - for speech dialogue systems, FRAME AND SLOT SEMANTICS are widely used.
Frame and slot semantics
“Show me the morning flights from Boston to LA on Tuesday”
SHOW:
    FLIGHTS:
        ORIGIN:
            CITY: Boston
            DATE: Tuesday
            TIME: Morning
        DESTINATION:
            CITY: LA
Analysis: a semantic grammar is used to parse the sentence into the form above, with rules such as ORIGIN -> from CITY, CITY -> NY | Boston | LA. Such grammar rules are usually not general-purpose but built for one specific task; a sketch follows.
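A sketch of this kind of task-specific slot filling, with the "grammar" written as regular expressions (all patterns below are made up for the flight example):

```python
import re

CITY = r'(Boston|LA|NY)'
patterns = {
    'ORIGIN':      rf'from {CITY}',
    'DESTINATION': rf'to {CITY}',
    'DATE':        r'on (Monday|Tuesday|Wednesday|Thursday|Friday)',
    'TIME':        r'(morning|afternoon|evening)',
}

def fill_frame(utterance):
    # Fill each slot with the first match of its task-specific pattern
    frame = {}
    for slot, pat in patterns.items():
        m = re.search(pat, utterance, re.I)
        if m:
            frame[slot] = m.group(1)
    return frame

print(fill_frame('Show me the morning flights from Boston to LA on Tuesday'))
# {'ORIGIN': 'Boston', 'DESTINATION': 'LA', 'DATE': 'Tuesday', 'TIME': 'morning'}
```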
Dialogue Manager: controls the architecture and structure of dialogue. [Finite State/Frame Based/Information State/AI Planning]
System Initiative Dialogue: system controls the dialogue and asks the user a lot of questions. [Easy but limited]
User Initiative Dialogue: user directs the system, asks a single question and system answers.
Mixed Initiative Dialogue: conversational initiative can shift between user and system. [open prompts ("How may I help you?") vs directive prompts ("Say yes if you accept this call, otherwise please say no."); restrictive (strong constraints on the ASR system, based on the dialogue state) vs non-restrictive grammars]
Natural Language Generation: choose the concepts to express to the user, express those concepts in words, and assign prosody.
TTS: takes words and prosodic annotations, output waveform
Dialog and conversational agents PART 2
We want the system to be better than just form filling.
INFORMATION STATE: context/beliefs/user model/task context/…
DIALOGUE ACTS: conversational moves, incorporates ideas of grounding.
DIALOGUE ACT AMBIGUITY: resolving it is complex; we must decide what kind of act an utterance really performs. E.g. for "Can you give me her WeChat?", on the surface this looks like a yes/no question, but rather than answering yes or no, we know it is really a directive (a request), so the appropriate reply is "Here is her WeChat ID" / "She already has a boyfriend".
CONVERSATIONAL ACT TYPES: turn-taking, grounding, core speech acts, argumentation.
DIALOGUE ACT INTERPRETER:
DIALOGUE ACT GENERATOR: confirmation (explicit vs implicit: -"I want to go to Berlin" -"You said you want to go to Berlin?" (explicit) / -"When do you want to go to Berlin?" (implicit)) / rejection
CORRECTION DETECTION:
SET OF UPDATE RULES: update dialogue state as acts are interpreted, generate dialogue acts.
CONTROL STRUCTURE TO SELECT WHICH UPDATE RULES TO APPLY
For a specific scenario, booking a flight:
Dialogue Acts: THANK / GREET / INTRODUCE / REJECT / CLARIFY / …
Dialogue System Evaluation
MAXIMIZE USER SATISFACTION, which means MAXIMIZE TASK SUCCESS (task completion rate, correctness, …) and MINIMIZE COSTS. The COSTS here mean EFFICIENCY MEASURES (time cost, turns taken, …) and QUALITY MEASURES (number of rejections, time-out prompts, concept accuracy, …). Some kind of questionnaire can help us collect this information.
Dialog and conversational agents PART 3
Modelling a dialogue system as a probabilistic agent
Environment -> "What the world is like" -> "What it will be like if I do action A" -> "How happy will the user be in such a state" -> "Which action I should therefore take"
The Markov assumption comes in here (as known from reinforcement learning).
Anaphora Resolution: resolving pronouns (such as it)
CONSTRAINTS ON COREFERENCE
Number agreement: John has an Acura. It is red.
Person and case agreement: John and Mary have Acuras. We love them (where We = John and Mary)
Gender agreement: John has an Acura. He/it/she is attractive.
Syntactic constraints: John bought himself a new Acura (himself = John) / John bought him a new Acura (him = not John)
SOLUTION
- Collect the potential referents (up to 4 sentences back)
- Remove potential referents that do not agree in number or gender with the pronoun
- Remove potential referents that do not pass syntactic coreference constraints
- Compute total salience value of referent from all factors, including, if applicable, role parallelism (+35) or cataphora (-175).
- Select referent with highest salience value. In case of tie, select closest.
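A simplified sketch of this filter-then-rank procedure: candidates are filtered by agreement, then the highest-salience one wins, with ties broken by closeness. The candidate features and salience values below are made up; a full implementation would compute salience from weighted factors such as role parallelism (+35) and cataphora (-175).

```python
def resolve(pronoun, candidates):
    # Keep only candidates that agree with the pronoun in number and gender
    compatible = [c for c in candidates
                  if c['number'] == pronoun['number']
                  and c['gender'] == pronoun['gender']]
    # Highest salience wins; ties broken by smallest distance (closest referent)
    return max(compatible, key=lambda c: (c['salience'], -c['distance']))

candidates = [
    {'name': 'John',  'number': 'sg', 'gender': 'm', 'salience': 310, 'distance': 2},
    {'name': 'Bill',  'number': 'sg', 'gender': 'm', 'salience': 270, 'distance': 1},
    {'name': 'Acura', 'number': 'sg', 'gender': 'n', 'salience': 280, 'distance': 1},
]
print(resolve({'text': 'he', 'number': 'sg', 'gender': 'm'}, candidates)['name'])
# -> John: Acura fails gender agreement, and John outranks Bill on salience
```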
Coherence - Discourse Structure
John hid Bill's car keys. He was drunk.
John hid Bill's car keys. He likes spaghetti.
What makes a text coherent?
- Appropriate use of coherence relations between subparts of the discourse – rhetorical structure
- Appropriate sequencing of subparts of the discourse – discourse/topic structure
- Appropriate use of referring expressions
Coherence Relations:
- Result: John bought an Acura. His father got quite angry about that.
- Explanation: John hid Bill's car keys. He was drunk.
- Parallel: John bought an Acura. Bill leased a BMW.
- Elaboration: John bought an Acura this weekend. He purchased a beautiful new Integra for 20 thousand dollars at Bill's dealership on Saturday afternoon.
- Occasion: Dorothy picked up the oil-can. She oiled the Tin Woodman’s joints
Techniques
Regular Languages
Context Free Grammars
Finite State Methods
Augmented Grammars
First Order Logic
Probability Models
Supervised Machine Learning Methods
Deep Learning
Most current machine learning already works well, based on human-designed representations and input features. However, it may be better to let the machine learn good features (representations) itself, which leads to deep learning and neural networks.
Single artificial neuron -> a layer of neurons -> NN -> Deep NN (with hidden layers) -> Softmax output layer
Backpropagation
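A minimal numpy sketch of backpropagation through one hidden layer, trained on XOR (the architecture, learning rate, and iteration count are illustrative choices, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(size=(8, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)        # forward pass: hidden layer
    out = sigmoid(h @ W2 + b2)      # forward pass: output layer
    d_out = (out - y) * out * (1 - out)   # error gradient at the output
    d_h = (d_out @ W2.T) * h * (1 - h)    # ... propagated back to the hidden layer
    W2 -= h.T @ d_out                     # gradient descent updates
    b2 -= d_out.sum(0, keepdims=True)
    W1 -= X.T @ d_h
    b1 -= d_h.sum(0, keepdims=True)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]
```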
Word2Vec: models that produce word embeddings (input a V-dimensional one-hot vector, output a Q-dimensional dense vector, with Q << V).
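A sketch of producing such embeddings with gensim's Word2Vec (assuming gensim >= 4 is installed; the toy corpus is made up):

```python
from gensim.models import Word2Vec

sentences = [['king', 'rules', 'kingdom'],
             ['queen', 'rules', 'kingdom'],
             ['dog', 'chases', 'cat']] * 50  # repeat the toy corpus for training

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
print(model.wv['king'].shape)         # (50,): the dense Q-dimensional vector
print(model.wv.most_similar('king'))  # nearest neighbours in embedding space
```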
N-gram language modelling
RNN
LSTM
POS with Bidirectional LSTM
RNN for parsing
TTS & ASR: WaveNet
Applications
Spelling Correction
Information Retrieval: Summarization
- Single Document / Multiple Documents
- Generic Summarization / Query-focused Summarization
- Abstract / Extract [extracts are far easier to produce]
Methods:
- Content Selection: What's important and what's not important? (a minimal sketch follows this list)
- Information Ordering: How to organize the information? In what order should it be presented so that it is coherent?
- Sentence Realization: How to make our sentences readable and natural?
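A minimal sketch of frequency-based content selection for extractive summarization (a simple heuristic, not a method prescribed by the course): sentences are scored by average word frequency, the top k are kept, and information ordering is handled by keeping them in their original order.

```python
from collections import Counter
import re

def summarize(text, k=1):
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))

    def score(sent):
        # average frequency of the sentence's words across the whole document
        toks = re.findall(r'\w+', sent.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    ranked = sorted(sentences, key=score, reverse=True)[:k]
    # information ordering: emit selected sentences in original order
    return ' '.join(s for s in sentences if s in ranked)

doc = ("NLP studies language. NLP systems parse text. "
       "Cats sleep all day.")
print(summarize(doc, k=2))  # the two NLP sentences score highest
```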