Learning What’s in a Name with Graphical Models

An overview of the structure and statistical assumptions behind linear-chain graphical models – effective and interpretable approaches to sequence labeling problems such as named entity recognition.

Which of these words refer to a named entity?


I. Probabilistic Graphical Models

Factorizing Joint Distributions

p(a, b) = p(a) \cdot p(b|a)
p(a, b, c, d) = p(a) \cdot p(b|a) \cdot p(c) \cdot p(d|a, b, c)
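A factorization like the one above can be checked numerically: multiplying the factors back together must yield a valid joint distribution. Here is a minimal sketch for the two-variable case, with made-up probability values.

```python
# Illustrative sketch: factorizing a joint over two binary variables
# into p(a) * p(b|a). All probability values are hypothetical.
p_a = {0: 0.6, 1: 0.4}                  # marginal p(a)
p_b_given_a = {0: {0: 0.9, 1: 0.1},     # conditional p(b | a=0)
               1: {0: 0.3, 1: 0.7}}     # conditional p(b | a=1)

def p_joint(a, b):
    """Joint probability via the chain rule: p(a, b) = p(a) * p(b|a)."""
    return p_a[a] * p_b_given_a[a][b]

# Sanity check: the factors define a proper distribution (sums to 1).
total = sum(p_joint(a, b) for a in (0, 1) for b in (0, 1))
print(total)
```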

Directed Acyclic Graphs

Sampled Population

II. Hidden Markov Models

The Hidden Layer

p(x_i | x_1, x_2, \ldots, x_{i-1}) = p(x_i | x_{i-1}) \quad \footnotesize \textrm{for all $i \in S = \{2, \ldots, N\}$}
p(x_i | x_{i-1}) = p(x_{i+1} | x_i) \quad \footnotesize \textrm{for all $i \in S = \{2, \ldots, N-1\}$}
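Taken together, the two assumptions above (the Markov property and stationary transitions) mean the hidden layer is fully specified by a single transition distribution shared across all positions. A minimal sketch, with hypothetical states and probabilities:

```python
import random

# Sketch of a stationary first-order Markov chain. One transition table
# is reused at every position i, per the stationarity assumption
# p(x_i | x_{i-1}) = p(x_{i+1} | x_i). States and values are made up.
states = ["ENTITY", "OTHER"]
transitions = {                          # each row sums to 1
    "ENTITY": {"ENTITY": 0.5, "OTHER": 0.5},
    "OTHER":  {"ENTITY": 0.2, "OTHER": 0.8},
}

def sample_chain(start, n, rng=random):
    """Sample x_1, ..., x_n using only p(x_i | x_{i-1})."""
    chain = [start]
    for _ in range(n - 1):
        row = transitions[chain[-1]]
        chain.append(rng.choices(list(row), weights=list(row.values()))[0])
    return chain

print(sample_chain("OTHER", 5))
```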

The Observed Layer

p(y_i | x_1, x_2, \ldots, x_N) = p(y_i | x_i) \quad \footnotesize \textrm{for all $i \in S = \{1, 2, \ldots, N\}$}
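Combining the emission assumption above with the Markov and stationarity assumptions on the hidden layer, the full HMM joint factorizes as p(x, y) = p(x_1) ∏ p(x_i | x_{i-1}) ∏ p(y_i | x_i). A sketch of that product, with toy states, words, and probabilities:

```python
# Sketch: the HMM joint probability under the three assumptions above.
# p(x_{1..N}, y_{1..N}) = p(x_1) * prod p(x_i|x_{i-1}) * prod p(y_i|x_i)
# All states, words, and probability values are hypothetical.
initial = {"ENTITY": 0.3, "OTHER": 0.7}
transitions = {"ENTITY": {"ENTITY": 0.5, "OTHER": 0.5},
               "OTHER":  {"ENTITY": 0.2, "OTHER": 0.8}}
emissions = {"ENTITY": {"Paris": 0.6, "visited": 0.4},
             "OTHER":  {"Paris": 0.1, "visited": 0.9}}

def joint(hidden, observed):
    """Joint probability of one hidden sequence and one word sequence."""
    p = initial[hidden[0]] * emissions[hidden[0]][observed[0]]
    for prev, cur, word in zip(hidden, hidden[1:], observed[1:]):
        p *= transitions[prev][cur] * emissions[cur][word]
    return p

print(joint(["OTHER", "ENTITY"], ["visited", "Paris"]))
```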

Representing Named Entities
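One common way to represent named entities as per-token labels is BIO tagging, where B- marks the beginning of an entity span, I- its continuation, and O everything else. This scheme is an assumption for illustration here, not necessarily the exact one used in the demo above; the example below sketches how entity spans are recovered from such labels.

```python
# Hypothetical illustration of BIO tagging for named entities.
tokens = ["Ada", "Lovelace", "was", "born", "in", "London"]
labels = ["B-PER", "I-PER", "O", "O", "O", "B-LOC"]

def spans(tokens, labels):
    """Recover (entity_type, text) spans from per-token BIO labels."""
    out, cur_type, cur_toks = [], None, []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):                    # new span begins
            if cur_toks:
                out.append((cur_type, " ".join(cur_toks)))
            cur_type, cur_toks = lab[2:], [tok]
        elif lab.startswith("I-") and cur_toks:     # span continues
            cur_toks.append(tok)
        else:                                       # outside any span
            if cur_toks:
                out.append((cur_type, " ".join(cur_toks)))
            cur_type, cur_toks = None, []
    if cur_toks:                                    # flush final span
        out.append((cur_type, " ".join(cur_toks)))
    return out

print(spans(tokens, labels))  # [('PER', 'Ada Lovelace'), ('LOC', 'London')]
```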

Training
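When the training corpus is fully labeled, maximum-likelihood training of an HMM reduces to normalized counting of transitions and emissions. A minimal sketch on a made-up two-sentence corpus:

```python
from collections import Counter, defaultdict

# Sketch: MLE training of HMM parameters from a fully observed corpus
# is just counting and normalizing. The corpus below is made up.
corpus = [
    (["Paris", "is", "big"], ["ENTITY", "OTHER", "OTHER"]),
    (["visit", "Paris"],     ["OTHER", "ENTITY"]),
]

trans_counts = defaultdict(Counter)
emit_counts = defaultdict(Counter)
for words, tags in corpus:
    for prev, cur in zip(tags, tags[1:]):
        trans_counts[prev][cur] += 1        # count tag -> tag transitions
    for word, tag in zip(words, tags):
        emit_counts[tag][word] += 1         # count tag -> word emissions

def normalize(counter):
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()}

transitions = {tag: normalize(c) for tag, c in trans_counts.items()}
emissions = {tag: normalize(c) for tag, c in emit_counts.items()}
print(transitions)
```

In practice one would also smooth these estimates so that unseen transitions and words do not receive zero probability.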

Inference
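The standard inference algorithm for an HMM is Viterbi decoding, which finds the single most likely hidden state sequence given the observed words by exploiting the chain factorization. A minimal sketch with the same toy parameters used above (all values hypothetical):

```python
# Sketch of Viterbi decoding for a two-state HMM with toy parameters.
states = ["ENTITY", "OTHER"]
initial = {"ENTITY": 0.3, "OTHER": 0.7}
transitions = {"ENTITY": {"ENTITY": 0.5, "OTHER": 0.5},
               "OTHER":  {"ENTITY": 0.2, "OTHER": 0.8}}
emissions = {"ENTITY": {"Paris": 0.6, "visited": 0.4},
             "OTHER":  {"Paris": 0.1, "visited": 0.9}}

def viterbi(obs):
    """Most likely hidden state sequence for the observed words."""
    # best[s] = (probability of the best path ending in state s, path)
    best = {s: (initial[s] * emissions[s][obs[0]], [s]) for s in states}
    for word in obs[1:]:
        best = {
            s: max(
                ((p * transitions[prev][s] * emissions[s][word], path + [s])
                 for prev, (p, path) in best.items()),
                key=lambda t: t[0],
            )
            for s in states
        }
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(["visited", "Paris"]))  # → ['OTHER', 'ENTITY']
```

Because the max over paths decomposes position by position, this runs in time linear in sentence length rather than exponential.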