Linear-Chain Probabilistic Graphical Models

Probabilistic graphical models like Conditional Random Fields are powerful frameworks for solving sequence modeling problems.

Natural language is sequential but highly contextual, consisting of multiple interdependent parts. Consider the following sentences:

“I can never sing bass.”

"A light bass provided balance to the song."

"A large, beautiful bass flashed by the boat."

"Bass" means something different in each sentence and sounds different in the third sentence compared to the first two. A human reader would find the meanings to be rather intuitive given the context available in each sentence. For example, the fact that the "bass" flashed by a boat tells us that it can move and is likely underwater. Given the limited set of definitions for "bass", we quickly infer that it must be referring to a type of fish.

This chain of reasoning happens quickly and comes as second nature to human readers. But can we teach a machine to reason in a similar way? Can a language model disambiguate the role and meaning of words that are highly dependent on those around them?

Without resorting to complex neural networks, compact probabilistic models can provide reliable and interpretable predictions for many sequence modeling problems, such as part-of-speech tagging and named entity recognition.

In this article, we’ll explore three such models, each building on top of the one before: Hidden Markov Models (HMMs), Maximum-Entropy Markov Models (MEMMs), and Conditional Random Fields (CRFs). All three are probabilistic graphical models, which we’ll cover in the next section.

I. Probabilistic Graphical Models

Probabilistic models can easily balloon in complexity as we add more and more variables to the system. This makes complex models with more than a few variables difficult to interpret, if not inscrutable.

Let’s start with a simple example with just two random variables, $A$ and $B$. Assume that $B$ is conditionally dependent on $A$, so we can write its distribution as $p(b|a)$. By the definition of conditional probability, we have:

$$p(b|a) = \frac{p(a, b)}{p(a)}$$

Now, we want to model the joint distribution of $A$ and $B$, $p(a, b)$. This appears in the equation above, so we just need to rearrange the terms to get:

$$p(a, b) = p(a) \cdot p(b|a)$$

If we know the values of $p(a)$ and $p(b|a)$, we can now derive $p(a, b)$. This is a simple enough example.
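To make this concrete, here is a minimal sketch in Python with made-up probability tables for two binary variables; the numbers are illustrative assumptions, not values from the text:

```python
# A minimal sketch (illustrative numbers only): deriving the joint p(a, b)
# from a marginal p(a) and a conditional p(b | a) for two binary variables.
p_a = {0: 0.6, 1: 0.4}                # p(a) -- assumed values
p_b_given_a = {
    0: {0: 0.9, 1: 0.1},              # p(b | a = 0)
    1: {0: 0.2, 1: 0.8},              # p(b | a = 1)
}

# p(a, b) = p(a) * p(b | a)
p_joint = {(a, b): p_a[a] * p_b_given_a[a][b]
           for a in p_a for b in (0, 1)}

print(p_joint[(1, 1)])                           # p(a=1) * p(b=1|a=1) = 0.4 * 0.8
assert abs(sum(p_joint.values()) - 1.0) < 1e-9   # a valid joint sums to 1
```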

Add more variables, however, and the resulting factorization can get messy, fast. Assume we have two more variables: $C$ and $D$. $C$ is conditionally dependent on $A$ and $B$, and $D$ on $A$ and $C$. The factorization becomes:

$$p(a, b, c, d) = p(a) \cdot p(b|a) \cdot p(c|a, b) \cdot p(d|a, c)$$
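As a quick illustration of how this four-variable factorization would be evaluated, here is a hedged sketch with invented conditional probability tables (every value below is an assumption, chosen only so that each conditional sums to one):

```python
# A hedged sketch: four binary variables A, B, C, D, where C depends on
# A and B, and D depends on A and C. Every table below is invented purely
# for illustration; each conditional sums to 1 over its left-hand variable.
def p_a(a):          return 0.5
def p_b_a(b, a):     return 0.7 if b == a else 0.3
def p_c_ab(c, a, b): return 0.8 if c == (a & b) else 0.2
def p_d_ac(d, a, c): return 0.6 if d == (a | c) else 0.4

def joint(a, b, c, d):
    # p(a, b, c, d) = p(a) * p(b|a) * p(c|a, b) * p(d|a, c)
    return p_a(a) * p_b_a(b, a) * p_c_ab(c, a, b) * p_d_ac(d, a, c)

# The factorization still defines a valid joint distribution (it sums to 1),
# but the higher-order dependencies are hard to read off the formula alone.
total = sum(joint(a, b, c, d)
            for a in (0, 1) for b in (0, 1)
            for c in (0, 1) for d in (0, 1))
print(total)  # 1.0 (up to floating-point error)
```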

The relationship between variables is noticeably more opaque, hidden behind second- and third-order dependencies. This will only get worse as we add more variables. What we need is a compact way to represent this large network of conditional dependencies.

Graphs offer a natural solution to this problem. Complex, multivariate distributions can be factorized into compact graphical representations that are easier to understand and interpret.

Let's return to the first example with two variables. The joint probability distribution is, again:

$$p(a, b) = p(a) \cdot p(b|a)$$

In a graph, each term on the right-hand side can be represented by a single node. An arrow indicates conditional dependence. To express the conditional dependence in $p(b|a)$, we add an arrow pointing from $A$ to $B$:

We can construct a generative story of how $A$ and $B$ are sampled from the graphical representation: first draw $a \sim p(a)$, then draw $b \sim p(b|a)$ using the value of $a$ we just drew.

[Interactive demo: click “Draw Samples” to generate new random samples of $A$ and $B$.]