# mahout-commits mailing list archives

##### Site index · List index
Message view
Top
From conflue...@apache.org
Subject [CONF] Apache Lucene Mahout > Hidden Markov Models
Date Sat, 12 Jun 2010 15:59:00 GMT
```Space: Apache Lucene Mahout (https://cwiki.apache.org/confluence/display/MAHOUT)
Page: Hidden Markov Models (https://cwiki.apache.org/confluence/display/MAHOUT/Hidden+Markov+Models)

Edited by Max Heimel:
---------------------------------------------------------------------
h3. Introduction and Usage

Hidden Markov Models are used in multiple areas of Machine Learning, such as speech recognition,
handwritten letter recognition or natural language processing.

h3. Formal Definition

A Hidden Markov Model (HMM) is a statistical model of a process consisting of two (in our
case discrete) random variables O and Y, which change their state sequentially. The variable
Y with states \{y_1, ... , y_n\} is called the "hidden variable", since its state is not directly
observable. The state of Y changes sequentially with a so called - in our case first-order
- Markov Property. This means, that the state change probability of Y only depends on its
current state and does not change in time. Formally we write: P(Y(t+1)=y_i|Y(0)...Y(t)) =
P(Y(t+1)=y_i|Y(t)) = P(Y(2)=y_i|Y(1)). The variable O with states \{o_1, ... , o_m\} is called
the "observable variable", since its state can be directly observed. O does not have a Markov
Property, but its state probability depends statically on the current state of Y.

Formally, an HMM is defined as a tuple M=(n,m,P,A,B), where n is the number of hidden states,
m is the number of observable states, P is an n-dimensional vector containing initial hidden
state probabilities, A is the nxn-dimensional "transition matrix" containing the transition
probabilities such that A\[i,j\]=P(Y(t)=y_i|Y(t-1)=y_j) and B is the mxn-dimensional "emission
matrix" containing the observation probabilities such that B\[i,j\]= P(O=o_i|Y=y_j).

h3. Problems

Rabiner [1] defined three main problems for HMM models:

1. Evaluation: Given a sequence O of observations and a model M, what is the probability
P(O|M) that sequence O was generated by model M. The Evaluation problem can be efficiently
solved using the Forward algorithm
2. Decoding: Given a sequence O of observations and a model M, what is the most likely
sequence Y*=argmax(Y) P(O|M,Y) of hidden variables to generate this sequence. The Decoding
problem can be efficiently sovled using the Viterbi algorithm.
3. Learning: Given a sequence O of observations, what is the most likely model M*=argmax(M)P(O|M)
to generate this sequence. The Learning problem can be efficiently solved using the Baum-Welch
algorithm.

h3. Resources

[1] Lawrence R. Rabiner (February 1989). "A tutorial on Hidden Markov Models and selected
applications in speech recognition". Proceedings of the IEEE 77 (2): 257-286. doi:10.1109/5.18626.