Space: Apache Lucene Mahout (https://cwiki.apache.org/confluence/display/MAHOUT)
Page: Hidden Markov Models (https://cwiki.apache.org/confluence/display/MAHOUT/Hidden+Markov+Models)
Edited by Max Heimel:

h3. Introduction and Usage
Hidden Markov Models are used in multiple areas of Machine Learning, such as speech recognition,
handwritten letter recognition or natural language processing.
h3. Formal Definition
A Hidden Markov Model (HMM) is a statistical model of a process consisting of two (in our
case discrete) random variables O and Y, which change their state sequentially. The variable
Y with states \{y_1, ... , y_n\} is called the "hidden variable", since its state is not directly
observable. The state of Y changes sequentially with a so called  in our case firstorder
 Markov Property. This means, that the state change probability of Y only depends on its
current state and does not change in time. Formally we write: P(Y(t+1)=y_iY(0)...Y(t)) =
P(Y(t+1)=y_iY(t)) = P(Y(2)=y_iY(1)). The variable O with states \{o_1, ... , o_m\} is called
the "observable variable", since its state can be directly observed. O does not have a Markov
Property, but its state probability depends statically on the current state of Y.
Formally, an HMM is defined as a tuple M=(n,m,P,A,B), where n is the number of hidden states,
m is the number of observable states, P is an ndimensional vector containing initial hidden
state probabilities, A is the nxndimensional "transition matrix" containing the transition
probabilities such that A\[i,j\]=P(Y(t)=y_iY(t1)=y_j) and B is the mxndimensional "emission
matrix" containing the observation probabilities such that B\[i,j\]= P(O=o_iY=y_j).
h3. Problems
Rabiner [1] defined three main problems for HMM models:
1. Evaluation: Given a sequence O of observations and a model M, what is the probability
P(OM) that sequence O was generated by model M. The Evaluation problem can be efficiently
solved using the Forward algorithm
2. Decoding: Given a sequence O of observations and a model M, what is the most likely
sequence Y*=argmax(Y) P(OM,Y) of hidden variables to generate this sequence. The Decoding
problem can be efficiently sovled using the Viterbi algorithm.
3. Learning: Given a sequence O of observations, what is the most likely model M*=argmax(M)P(OM)
to generate this sequence. The Learning problem can be efficiently solved using the BaumWelch
algorithm.
h3. Resources
[1] Lawrence R. Rabiner (February 1989). "A tutorial on Hidden Markov Models and selected
applications in speech recognition". Proceedings of the IEEE 77 (2): 257286. doi:10.1109/5.18626.
Change your notification preferences: https://cwiki.apache.org/confluence/users/viewnotifications.action
