Natural Language Processing for Information Retrieval

Information Retrieval and Access

                                                                   


Hidden Markov Models

    Markov models were introduced by A. Markov in 1913 to model sequences of letters in Russian text, and today they are used as a general-purpose statistical tool. Tagging is treated as a parameterizable doubly random process (the parameters can be estimated precisely during training) in which the language model is represented by a probabilistic finite automaton.

A typical Markov chain
The communication model is represented by the probability of emitting a word in a given state (the probability of the word depends only on the tag). In general, the system is modeled as a finite set of states: after each time interval, the system changes state according to the probabilities associated with the transitions between states.
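
As a minimal sketch of the state-change process described above, the following Python snippet simulates a small Markov chain. The state names ("A", "B") and the transition probabilities are invented for illustration and do not come from the original text.

```python
import random

# Hypothetical states and transition probabilities (illustrative values only).
TRANSITIONS = {
    "A": {"A": 0.6, "B": 0.4},
    "B": {"A": 0.3, "B": 0.7},
}

def simulate_chain(start, steps, rng):
    """Return the list of states visited, starting from `start`."""
    path = [start]
    for _ in range(steps):
        current = path[-1]
        next_states = list(TRANSITIONS[current])
        weights = [TRANSITIONS[current][s] for s in next_states]
        # At each time step the system moves to a new state according to
        # the transition probabilities of the current state.
        path.append(rng.choices(next_states, weights=weights, k=1)[0])
    return path

path = simulate_chain("A", 10, random.Random(0))
print(path)
```

Passing a seeded random.Random instance makes the run repeatable, which is convenient when comparing different transition tables.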

There are two types of models:
 Visible models:
- Each state is associated with a single observable process.
- The output of a state is not random.
Hidden models:
- In each state there are several types of observations, each with a different probability.
- Doubly random model:
a) transitions between states
b) associated observations.
- One of the two processes is not directly observable.
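
The doubly random nature of a hidden model can be sketched as follows: one random draw chooses the next state, a second random draw chooses the observation emitted by the current state, and only the observations are shown to an observer. The state names, symbols, and probabilities below are assumptions made up for this illustration.

```python
import random

# Hypothetical two-state hidden model (all numbers are illustrative).
TRANSITION = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
EMISSION = {"S1": {"x": 0.9, "y": 0.1}, "S2": {"x": 0.2, "y": 0.8}}

def draw(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]

def generate(start, length, rng):
    """Generate `length` hidden states and the observations they emit."""
    states, observations = [], []
    state = start
    for _ in range(length):
        states.append(state)
        observations.append(draw(EMISSION[state], rng))  # random process b)
        state = draw(TRANSITION[state], rng)             # random process a)
    return states, observations

hidden, visible = generate("S1", 8, random.Random(1))
print(visible)  # an observer sees only this sequence, never `hidden`
```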

Example:

We have a set of urns containing balls of different colors. We do not know how many balls of each color each urn contains.
Urn 1:
P (color 1) = b11
...
P (color M) = b1M
...
Urn N:
P (color 1) = bN1
...
P (color M) = bNM

Example of Markov models for grammatical disambiguation
Urns = States
Color = Observation

We want to find the most probable sequence of urns given a sequence of colors.
 To model tagging in NLP:
  • States = Tags (Urns)
  • Observations = Words (Colors)
  • Sequence of observations = Sentences of the text
  • Time steps = Positions within the sentence
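
The source does not name a method for this search. One standard choice is the Viterbi algorithm, sketched below under assumed values: the two urns, the two colors, and every probability are hypothetical and serve only to make the example runnable.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most probable state sequence."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)
    # Pick the best final state, then follow the stored back-pointers.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return prob, path

# Hypothetical model: two urns, two colors (illustrative numbers only).
urns = ["urn1", "urn2"]
start_p = {"urn1": 0.5, "urn2": 0.5}
trans_p = {"urn1": {"urn1": 0.8, "urn2": 0.2},
           "urn2": {"urn1": 0.2, "urn2": 0.8}}
emit_p = {"urn1": {"red": 0.9, "blue": 0.1},
          "urn2": {"red": 0.1, "blue": 0.9}}

prob, best_path = viterbi(["red", "red", "blue", "blue"],
                          urns, start_p, trans_p, emit_p)
print(best_path)  # ['urn1', 'urn1', 'urn2', 'urn2']
```

For long sequences the products become very small, so a practical version would usually add log-probabilities instead of multiplying.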

The same word (color) can occur under different tags (urns), which gives rise to ambiguity. The same color (word) can appear more than once in each urn (tag), which gives rise to different word emission probabilities for each tag.
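
To illustrate how emission probabilities and ambiguity arise together, the sketch below estimates P(word | tag) by relative frequency from a tiny hand-made tagged corpus. The corpus, the tag names, and the counting scheme are assumptions for this example. The text does not specify how its parameters are estimated.

```python
from collections import Counter, defaultdict

# Tiny hand-made tagged corpus (hypothetical data, for illustration only).
TAGGED = [
    ("the", "DET"), ("book", "NOUN"), ("is", "VERB"), ("here", "ADV"),
    ("they", "PRON"), ("book", "VERB"), ("the", "DET"), ("book", "NOUN"),
]

def emission_probabilities(tagged_words):
    """Estimate P(word | tag) as count(tag, word) / count(tag)."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[tag][word] += 1
    return {
        tag: {word: n / sum(words.values()) for word, n in words.items()}
        for tag, words in counts.items()
    }

emissions = emission_probabilities(TAGGED)
# "book" is emitted by two different tags, so the word is ambiguous.
tags_for_book = sorted(tag for tag, words in emissions.items() if "book" in words)
print(tags_for_book)  # ['NOUN', 'VERB']
print(emissions["NOUN"]["book"], emissions["VERB"]["book"])  # 1.0 0.5
```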


Last updated: April 5, 2007
