Natural Language Processing for Information Retrieval

Information Retrieval and Access

                                                                   


Natural language processors

As mentioned previously, natural language (NL) is the means we use every day to communicate with other people.

This type of language lets us designate the things around us and reason about them. It was developed and organized from human experience, and it can be used to analyze highly complex situations and to reason very subtly. The richness of its semantic component gives natural language its great expressive power and its value as a tool for subtle reasoning. On the other hand, the syntax of NL can be modeled relatively easily by a formal language, similar to those used in mathematics and logic. Another property of natural languages is polysemy, that is, the possibility that a word has different meanings in different sentences.

In summary, natural languages are characterized by the following properties:

  1. They developed by progressive enrichment, before any attempt to formulate a theory.
  2. Their expressive character is very important, owing largely to the richness of the semantic component (polysemy).
  3. A complete formalization is difficult or impossible.

The applications of natural language processing (NLP) are highly varied because its scope is very broad. Some of them are:


  • Automatic translation: this refers above all to translating correctly from one language to another, taking into account what each sentence is meant to express rather than translating word by word. One approximation to this kind of translator is Babylon.


  • Information retrieval: a clear example of this application would be the following. A person comes to the computer and tells it (in NL) what they are looking for; the computer searches and tells them what it has on the subject.


  • Information extraction and summarization: new programs must be able to create a summary of a document based on the data provided, performing a detailed analysis of the content rather than merely truncating the first lines of each paragraph.

  • Cooperative problem solving: the computer must be able to cooperate with humans in solving complex problems, providing data and information and also requesting information from the user. This requires excellent interactivity between the user and the computer.

  • Intelligent tutors: here NLP is applied to computer-based teaching, where the system is able to evaluate the learner and to adapt to each type of student.

  • Voice recognition: this is the NLP application that has had the most success to date, since today's computers already include this capability. Voice recognition has two possible uses: identifying the user, or processing what the user dictates. Commercial programs accessible to most users already exist, for example ViaVoice.
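The information retrieval scenario in the list above (the user states what they are looking for and the computer returns what it has on the subject) can be sketched with a very crude baseline. The sketch below only counts shared words between the query and each document; it performs no real linguistic analysis, and the function name and the miniature collection are hypothetical.

```python
def retrieve(query, documents):
    """Rank documents by how many distinct query words they contain."""
    query_words = set(query.lower().split())
    scored = []
    for title, text in documents.items():
        overlap = query_words & set(text.lower().split())
        if overlap:
            scored.append((len(overlap), title))
    # Highest overlap first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]

# Hypothetical miniature collection.
DOCS = {
    "hmm": "hidden markov models for speech recognition",
    "mt": "automatic translation between natural languages",
    "asr": "speech recognition of isolated words",
}
print(retrieve("speech recognition with markov models", DOCS))
```

A real system would replace the word-overlap score with the morphological, syntactic, and semantic analysis described in the rest of this text.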

To continue our study of natural languages, we need to know the levels of language, which will be used to explain the next topic, the architecture of an NLP system. The language levels we will present are the following: phonological, morphological, syntactic, semantic, and pragmatic.

  • Phonological level: deals with how words are related to the sounds that represent them.
  • Morphological level: deals with how words are built from smaller units of meaning called morphemes.
  • Syntactic level: deals with how words can be joined to form sentences, fixing the structural role each word plays in the sentence and which phrases are part of other phrases.
  • Semantic level: deals with the meaning of words and with how those meanings combine to give meaning to a sentence. It also refers to context-independent meaning, that is, the meaning of the sentence in isolation.
  • Pragmatic level: deals with how sentences are used in different situations and how that use affects their meaning. A recursive sublevel is usually recognized, the discourse level, which deals with how the meaning of a sentence is affected by the immediately preceding sentences.

Architecture of an NLP System

The architecture of an NLP system can be explained simply:

  1. The user tells the computer what he or she wants to do.
  2. The computer analyzes the sentences provided morphologically and syntactically, that is, it checks whether the phrases contain words made up of morphemes and whether the structure of the sentences is correct.
  3. The next step is to analyze the sentences semantically, that is, to determine the meaning of each sentence and to assign that meaning to logical expressions.
  4. Once the previous step is done, the pragmatic analysis of the instruction can be performed: having analyzed the individual sentences, the system now analyzes them together, taking into account the situation of each sentence and the sentences that precede it. After this step the computer knows what it is going to do, that is, it has the final expression.
  5. Once the final expression is obtained, the next step is to execute it, obtain the result, and provide it to the user.
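The five steps above can be sketched as a small pipeline. This is only an illustration under strong assumptions: the toy lexicon, the two-word command structure (a verb followed by a noun or pronoun), and every function name below are hypothetical and do not come from any particular NLP system.

```python
# Toy sketch of the five-step architecture; all names and data are hypothetical.
LEXICON = {"show": "VERB", "list": "VERB", "files": "NOUN", "users": "NOUN", "them": "PRON"}

def morphosyntactic_analysis(sentence):
    """Step 2: tag each word and check a trivial VERB + (NOUN | PRON) structure."""
    words = sentence.lower().strip(".!?").split()
    tags = [LEXICON.get(w) for w in words]
    if None in tags:
        raise ValueError(f"unknown word in: {sentence!r}")
    if len(tags) != 2 or tags[0] != "VERB" or tags[1] not in ("NOUN", "PRON"):
        raise ValueError(f"unsupported structure in: {sentence!r}")
    return list(zip(words, tags))

def semantic_analysis(tagged):
    """Step 3: map the sentence to a context-independent logical expression."""
    (verb, _), (obj, obj_tag) = tagged
    return {"action": verb, "object": None if obj_tag == "PRON" else obj}

def pragmatic_analysis(expressions):
    """Step 4: resolve pronouns using the immediately preceding sentences."""
    resolved, last_object = [], None
    for expr in expressions:
        obj = expr["object"] if expr["object"] is not None else last_object
        if obj is None:
            raise ValueError("pronoun with no antecedent")
        resolved.append({"action": expr["action"], "object": obj})
        last_object = obj
    return resolved

def execute(expression):
    """Step 5: run the final expression (here it only builds a string)."""
    return f"{expression['action']}({expression['object']})"

def run(sentences):
    """Step 1 is the user's input; steps 2 to 5 are chained here."""
    logical = [semantic_analysis(morphosyntactic_analysis(s)) for s in sentences]
    return [execute(e) for e in pragmatic_analysis(logical)]

print(run(["Show files.", "List them."]))
```

Note how "them" has no meaning in isolation (step 3 leaves its object empty) and is only resolved in step 4, once the previous sentence is taken into account.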

    One of the great problems of NLP arises when an expression in natural language has more than one interpretation, that is, when two or more different expressions in the target language can be assigned to it. This problem of ambiguity appears at every level of language, without exception. Example:

“Juan saw Maria, with the telescope”

“Juan saw Maria with the telescope”


    At first sight the problem seems very simple, but in fact it is one of the most complicated and one that has most hindered the full development of NLP: since it appears at every level of language, programs (in a formal language) must be developed to resolve it in each case.
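    The readings of the telescope example differ in where the phrase “with the telescope” attaches. The sketch below enumerates both attachments as nested tuples; the bracket labels and the function name are illustrative assumptions, not the output of a real parser.

```python
def pp_attachments(subject, verb, obj, pp):
    """Return the two structural readings of 'subject verb obj pp'."""
    # Reading 1: the prepositional phrase modifies the verb
    # (the telescope is the instrument used for seeing).
    verb_attachment = ("S", subject, ("VP", verb, obj, pp))
    # Reading 2: the prepositional phrase modifies the object
    # (Maria is the one who has the telescope).
    noun_attachment = ("S", subject, ("VP", verb, ("NP", obj, pp)))
    return [verb_attachment, noun_attachment]

for reading in pp_attachments("Juan", "saw", "Maria", "with the telescope"):
    print(reading)
```

A disambiguation program has to choose between these structures using information that the words alone do not provide.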

    Among the inductive techniques applied to these disambiguation tasks are example-based learning, learning based on transformation rules, grammatical inference, and statistical approaches based on maximum entropy models or hidden Markov models (HMMs).

    The latter have been widely used in automatic speech recognition, both for acoustic modeling and for building language models for recognition, whether of isolated words or of continuous speech. The success of these systems and the availability of resources have allowed their extension to NLP systems. To carry out other disambiguation tasks in NLP using Markov models, each of them must be approached as a tagging problem.
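    Treating disambiguation as a tagging problem means finding the most probable sequence of hidden states (tags) for the observed words. A minimal sketch of Viterbi decoding for a hidden Markov model follows; the two-tag model and all probabilities are invented for illustration and are not estimated from any corpus.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under an HMM."""
    # best[i][t] = (probability of the best path ending in tag t at word i, previous tag)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prob, prev = max(
                (best[i - 1][p][0] * trans_p[p][t] * emit_p[t].get(words[i], 0.0), p)
                for p in tags
            )
            column[t] = (prob, prev)
        best.append(column)
    # Trace back from the most probable final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))

# Hypothetical toy model in which "saw" can be a noun or a verb.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.8, "VERB": 0.2}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
EMIT = {
    "NOUN": {"juan": 0.4, "maria": 0.4, "saw": 0.2},
    "VERB": {"saw": 1.0},
}

print(viterbi(["juan", "saw", "maria"], TAGS, START, TRANS, EMIT))
```

With these made-up numbers the decoder tags "saw" as a verb, because the NOUN to VERB to NOUN path has a higher probability than any path that treats "saw" as a noun.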

    In addition to part-of-speech (morphosyntactic) tagging, other problems such as shallow parsing or word sense disambiguation can also be reduced to a tagging problem. For example, in shallow parsing, or chunking, the analysis of a sentence can be represented by tags that indicate which phrase each word belongs to.

    In this case, the sequence of observations can be part-of-speech tags, and the states of the model represent phrase (chunk) tags. For a more complex analysis, such as clause detection, structured tags can be used that mark the nesting level of the word within the analysis. Semantic disambiguation can be viewed as assigning the most probable sequence of semantic tags (or senses) to the words of a sentence.
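    One common way to encode which phrase each word belongs to as one tag per word is the IOB scheme (B- begins a chunk, I- continues it, O marks a word outside any chunk). The text does not name a specific encoding, so IOB is an assumption here, as are the function name, the example sentence, and its chunk labels.

```python
def chunks_to_iob(chunks):
    """Flatten (label, words) chunks into one IOB tag per word.

    A label of None marks words that are outside any chunk.
    """
    tagged = []
    for label, words in chunks:
        for position, word in enumerate(words):
            if label is None:
                tag = "O"
            elif position == 0:
                tag = "B-" + label
            else:
                tag = "I-" + label
            tagged.append((word, tag))
    return tagged

# Hypothetical shallow analysis of the telescope sentence.
analysis = [
    ("NP", ["Juan"]),
    ("VP", ["saw"]),
    ("NP", ["Maria"]),
    ("PP", ["with"]),
    ("NP", ["the", "telescope"]),
]
for word, tag in chunks_to_iob(analysis):
    print(word, tag)
```

Once the analysis is flattened this way, chunking becomes the same kind of sequence-tagging task as part-of-speech tagging, so the same Markov model machinery applies.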


Last updated: 5 April 2007
