Presentation

TEXTERM 2: Fundamentals, strategies and tools for automatic processing and extraction of specialised information

Extracting, retrieving and managing information from textual corpora requires a wide variety of natural language processing tools. In recent years, different research groups have carried out theoretical and applied research, which has produced a number of available resources built to meet these requirements. However, a further step is needed to provide better access to the scientific and technical information contained in textual corpora. Current linguistic research makes it possible to design and develop a series of tools capable of fine-grained, intelligent processing of scientific and technical information at a higher level.

This project aims to continue the research that the IULATERM group has carried out since 1994 in the field of natural language processing of scientific and technical texts. To achieve its current results, the group has received financial support from the Plan Nacional, the autonomous Government Plan and the European Commission.

The research goals of this project can be divided into two groups: A) theoretical and applied research and B) technology-oriented research. More precisely, in the theoretical field of information extraction, the main goals are a) to analyse the different types of Specialized Knowledge Units and b) to analyse further the relationships between those units, mapped from the same texts from which the units are extracted. In the technological field, the main goals are: a) the enrichment of processing dictionaries with non-inflectional morphological information, as well as syntactic information (argument structure) and semantic information (semantic features for arguments); b) the development of a chunker devoted to the internal structure of the lexicon (morphological chunker); c) the development of a second-level syntactic chunker; and d) the improvement of an automatic system for representing knowledge from texts (conceptual mapping).