RICOTERM 2

Combining different techniques has proven to be a very efficient approach in all information retrieval (IR)-oriented technologies. Among the techniques used, alongside statistical and machine learning methods, linguistically based techniques stand out. Natural language processing (NLP), as used in lemmatization, morphological tagging, syntactic analysis, and disambiguation, can give good results in IR systems that work on previously delimited data sets (texts, document databases, corpora, and knowledge banks), for tasks such as the generation of abstracts, the automatic enrichment of computational dictionaries, and the automatic extraction of terminology. However, extensive linguistic processing seems to be unmanageable in IR systems that operate on unrestricted data such as the Web, and it is therefore necessary to develop other linguistic strategies (conceptual ontologies, thesauri for indexing, lexical hierarchies, lists of concordances) that, combined with statistical ones, make it possible to improve the efficiency of existing search engines.

The purpose of this research project is to develop efficient terminology and discourse descriptions in economics in Spanish, Catalan, Galician, Basque, and English, with the applied objective of creating multilingual linguistic resources for use in various IR techniques, especially in Internet search engines. The research group already has a linguistically processed corpus in three languages (Spanish, Catalan, and English) and will develop, within the framework of this project, the complementary corpora in Galician and Basque; their use will allow the design of general-purpose IR strategies. These corpora will also allow the design of other applications, mainly of a semantic and phraseological nature, that can be used in IR: the enrichment of processing dictionaries with semantic and phraseological information, the development of an ontology in economics linked to a multilingual terminology database, and the adaptation of an automatic terminology extractor to economics. Apart from these resources, which will be used in information extraction techniques, the expected results of this project for IR center on the design of an automatic system that reformulates multilingual queries as input for existing search engines. This system will use the information in the ontology and the terminology database to turn a simple, ambiguous query into a complex one that improves the relevance of retrieval within the domain of economics.
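
The query reformulation step described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the project's actual system: the `Concept` schema, the `ONTOLOGY` entries, the function names, and the boolean `AND`/`OR` output syntax are all hypothetical. The sketch looks up known terms in a plain query, replaces each with its variants and translations from a toy terminology database, and can optionally add the broader concept from the ontology to keep results within economics.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One ontology node with its terms per language (hypothetical schema)."""
    terms: dict                                   # language code -> term variants
    broader: list = field(default_factory=list)   # ids of parent concepts


# Toy stand-in for the ontology and terminology database (invented entries).
ONTOLOGY = {
    "interest_rate": Concept(
        terms={
            "en": ["interest rate"],
            "es": ["tipo de interés", "tasa de interés"],
            "ca": ["tipus d'interès"],
        },
        broader=["monetary_policy"],
    ),
    "monetary_policy": Concept(
        terms={
            "en": ["monetary policy"],
            "es": ["política monetaria"],
            "ca": ["política monetària"],
        },
    ),
}


def or_group(concept, langs):
    """Join a concept's variants in the given languages into one OR clause."""
    variants = [t for lang in langs for t in concept.terms.get(lang, [])]
    return "(" + " OR ".join(f'"{v}"' for v in variants) + ")"


def expand_query(query, source_lang, target_langs, add_context=False):
    """Rewrite a plain query as a boolean query over known term variants."""
    text = query.lower()
    clauses = []
    for concept in ONTOLOGY.values():
        # A concept matches when one of its source-language terms is in the query.
        if any(t in text for t in concept.terms.get(source_lang, [])):
            clauses.append(or_group(concept, target_langs))
            if add_context:
                # Broader concepts anchor the query in the economics domain.
                for parent_id in concept.broader:
                    clauses.append(or_group(ONTOLOGY[parent_id], target_langs))
    # Queries with no known term pass through unchanged.
    return " AND ".join(clauses) if clauses else query


print(expand_query("tipo de interés", "es", ["es", "en"], add_context=True))
```

A real system would need proper term matching (lemmatization, multiword detection) rather than the substring test used here, and the output syntax would have to follow whatever the target search engine accepts.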