UPF-UPV Subproject

Specific Objectives

Work Packages

First Year

PT11.- Development of the economics corpus for Basque. Locating electronic texts, text selection and sample partitioning. Text processing. Incorporation into IULA's Technical Corpus, available through BWANANET.

PT12.- Exploitation of the economics corpus in Spanish and Catalan. Extraction of lexical units, frequencies, concordances, etc. from the processed corpus using BWANANET. Alignment, using ALINEA, of the part of the corpus containing translated texts.
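The kind of data PT12 refers to (frequencies and concordances) can be illustrated with a minimal Python sketch. This is not how BWANANET works internally; the function names, the naive tokenizer and the sample sentence are all assumptions made for illustration only.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (assumption): lowercased runs of word characters.
    return re.findall(r"\w+", text.lower())

def frequencies(tokens):
    # Count how often each token occurs in the corpus.
    return Counter(tokens)

def concordance(tokens, keyword, width=3):
    # Keyword-in-context: up to `width` tokens on each side of every hit.
    keyword = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Hypothetical sample text, not taken from the project corpus.
sample = "El mercado financiero crece. El banco central regula el mercado."
tokens = tokenize(sample)
freq = frequencies(tokens)
kwic = concordance(tokens, "mercado", width=2)
```

Here `freq` maps each token to its count and `kwic` holds one (left context, keyword, right context) tuple per occurrence of the keyword.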

PT13.- Licensing of existing computational dictionaries and morphological analyzers for Basque. Adaptation of the licensed tools in the course of the work. Adaptation of the morphological tagging systems.

PT14.- Locating existing, reusable documentary thesauri, ontologies and lexical hierarchies containing economics information. Evaluation of their contents and import possibilities. Locating electronic glossaries on economics.

PT15.- Design and creation of resources. Import of economics dictionaries into the MERCEDES system. Design of the ontology and the associated terminological database using Ontoterm. Design of the project web page. Transfer protocols between database managers.

Second Year

PT21.- Enrichment of lexical resources. Database of economics predicates (verbs, adjectives and nominalizations) with semantic and phraseological information, to be added to the processing dictionaries for Spanish, Catalan and Basque. Incorporation of data, by import or ad hoc, into the multilingual terminological database.

PT22.- Ontology building. Revision of the imports of reusable ontologies into the base ontology. Addition of concept systems derived from the results of PT12.

Third Year

PT31.- Design of linguistic strategies for IR queries. Typology of query interactions between the terminological database and the ontology. Strategies based on specific phraseology and corpus concordances. Establishment of a test query corpus.

PT32.- Tests of query reformulation using the system designed by the USC subproject. Analysis and evaluation of results.

PT33.- Deployment of all the resources and the query reformulation system on the project web portal.

USC Subproject

Specific Objectives

Work Packages

First Year

PT11.- Locating and adapting processing tools for Galician: dictionary, morphological analyzer and disambiguation tool. Adaptation of the morphological tag systems.

PT12.- Development of the economics corpus for Galician. Locating electronic texts, text selection and sample partitioning. Structural tagging of texts using SGML. Linguistic processing of texts.

Second Year

PT21.- Analysis of the possibilities for importing existing economics ontologies. Design of a protocol for their import. Import tests.

PT22.- Exploitation of the corpus (Galician and Spanish) to enrich the terminological database and the ontology.

Third Year

PT31.- Design of a query reformulation system that transforms a simple query in one language into a combined, complex multilingual query using data extracted from the terminological database and the ontology. Output of the reformulated query to different search engines and metasearch engines.
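The query transformation described in PT31 can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `TERM_DB` and `ONTOLOGY` dictionaries, their contents, the `reformulate` function and the `OR`-joined output syntax are all hypothetical stand-ins, not the project's actual data model or system.

```python
# Hypothetical toy data: equivalents of a Spanish term in other languages.
TERM_DB = {
    "inflación": {"ca": ["inflació"], "eu": ["inflazio"], "gl": ["inflación"]},
}

# Hypothetical toy data: related concepts taken from an ontology.
ONTOLOGY = {
    "inflación": ["índice de precios"],
}

def quote(term):
    # Wrap multi-word terms in quotes so they are searched as phrases.
    return f'"{term}"' if " " in term else term

def reformulate(query, term_db, ontology):
    # Start from the user's term, then add translations and related concepts.
    key = query.strip().lower()
    variants = [key]
    for lang_terms in term_db.get(key, {}).values():
        variants.extend(lang_terms)
    variants.extend(ontology.get(key, []))
    # Remove duplicates while keeping the original order.
    unique = []
    for v in variants:
        if v not in unique:
            unique.append(v)
    return " OR ".join(quote(v) for v in unique)

expanded = reformulate("Inflación", TERM_DB, ONTOLOGY)
unknown = reformulate("PIB", TERM_DB, ONTOLOGY)
```

A term missing from both resources is passed through unchanged (lowercased), so the sketch degrades to the original simple query.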

PT32.- Test phase. Analysis and evaluation of results. Participation in completing the web portal, which gives access to the knowledge bank on economics (textual corpus, terminological database and ontology) and hosts the query reformulation system.

Last update: 26-06-2007