UPF-UPV Subproject
Specific Objectives
- Exploitation of the Economics Corpus in Catalan.
- Creation of an Economics Corpus in Basque.
- Adaptation of Existing Processing Tools for Basque.
- Locating and Evaluating Reusable Ontologies and Thesauri on Economics.
- Alignment of Translation Corpora for English-Spanish, Catalan-English and Spanish-Catalan.
- Creation of an Ontology and/or Data Import from Existing Ontologies.
- Design and Construction of a Multilingual Terminological Database.
- Input of Terminological Data (formal, semantic and phraseological) from the Exploitation of Spanish, Catalan and Basque Corpora.
Work Packages
First Year
PT11.- Development of the economics corpus for Basque. Locating electronic texts, text selection and sample partition. Text processing. Incorporation into IULA's Technical Corpus, available through BWANANET.
- Expected Result: Finished and accessible linguistic resource.
- Location: Compilation of the Basque economics corpus at UPV (San Sebastián); standard structural tagging and linguistic processing at IULA-UPF (Barcelona).
- Coordination: Dr. Zabala.
- Participants: Dr. Odriozola, Dr. Bach, Dr. Elorduy.
- Collaborators: collaboration fellowship holder.
- External Consultancy: Economics professors at UPV. Expected mobility: two meetings (project preparation in Barcelona, subproject monitoring in San Sebastián).
PT12.- Exploitation of the economics corpus in Spanish and Catalan. Extraction of lexical units, frequencies, concordances, etc. from the processed corpus using BWANANET. Alignment, using ALINEA, of the part of the corpus containing translated texts.
- Expected Result: Reports and linguistic data representation.
- Location: IULA-UPF (Barcelona). Includes postdoctoral research stay by Dr. Vangehuchten at IULA.
- Coordination: Dr. Vangehuchten.
- Participants: Dr. Mercè Lorente, Dr. Lluís de Yzaguirre, Dr. Tebé.
- Collaborators: Mrs. Joan, Mr. Quiroz.
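The kind of data PT12 extracts (frequencies and concordances) can be illustrated with a minimal Python sketch. This is not BWANANET's implementation or interface; the function names `tokenize`, `frequencies` and `concordance`, and the sample sentence, are hypothetical and serve only to show the shape of the output.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (Unicode letters only)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented sample text, used only for illustration.
sample = "El tipo de interés sube. El banco central fija el tipo de interés oficial."
tokens = tokenize(sample)
freq = frequencies(tokens)
kwic = concordance(tokens, "interés", width=2)

print(freq.most_common(3))
for line in kwic:
    print(line)
```

A real corpus query would run over lemmatized and tagged text rather than raw tokens, so that inflected forms of a term are grouped together.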
PT13.- Licensing of existing computational dictionaries and morphological analyzers for Basque. Adaptation of the licensed tools during the working process. Adaptation of the morphological tagging systems.
- Expected Result: Tools licensed, adapted and accessible.
- Location: UPV (San Sebastián) and UPF (Barcelona).
- Coordination: Dr. Odriozola.
- Participants: Dr. Bach, Dr. Lluís de Yzaguirre, Dr. Zabala, Dr. Elorduy.
- External Consultancy: IXA group from UPV.
PT14.- Locating existing, reusable documentary thesauri, ontologies and lexical hierarchies containing economics information. Evaluation of their contents and import possibilities. Locating electronic glossaries on economics.
- Expected Result: Report.
- Location: IULA-UPF (Barcelona).
- Coordination: Dr. Mercè Lorente.
- Participants: Dr. Vangehuchten, Dr. Tebé.
- Collaborators: Mrs. Arano, Mrs. Joan.
- External Consultancy: DigiDoc group from IULA, directed by Dr. Lluís Codina; researchers from the TEXTERM-2 project (BFF2003-02111) experienced in ontologies.
PT15.- Design and construction of resources. Import of economics dictionaries into the MERCEDES system. Design of the ontology and the associated terminological database using Ontoterm. Design of the project web page. Transfer protocols between database managers.
- Expected Result: Working protocols for building the ontology and the terminological database.
- Location: IULA-UPF (Barcelona).
- Coordination: Dr. Lorente.
- Participants: Dr. Lluís de Yzaguirre, Dr. Tebé.
- Collaborators: Mrs. Arano, Mrs. Joan, Mr. Quiroz.
- External Consultancy: Researchers from the TEXTERM project (BFF2000-0841) experienced in the design and construction of the Human Genome Knowledge Bank.
Second Year
PT21.- Enrichment of lexical resources. Database of economics predicates (verbs, adjectives and nominalizations) with semantic and phraseological information to be added to the processing dictionaries of Spanish, Catalan and Basque. Incorporation of data, by import or ad hoc entry, into the multilingual terminological database.
- Expected Result: Finished linguistic resources.
- Distributed Location: IULA-UPF (Barcelona), UPV, University of Antwerp.
- Coordination: Dr. Zabala.
- Participants: Dr. Lorente, Dr. Vangehuchten, Dr. Odriozola, Dr. Elorduy, Dr. Bach, Dr. Tebé.
- Collaborators: Mrs. Joan, Mr. Quiroz, a collaboration fellowship holder.
- External Consultancy: Research group experienced in lexicon enrichment by means of semantic data: CLIPS project from the Istituto di Linguistica Computazionale in Pisa, directed by Dr. Nicoletta Calzolari and coordinated by Dr. Nilda Ruimy.
PT22.- Ontology building. Revision of the imports from reusable ontologies into the base ontology. Addition of concept systems derived from the results of PT12.
- Expected Result: Finished linguistic resources.
- Distributed Location: IULA-UPF (Barcelona), UPV, University of Antwerp.
- Coordination: Dr. Lorente.
- Participants: Dr. Zabala, Dr. Vangehuchten, Dr. Odriozola, Dr. Elorduy, Dr. Bach, Dr. Tebé.
- Collaborators: Mrs. Arano, Mrs. Joan, a collaboration fellowship holder.
- External Consultancy: Dr. Antonio Moreno from Universidad de Málaga, creator of Ontoterm manager. Researchers from the TEXTERM project (BFF2000-0841) experienced in the design and construction of the Human Genome Knowledge Bank.
Third Year
PT31.- Design of linguistic strategies for IR queries. Typology of query interactions between the terminological database and the ontology. Strategies based on specific phraseology and corpus concordances. Establishment of a test query corpus.
- Expected Result: Finished linguistic resources.
- Distributed Location: IULA-UPF (Barcelona), UPV, University of Antwerp.
- Coordination: Dr. Lorente.
- Participants: Dr. Zabala, Dr. Vangehuchten, Dr. Odriozola, Dr. Elorduy, Dr. Lluís de Yzaguirre, Dr. Bach, Dr. Tebé.
- Collaborators: Mrs. Joan, collaboration fellowship holder.
- External Consultancy: Research group on IR led by Professor Ricardo Baeza-Yates (Universidad de Chile); group from the TURSI project at Universitat Politècnica de València, directed by Dr. Encarna Segarra.
PT32.- Tests of query reelaboration using the system designed by the USC subproject. Result analysis and evaluation.
- Expected Result: Evaluation Report.
- Distributed Location: IULA-UPF (Barcelona), UPV, University of Antwerp.
- Coordination: Dr. Lorente.
- Participants: Dr. Zabala, Dr. Vangehuchten, Dr. Odriozola, Dr. Elorduy, Dr. Lluís de Yzaguirre, Dr. Bach, Dr. Tebé.
- Collaborators: Mrs. Arano, collaboration fellowship holder.
- Other collaborations: Universidad Politécnica de Madrid, PhD students directed by Dr. Guadalupe Aguado.
- External Consultancy: Research group on IR led by Professor Ricardo Baeza-Yates (Universidad de Chile); group from the TURSI project at Universitat Politècnica de València, directed by Dr. Encarna Segarra.
PT33.- Implementation of all the resources and the query reelaboration system on the project web portal.
- Expected Result: Portal giving access to the economics knowledge bank and to the query reelaboration system.
- Location: IULA-UPF (Barcelona).
- Coordination: Dr. Lorente.
- Participants: Dr. Zabala, Dr. Vangehuchten, Dr. Odriozola, Dr. Elorduy, Dr. Lluís de Yzaguirre, Dr. Bach, Dr. Tebé.
- Collaborators: Mrs. Joan, Mrs. Arano, collaboration fellowship holder.
- External Consultancy: IULATERM group, directed by Dr. M. Teresa Cabré; DigiDoc group from IULA, directed by Dr. Lluís Codina; the group directed by Dr. Guadalupe Aguado from Universidad Politécnica de Madrid.
USC Subproject
Specific Objectives
- Compilation of the economics corpus for Galician.
- Adaptation of existing processing tools for Galician.
- Addition of terminological data (formal, semantic and phraseological) resulting from the exploitation of the Spanish and Galician corpora.
- Creation of protocols for importing existing ontologies and thesauri, with the aim of automatically enriching the economics ontology.
Work Packages
First Year
PT11.- Locating and adapting processing tools for Galician: dictionary, morphological analyzer and disambiguation tool. Adaptation of the morphological tagging systems.
- Expected Result: Adapted and functional tools.
- Coordination: Dr. María Sol López.
- Participants: Eduardo Miguel Moscoso, M.ª Paula Santalla, Susana Sotelo, Guillermo Rojo.
- Participants who are not members of the group: Eva Domínguez, Fco. Mario Barcala.
- External Consultancy: researchers from the Centro de investigación en Humanidades Ramón Piñeiro, Cole group from Universidad de Coruña.
PT12.- Development of the economics corpus for Galician. Locating electronic texts, text selection and sample partition. Structural tagging of texts using SGML. Linguistic processing of texts.
- Expected Result: Finished and accessible linguistic resource.
- Coordination: Dr. María Sol López.
- Participants: Eduardo Miguel Moscoso, Guillermo Rojo (corpus localization and design), M.ª Paula Santalla, Susana Sotelo (structural tagging and linguistic processing).
- Participants who are not members of the group: Eva Domínguez.
- External Consultancy: researchers from the Centro de investigación en Humanidades Ramón Piñeiro.
Second Year
PT21.- Analysis of the possibilities for importing existing economics ontologies. Design of protocols for their import. Import tests.
- Expected Result: Report. Import protocols. Evaluation.
- Coordination: Dr. María Paula Santalla.
- Participants: Susana Sotelo, Guillermo Rojo.
- Participants who are not members of the group: Fco. Mario Barcala.
- Collaborators: M.ª Sol López, Eduardo Miguel Moscoso.
- Consultancy: IULA-UPF, researchers from the TEXTERM project (BFF2000-0841).
PT22.- Exploitation of the Galician and Spanish corpora to enrich the terminological database and the ontology.
- Expected Result: Reports and linguistic data representation.
- Coordination: Guillermo Rojo.
- Participants: M.ª Paula Santalla, Susana Sotelo, María Sol López, Eduardo Miguel Moscoso.
- Participants who are not members of the group: Eva Domínguez.
- Consultancy: IULA-UPF.
Third Year
PT31.- Design of a query reelaboration system, transforming a simple query in one language into a multilingual combined and complex query by means of data extraction from the terminological database and the ontology. Output of the reelaborated query to different search engines and metasearch engines.
- Expected Result: query reelaboration system (beta version).
- Coordination: Dr. María Paula Santalla.
- Participants: Susana Sotelo.
- Participants who are not members of the group: Fco. Mario Barcala, Eva Domínguez.
- Collaborators: Guillermo Rojo, María Sol López, Eduardo Miguel Moscoso.
- External Consultancy: Cole group from Universidad de Coruña.
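The query transformation described in PT31 can be sketched in Python as follows. This is only an illustration of the idea, not the system the subproject will build: the `reelaborate` function, the `TERM_DB` and `ONTOLOGY` structures and their toy entries are all hypothetical stand-ins for the terminological database and the ontology.

```python
# Hypothetical toy terminological database: concept -> language -> term variants.
TERM_DB = {
    "inflación": {"es": ["inflación"], "gl": ["inflación"], "en": ["inflation"]},
    "hiperinflación": {"es": ["hiperinflación"], "en": ["hyperinflation"]},
}

# Hypothetical toy ontology: concept -> related (here, narrower) concepts.
ONTOLOGY = {"inflación": ["hiperinflación"]}


def reelaborate(query, term_db, ontology, languages):
    """Expand a one-term query into a multilingual boolean query.

    The query term is looked up as a concept; related concepts come from
    the ontology and per-language variants from the terminological database.
    An unknown term is returned unchanged, as a quoted phrase.
    """
    concept = query.strip().lower()
    concepts = [concept] + ontology.get(concept, [])
    groups = []
    for lang in languages:
        variants = []
        for c in concepts:
            variants.extend(term_db.get(c, {}).get(lang, []))
        if variants:
            # dict.fromkeys removes duplicates while keeping order.
            quoted = " OR ".join(f'"{v}"' for v in dict.fromkeys(variants))
            groups.append(f"({quoted})")
    return " OR ".join(groups) if groups else f'"{concept}"'


expanded = reelaborate("Inflación", TERM_DB, ONTOLOGY, ["es", "en"])
print(expanded)
```

The resulting string uses a generic boolean syntax; sending it to a particular search engine or metasearch engine would require adapting it to that engine's own operators.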
PT32.- Test phase. Result analysis and evaluation. Participation in the completion of the web portal, which gives access to the economics knowledge bank (textual corpus, terminological database and ontology) and hosts the query reelaboration system.
- Expected Result: Reports, web portal, refined query reelaboration system.
- Coordination: Guillermo Rojo.
- Participants: M.ª Paula Santalla, Susana Sotelo, M.ª Sol López, Eduardo Miguel Moscoso.
- Participants who are not members of the group: Fco. Mario Barcala, Eva Domínguez.
- External Consultancy: Cole group from Universidad de Coruña.