Ongoing projects
| Jugando a definir la ciencia, funded by Fundación Española para la Ciencia y la Tecnología (FCT-11-2501). Principal investigator: Dr. Rosa Estopà. Duration of the project: 2011-2012. | |
| Distancia lingüística entre los ejes espacial y temporal: aspectos fonológicos y morfológicos del catalán, funded by the Spanish Department of Innovation and Science (FFI2010-22181-C03-03). Principal investigator: Dr. Esteve Clua. Duration of the project: 2011-2013. | |
Updating processes of the Spanish lexicon from the press of Catalonia (APLE), funded by the Spanish Department of Innovation and Science (FFI2009-12188-C05-01). Principal investigator: Dr. M. Teresa Cabré Castellví. Duration of the project: 2010-2012. The aim of this project is to provide reliable data on the lexical trends observed in Spanish and its different varieties, in order to advance the analysis of the dynamics of the Spanish lexicon and contribute to the adoption of criteria for updating it in specialized domains. |
CLARA Initial Training Network for Common Language Resources and their Applications, funded by the Marie Curie Initial Training Network (7FP-ITN-238405), of the EU 7th Framework Programme. Principal investigator: Dr. Núria Bel. Duration of the project: 2009-2013. The goal of the CLARA network is to train a new generation of experts in linguistics who can develop research methods for the construction, use and application of language resources. The scientific objectives of CLARA are to explore in greater depth the creation of linguistic models based on real data, analysed with statistical and machine-learning tools, and the hybridisation of techniques and methods of analysis. CLARA will fund a total of 17 training grants in different areas related to the creation, use and application of language resources. The calls will be made public on the project's web page and in Euraxess. |
PANACEA: Platform for the Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies, funded by the Language Technologies Area, Information and Communication Technologies, of the EU 7th Framework Programme (7FP-ITC-248064). Principal investigator: Dr. Núria Bel. Duration of the project: 2010-2012. PANACEA’s goal is to develop technologies for automating all the stages involved in the acquisition, production, updating, validation and maintenance of language technologies and resources. The project, coordinated by our group, involves Cambridge University; the Istituto di Linguistica Computazionale, Italy; the Institute for Language and Speech Processing, Greece; Dublin City University, Ireland; and two companies, the German Linguatec and the French ELDA (Evaluation and Language Resources Distribution Agency). |
Knowledge Database on Human Genome. Database in progress, developed in the framework of the two mentioned projects. The knowledge database is conceived as a modular structure composed of the following parts: a) A written corpus: made up of multilingual specialized texts on the human genome, tagged according to the SGML standard, preprocessed, lemmatized, morphosyntactically parsed, and disambiguated. b) A documentary and factographic database: composed, on the one hand, of the bibliographic data drawn from the written corpus, references from the terminology database and other references selected by the subject-field experts; and, on the other hand, of data on people, institutions, enterprises, products, and methods involved in the subject field. c) Terminology database: the data structure and the user's protocol were built in 2002, and the first data have been entered from the written corpus. d) Ontology: built with the Ontoterm© management system; the first concepts have been entered from the written corpus and the terminology database. |
Completed projects
Fostering Language Resources Network (Flarenet), funded by the e-contentplus program of the European Union (ECP-2007-LANG-617001), is a networking organization whose aim is to devise and promote consensual recommendations concerning the future development, deployment and use of language resources (LRs). Flarenet will indicate best practices and best policies for coordinating future actions and projects. The major activities of the Network will be to survey, analyse and classify LRs and relevant standards, together with their organisational and economic models, and to discuss with major stakeholders and players new common strategies for a capillary deployment and use of LRs in real-world products. |
Common Language Resources and Technologies (CLARIN), funded by the European Union (FP7-INFRASTRUCTURES-2007-1-212230) and by the Ministry of Education and Science (CAC-2007-23). Principal investigator: Dr. Núria Bel. Duration of the project: 2007-2008. CLARIN aims to facilitate access to collections of linguistic data (texts, multimedia recordings, dictionaries, etc.) and to make possible the online use of tools, based on language technologies, for analysing and exploiting these data, especially for research in the Humanities and Social Sciences. |
Ontology Enlargement for Information Extraction from Specialised Discourses (RICOTERM3), funded by the Ministry of Education and Science (HUM2007-65966-C02-01/FILO). Principal investigator: Dr. Mercè Lorente. Duration of the project: 2007-2010. |
Adquisición automática de información léxica (AAILE2), funded by the Ministry of Education and Science (HUM2007-61067/FILO). Principal investigator: Dr. Núria Bel. Duration of the project: 2007-2008. |
Basis, strategies and tools for automatic extraction and processing of specialized information (TEXTERM3), funded by the Ministry of Education and Science (HUM2006-09458). Principal investigator: Dr. M. Teresa Cabré Castellví. Duration of the project: 2006-2009. |
Discursive and terminological control for information retrieval in specialized communicative domains and for a query re-elaborator (RICOTERM2), funded by the Ministry of Education and Science (HUM2004-05658-C02-01/FILO). Principal investigator: Dr. Mercè Lorente. Duration of the project: 2004-2007. |
Adquisición automática de información léxica (AAILE), funded by the Ministry of Education and Science (HUM2004-05111-C02-01/FILO). Principal investigator: Dr. Núria Bel. Duration of the project: 2005-2007. The goal of our research is to study the feasibility of automatically acquiring from corpora the information contained in computational lexicons. The methodology uses the proposed restrictions to bias the data presented to the machine-learning methods, and checks the lexical representation against the experimental observations these methods produce, since they are able to identify significant classes in large amounts of data. Ultimately, our interest in this area lies in understanding the role of the syntactic and semantic constraints that operate in texts, and the feasibility of acquiring related information. Finding out how machine-learning methods can capture these constraints will allow us to improve both the applications aimed at the automatic acquisition of lexical information and the representation of the lexicon itself. The precision achieved when classifying previously unseen instances will confirm that the hypotheses made in the lexicon were correct, insofar as they prove pertinent. From the application point of view, reducing the time and human resources required to build large computational lexica would support the application of language technologies to knowledge management from texts and, in particular, the implementation of the semantic web independently of language and application domain. |
Linguistic Infrastructure for Interoperable Resources and Systems (LIRICS), funded by the e-content program of the European Union (EDC-22236). Principal investigator: Dr. Núria Bel. Duration of the project: 2004-2006. The key objective of LIRICS is to provide the European content and language industries with a common and stable set of formats, in the form of ISO standards, enabling interoperability and reuse of multilingual language resources, digital content and language engineering software. |
Multimodal AiR Quality Information Service for general public (MARQUIS), funded by the e-content program of the European Union. Principal investigator: Dr. Leo Wanner. Duration of the project: 2004-2006. |
Bases, strategies and tools for the automatic processing and extraction/retrieval of specialized information (TEXTERM2), funded by the Ministry of Education and Science (BFF2003-02111). Principal investigator: Dr. M. Teresa Cabré Castellví. Duration of the project: 2003-2006. The project proposes to continue the work on natural language processing of scientific and technical corpora that the research group on Lexicon, Terminology and Specialized Discourse (IULATERM) has been carrying out since 1994, with support from the National Plan, the Plan of the Autonomous Community and the European Union. The goals of this project are divided into theoretical-applied and applied-technological ones. More precisely, on the theoretical side, which is oriented to the automatic extraction of information, the proposed goals are: a) the analysis of the different types of Units of Specialized Knowledge (USKs) pertinent to the knowledge structure of texts and b) the analysis of the units that express the relations established among the USKs in texts. On the applied side, with a view to developing technological applications, the proposed goals are: a) enriching the processing dictionaries with non-inflectional morphological information (internal morphological structure), syntactic information (argument structure) and semantic information (semantic features of the arguments); b) developing a chunker for the internal structure of the lexicon (morphological chunker); c) developing a second-level syntactic chunker (syntactic chunker); and d) improving the system for automatically representing the knowledge structure of a text (concept maps). Besides the members of the IULATERM group, other members of IULA collaborate. |
| Specialized Text and Terminology: Selection and automatic retrieval of information (TEXTERM), funded by the Spanish Department of Education and Science (BFF2000-0841), had the objective of going a step further in the discourse, grammatical and semantic analysis of specialized texts and of lexical and phraseological units, in order to achieve a system for automatically extracting the cognitive structures underlying specialized texts. The research was organized around four axes: text analysis and elements for a typology; units of specialized knowledge and concept representation; concept relations; and the information needs of users. At the end of the project, a publication will be produced based on the various technical and descriptive reports. Project's term: 2001-2003. Besides the members of the IULATERM group, other members of IULA have taken part in this project. | |
| Information Retrieval System with Terminology and Discourse Control (RICOTERM), funded by the Spanish Department of Education and Culture (TIC-2000-1191), had the objective of designing a prototype information retrieval system that surpasses the efficiency of current systems by means of terminology control. This project, organized jointly with the TEXTERM project, was structured along three axes of an applied and technological nature: mapping of cognitive structures; automatic and assisted generation of ontologies; and a review of the information retrieval systems used in documentation. Project's term: 2001-2003. Besides the members of the IULATERM group, other members of IULA have taken part in this project, especially the Grup de Ciències de la Documentació (DIGIDOC). | |
| La terminología científico-técnica: reconocimiento, análisis y extracción de información formal y semántica, funded by the Spanish Department of Education and Culture (DGES PB96-0293). Project's term: 1997-2000. This project focuses on the identification of units and relations, on the automatic extraction of units, and on the representation of units and relations. It involves several doctoral theses, some of which have already been defended and some of which are in progress. The results of this project have been published in the Sèrie Materials del IULA. | |
| Projecte Lèxic, seguretat i salut laborals, a lexicographical project carried out jointly with the Departament de Treball de la Generalitat de Catalunya, using a renewed terminological methodology based on the principles set out by the Communicative Theory of Terminology. The results of this project have been published in co-edition with the Departament de Treball of the Generalitat de Catalunya. | |
| Morphological Configuration and Argument Structure: lexis and dictionary, funded by the Spanish Department of Education and Culture (DGICYT PB93-0546-C04). This inter-university project, finished in 1997, had the objective of systematically describing the processes of word formation in Spanish, Catalan, and Basque, with special attention to the interrelation between morphology, semantics and syntax. The results of this project have been published in co-edition with the Universidad del País Vasco. | |
| The Catàleg de diccionaris catalans is an open project for compiling and describing the lexicographical products published in Catalan that continues the publications made by M. Teresa Cabré and Mercè Lorente (1990), and Els diccionaris catalans (1940-1988). | |
| ELC-DICTIONARIES, funded by the European Union, Thematic Network Project in the Area of Languages (29517-CP-1-96-DE-ERASMUS-ETN). This completed project had the objective of compiling the existing lexicographic resources in each member state of the European network. In 1999, IULATERM was put in charge of writing a report on the teaching of dictionary usage in Catalan universities, presented at the University of Exeter (Great Britain). | |
| Lexicographic Workstation Prototype (ETL). Project developed jointly with researchers of the Institut d'Estudis Catalans and the Universitat Autònoma de Barcelona, funded by the CREL. It consisted of the design of an integrated workstation prototype for making dictionaries. The first version (v. 1.0) was intended for designing Catalan monolingual dictionaries. In future versions, the prototype will include other types of dictionaries and all the phases of lexicographical work. Since 2002, thanks to a grant awarded to the IULATERM group and the publishing company SPES (2001 FIT-070000-2001-677), the Spanish version has been developed. | |
| Automatic Validator for Translations (for parallel texts in Romance languages), funded by the Spanish Department of Industry and Energy (Program ATYCA, TS170/1999) for the period 1999-2000. This project had the purpose of building a bilingual alignment tool for Spanish and Catalan legal texts in order to validate translations, and it is also related to the development of a thesis on specialized discourse. The project, finished in this phase, will be further developed for other Romance language pairs as well as for typologically different languages. This project was carried out jointly with the Laboratori de Tècniques Lingüístiques. | |
| RITerm-BD2, funded by the Unión Latina, in the framework of the projects of the network RITerm. This project, carried out during 1999-2000, had the objective of establishing a system for exchanging and adapting the formats of the terminology databases of the members of RITerm. Building on the results of a previous project, RITerm-BD, which analyzed the infrastructure and the terminology teaching of the RITerm members, this project aimed to implement a format for transferring terminology data among some members. Besides IULATERM and Unión Latina, the project was carried out jointly with the Universidad de Antioquia (Colombia), URUTERM (Uruguay), El Colegio de México (Mexico) and the Instituto de Linguística Teórica e Computacional de la Universidade de Lisboa (Portugal). | |
Terminology Database UPF_TERM, funded by the Direcció General de Política Lingüística, in the framework of the projects of Normalització Lingüística 2001, and by the Pla de mesures de suport a la innovació i la qualitat docents 2001 of Universitat Pompeu Fabra. The objective was to build an electronic resource for searching and disseminating the terminology work carried out by the students of the School of Translation and Interpreting, IULA, and other academic and research centers at Universitat Pompeu Fabra. This database, first built from databases created with the program Multiterm 95+, has been formatted and converted to WebTerm 5.0. |
| RITerm-BD, funded by the Unión Latina, in the framework of the Network RITerm. This project, finished in 1996, had the objective of analyzing the infrastructure and the teaching of terminology among the members of RITerm. The project results are a series of reports. | |
POINTER (Proposals for an Operational Infrastructure for Terminology in Europe), sponsored by the European Union, in the framework of MLAP'94 (Multilingual Action Plan) and coordinated by the University of Surrey. This project, finished in 1995, had the objective of detecting the needs of, and evaluating, the terminology infrastructure in Europe. IULATERM participated in the subproject on terminology training, the results of which can be found in the POINTER Technical Reports (1995).
© INSTITUT UNIVERSITARI DE LINGÜÍSTICA APLICADA - UNIVERSITAT POMPEU FABRA, Roc Boronat 138, 08018 Barcelona