ctakes-dev mailing list archives

From Roberto Costumero Moreno <roberto.costum...@upm.es>
Subject Re: cTAKES Translation
Date Tue, 19 Nov 2013 16:32:49 GMT
Hi Pei,

Thank you very much for your answer.

I am looking for good corpora, and my group and I are considering building a new one to train the ML-based
models. I will also look into the hard-coded rules in order to adapt them.

AFAIK, the UMLS includes a subset of its terms translated into Spanish, and these correspond to
the terms in the Spanish edition of SNOMED CT.

I will share my questions and my progress here as I work on getting cTAKES running in
Spanish, and hopefully other languages.

Cheers,

--
Roberto Costumero Moreno
Laboratorio de Minería de Datos y Simulación (MIDAS)
Centro de Tecnología Biomédica
Universidad Politecnica de Madrid
roberto.costumero@upm.es
Tlf: +34 91 336 4664

On 15/11/2013, at 14:49, Chen, Pei <Pei.Chen@childrens.harvard.edu> wrote:

> Hi Roberto,
> Welcome!  
> 
> In theory, in order to have cTAKES work in a different language, we would just need to:
> - Retrain the existing ML-based models for the new language; the code itself should work as is.
> - Update any hard-coded rules.
> - Use a Spanish dictionary for concepts (I believe UMLS already has a Spanish translation for some of its source vocabularies).
> I think it would be awesome to have cTAKES work with multiple languages, including Spanish!
> Actually, a lot of folks have been asking about cTAKES models in different languages.
> The challenge with the supervised machine learning methods is that we have to rely on local domain experts to create the gold standard for training.
> There is a group that may be contributing retrained models for cTAKES to work in French.
> Others can feel free to chime in...
> 
> --Pei
> 
>> -----Original Message-----
>> From: Roberto Costumero Moreno [mailto:roberto.costumero@upm.es]
>> Sent: Thursday, November 14, 2013 5:43 AM
>> To: dev@ctakes.apache.org
>> Subject: cTAKES Translation
>> 
>> Hello everyone,
>> 
>> My name is Roberto Costumero. I am a Ph.D. student at the Technical University
>> of Madrid in Spain. I am new to this list, so I would like to introduce myself
>> and ask a few questions.
>> 
>> We are currently involved in a project with several hospitals, working closely
>> with them to understand their needs so that we can build an application that
>> lets them use the knowledge contained in their clinical notes and imaging,
>> among other things.
>> 
>> We have been looking at different projects to see which one fits our needs
>> and, of course, which one we would share our research with. Among the
>> clinical text analysis projects we have seen, we think cTAKES is the best one
>> out there: it is very well structured and organized. The main problem we
>> face, however, is that every clinical text-based NLP project is developed for
>> English, and we will be working with Spanish texts.
>> 
>> We have already done some work translating different algorithms to Spanish
>> and testing them for negation and context detection, but we would like to
>> work with a complete, well-tested framework. That led us to cTAKES, so I have
>> a few questions for you:
>> 
>> - Does anyone know whether someone is already working on adapting cTAKES
>> modules to other languages (Spanish in particular)?
>> - Do you think it would be very difficult because of some architectural
>> design decision I am not currently aware of?
>> - Do you think that extending cTAKES to Spanish, as a joint effort, would be
>> a good line of development for the cTAKES project?
>> 
>> Thank you very much in advance for your help.
>> 
>> Sincerely,
>> 
>> --
>> Roberto Costumero Moreno
>> Laboratorio de Minería de Datos y Simulación (MIDAS)
>> Centro de Tecnología Biomédica
>> Universidad Politecnica de Madrid
>> roberto.costumero@upm.es
>> Tlf: +34 91 336 4664
> 

