ctakes-dev mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Wed, 20 Jan 2016 10:08:18 GMT
Hi,

I integrated your ruta scripts and added a new patch (includes and
replaces my last one).

I noticed some semantic differences between the ruta rules and their
jape originals, e.g., the brackets for the user name. Are they intended?

I needed to change some rule elements, e.g., "M.D." does not work as a
literal rule element match (a very old restriction of ruta which should be
removed some day...). These literal string matches should be avoided
altogether if possible, or at least the start anchor should be set to a
different rule element.

Ok, I'll let you know when I start with a rule set.

Best,

Peter

On 20.01.2016 at 01:47, Azad Dehghan wrote:
> Peter,
>
> So, we have Email, Url, Profession, Street, Zip, State and Username
> completed so far.
>
> The following NERs remain:
> Country, Age, Doctor, Fax, Id_num, Medicalrec_num, Patient, and Phone.
>
> I will do Country next. If you are able to translate the rest quickly,
> please do :) Otherwise just keep me posted on which ones you are working on
> to avoid duplicate work... and we can work through the remaining NERs.
>
> Also, once the NERs are translated I will prepare a number of examples for
> unit testing -- I will also validate the NERs using the i2b2 research
> dataset.
>
> Cheers,
> Azad
>
> On 19 January 2016 at 09:01, Peter Klügl <peter.kluegl@averbis.com> wrote:
>
>> Ok, let me know which ones I should translate.
>>
>> Best,
>>
>> Peter
>>
>> On 18.01.2016 at 20:13, Azad Dehghan wrote:
>>> Peter,
>>>
>>> Thanks for pushing things!
>>>
>>> I would rather split the rules/NERs to get things moving quicker (as I am a
>>> newbie to Ruta). I will be uploading another NER (Username) shortly. I will
>>> look at your changes to follow suit.
>>>
>>> Best,
>>> Azad
>>>
>>> On 18 January 2016 at 14:06, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>> Hi,
>>>>
>>>> a new patch is attached.
>>>>
>>>> @Pei:
>>>> are there suitable annotation types in the cTAKES type system? Some
>>>> project in cTAKES uses something like OntologyMatch... I map it to
>>>> IdentifiedAnnotation right now, but there are many empty features...
>>>>
>>>> @Azad:
>>>> I changed the rules a bit, especially the capitalization, to match how I
>>>> normally write ruta. The wordlists are compiled to a trie by the maven
>>>> plugin. I also added the two regexes for url and email. I extended the
>>>> regex for the url. I also changed the evaluation order of some rules
>>>> (with @). Feel free to add simple examples to examples.csv for the unit
>>>> tests.
>>>>
>>>> Let me know if you need more information about the changes.
>>>>
>>>> Do you wanna have help with the other rule sets? Or should we split them
>>>> up?
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
>>>>> Hi,
>>>>>
>>>>> great. I will integrate them in the project and in the next patch.
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
>>>>>> Three NERs translated and uploaded.
>>>>>>
>>>>>> PS. I will validate all NERs once we have them all completed.
>>>>>>
>>>>>> Cheers,
>>>>>> Azad
>>>>>>
>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehghan@gmail.com> wrote:
>>>>>>> This is on my todo list for Dec. as well. If there are any more volunteers
>>>>>>> for translating JAPE to RUTA, please get in touch.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Azad
>>>>>>>
>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I just wanted to mention that I haven't forgotten about it. Unfortunately,
>>>>>>>> there is just no spare time right now. I hope I will be able to provide
>>>>>>>> the patches in December.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
>>>>>>>>> Hi Peter,
>>>>>>>>> I think the ctakes-examples is probably a good starting point at least
>>>>>>>>> in terms of maven modules, etc.  I think it would be good if we use
>>>>>>>>> uimaFIT style as the primary approach to wiring components together and
>>>>>>>>> generate desc's as secondary...
>>>>>>>>> I think the actual components that would be required are probably best
>>>>>>>>> left up to what is actually required for the best performing c-deid. The
>>>>>>>>> output would be interesting; I'm not sure if we should treat this as
>>>>>>>>> an independent preprocessing component or part of a pipeline (in which
>>>>>>>>> case, we may need to propose a change to the type system or perhaps an
>>>>>>>>> alternative JCas view).  You can probably open up that discussion to
>>>>>>>>> the dev group as you see fit.
>>>>>>>>>
>>>>>>>>> My 2 cents...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Is there a cTAKES project that may serve as an example of how the cTAKES
>>>>>>>>>> community develops or what a project should look like?
>>>>>>>>>> I learned that different people set up UIMA projects in quite different
>>>>>>>>>> ways, and I do not want to be inspired by "some sort of out-dated"
>>>>>>>>>> approach in the cTAKES repo.
>>>>>>>>>>
>>>>>>>>>> Are there restrictions or preferences about the preprocessing components
>>>>>>>>>> that should be used and the kind of "output" of the project?
>>>>>>>>>> Components: On which components may the components rely: tokenizer, ...
>>>>>>>>>> parser, ... dict lookup?
>>>>>>>>>> "output": Should the project provide a pipeline or a single AE?
>>>>>>>>>> More comments below.
>>>>>>>>>>
>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
>>>>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate work
>>>>>>>>>>>> and to coordinate the efforts ...
>>>>>>>>>>>>
>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
>>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or
>>>>>>>>>> wait until I set up the project with ruta integration.
>>>>>>>>>>
>>>>>>>>>> If any questions arise, just ask :-)
>>>>>>>>>>
>>>>>>>>>>>> Is there a development dataset which was utilized for the initial
>>>>>>>>>>>> development, and if yes, is it possible to contribute it too?
>>>>>>>>>>>>
>>>>>>>>>>> The data set is unfortunately not publicly available; i2b2
>>>>>>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases the data
>>>>>>>>>>> sets 12 months after a given challenge; this is done on an individual
>>>>>>>>>>> basis and involves a Data Use Agreement.
>>>>>>>>>>>
>>>>>>>>>>> However, I will be able to conduct and coordinate the validation.
>>>>>>>>>>>
>>>>>>>>>> Ok, I'll investigate if we already have access to the dataset here.
>>>>>>>>>>
>>>>>>>>>>>> My first step would be:
>>>>>>>>>>>> - set up a maven project
>>>>>>>>>>>> - set up a development pipeline in a test (with cTAKES components
>>>>>>>>>>>> replacing the previous ANNIE preprocessing)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> But one item that we need to review is the 3rd party libs jars that
>>>>>>>>>>>> were included to ensure compatibility.  I’ll be sure to take a look at
>>>>>>>>>>>> that over the next few weeks.
>>>>>>>>>>>>
>>>>>>>>>>>> —Pei
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> @Pei - once the ANNIE components are replaced there should not be a
>>>>>>>>>>> need to worry about the 3rd party libs.
>>>>>>>>>>>
>>>>>>>>>>> Also, just a thought: we may want to create an independent component for
>>>>>>>>>>> the Two Pass recognition (TwoPass.java), as this method has proven useful
>>>>>>>>>>> for general NER on longitudinal data and is surely useful independent of
>>>>>>>>>>> the deid component.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Azad
>>>>>>>>>>>

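The thread mentions Two Pass recognition (TwoPass.java) without describing it. The following is only a minimal sketch of one plausible reading, assuming the first pass runs the ordinary recognisers and the second pass re-applies every string found in the first pass, as a run-time dictionary, to all documents of the same patient. The function names, the `patterns` argument and the example rule are hypothetical and are not taken from the c-deid code.

```python
import re


def first_pass(text, patterns):
    """Run the (assumed) pattern-based recognisers; return (label, start, end) hits."""
    hits = []
    for label, pattern in patterns.items():
        for m in pattern.finditer(text):
            hits.append((label, m.start(), m.end()))
    return hits


def two_pass(documents, patterns):
    """Annotate each document, then propagate first-pass strings to all documents."""
    lexicon = {}   # surface string -> label, collected in pass 1
    per_doc = []   # one set of hits per document
    for text in documents:
        hits = first_pass(text, patterns)
        per_doc.append(set(hits))
        for label, start, end in hits:
            lexicon.setdefault(text[start:end], label)

    # Pass 2: look up every collected string in every document.
    # The \b anchors assume the surface strings start and end with word characters.
    for text, hits in zip(documents, per_doc):
        for surface, label in lexicon.items():
            for m in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
                hits.add((label, m.start(), m.end()))

    return [sorted(hits, key=lambda h: (h[1], h[2])) for hits in per_doc]


if __name__ == "__main__":
    # Hypothetical rule: a capitalised word directly after "Dr. ".
    rules = {"Doctor": re.compile(r"(?<=Dr\. )[A-Z][a-z]+")}
    notes = ["Seen by Dr. Smith today.", "Smith recommends follow-up."]
    for note, hits in zip(notes, two_pass(notes, rules)):
        print(note, hits)
```

In this sketch the second note has no "Dr." trigger, so the name is only found because the first note contributed it to the lexicon. That is the kind of cross-document gain the thread alludes to for longitudinal data.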

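The regexes for url and email mentioned in the thread are not shown in the messages. As an illustration only, here is a minimal sketch of what regex-based Email and Url recognition can look like. Both patterns are deliberately simple assumptions and are not the ones in the patch; `find_spans` is a hypothetical helper name.

```python
import re

# Assumed, simplified patterns; the real ones in the patch are not shown in the thread.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def find_spans(text):
    """Return (label, start, end, covered_text) tuples sorted by start offset."""
    spans = []
    for label, pattern in (("Email", EMAIL_RE), ("Url", URL_RE)):
        for m in pattern.finditer(text):
            spans.append((label, m.start(), m.end(), m.group()))
    return sorted(spans, key=lambda s: s[1])


if __name__ == "__main__":
    sample = "Contact jane.doe@example.org or see http://www.example.org/info for details."
    for span in find_spans(sample):
        print(span)
```

Note that the URL pattern stops only at whitespace, quotes and angle brackets, so trailing punctuation directly after a URL would be included in the match; a production rule would need to handle that.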