ctakes-dev mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Mon, 18 Jan 2016 14:06:16 GMT

A new patch is attached.

Are there suitable annotation types in the cTAKES type system? Some
project in cTAKES uses something like OntologyMatch... I map it to
IdentifiedAnnotation right now, but many of its features stay empty...

I changed the rules a bit, especially the capitalization, to match the
style I normally use in Ruta. The word lists are compiled to a trie by the
Maven plugin. I also added the two regexes for URL and email, and extended
the URL regex. I also changed the evaluation order of some rules (with @).
Feel free to add simple examples to examples.csv for the unit tests.

Let me know if you need more information about the changes.
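
For illustration, the word-list trie and the URL/email regexes mentioned
above could look roughly like the Python sketch below. It is a minimal
sketch under assumptions, not the code in the patch: the real rules are
written in UIMA Ruta, the trie there is produced by the Maven plugin, and
the names build_trie, find_matches, EMAIL_RE and URL_RE as well as both
patterns are hypothetical.

```python
import re

# Hypothetical, simplified patterns. The regexes in the actual patch are
# not shown in this thread and are likely more thorough.
EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')
URL_RE = re.compile(r'(?:https?://|www\.)[^\s<>"]+')

# Sentinel key marking "a word-list entry ends at this node".
# It is not a string, so it cannot collide with any token.
END = object()


def build_trie(entries):
    """Compile (possibly multi-token) word-list entries into a nested-dict trie."""
    root = {}
    for entry in entries:
        node = root
        for token in entry.lower().split():
            node = node.setdefault(token, {})
        node[END] = True
    return root


def find_matches(trie, tokens):
    """Return (start, end) token spans, taking the longest entry at each position."""
    spans = []
    i = 0
    while i < len(tokens):
        node, longest = trie, None
        for j in range(i, len(tokens)):
            node = node.get(tokens[j].lower())
            if node is None:
                break
            if END in node:
                longest = j + 1  # remember the longest complete entry so far
        if longest is None:
            i += 1
        else:
            spans.append((i, longest))
            i = longest  # continue after the match, so spans never overlap
    return spans


if __name__ == "__main__":
    trie = build_trie(["New York", "New York City", "Boston"])
    tokens = "She moved from Boston to New York City".split()
    for start, end in find_matches(trie, tokens):
        print(" ".join(tokens[start:end]))
```

Compiling the list once and walking it token by token keeps lookup cost
independent of the number of entries, which is the usual reason to prefer
a trie over testing each entry separately.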

Do you want help with the other rule sets? Or should we split them up?



On 18.01.2016 at 11:04, Peter Klügl wrote:
> Hi,
> Great. I will integrate them into the project and include them in the next patch.
> Best,
> Peter
> On 18.01.2016 at 00:58, Azad Dehghan wrote:
>> Three NERs translated and uploaded.
>> PS. I will validate all NERs once we have them all completed.
>> Cheers,
>> Azad
>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehghan@gmail.com> wrote:
>>> This is on my todo list for Dec. as well. If there are any more volunteers
>>> for translating JAPE to RUTA, please get in touch.
>>> Cheers,
>>> Azad
>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
>>>> Hi,
>>>> I just wanted to mention that I haven't forgotten about it. Unfortunately,
>>>> there is just no spare time right now. I hope I will be able to provide
>>>> the patches in December.
>>>> Best,
>>>> Peter
>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
>>>>> Hi Peter,
>>>>> I think the ctakes-examples is probably a good starting point at least
>>>>> in terms of maven modules, etc.  I think it would be good if we use
>>>>> uimaFIT style as primary approach to wiring components together and
>>>>> generate desc's as secondary...
>>>>> I think the actual components that would be required are probably best
>>>>> left up to what is actually needed for the best-performing c-deid.  The
>>>>> output would be interesting. I'm not sure if we should treat this as
>>>>> an independent preprocessing component or part of a pipeline (in which
>>>>> case, we may need to propose a change to the type system or perhaps an
>>>>> alternative JCas view.  You can probably open up that discussion to
>>>>> the dev group as you see fit.)
>>>>> My 2 cents...
>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>> Hi,
>>>>>> Is there a cTAKES project that may serve as an example of how the cTAKES
>>>>>> community develops or what a project should look like?
>>>>>> I learned that different people set up UIMA projects in quite different
>>>>>> ways, and I do not want to get inspired by "some sort of out-dated"
>>>>>> approach in the cTAKES repo.
>>>>>> Are there restrictions or preferences about the preprocessing components
>>>>>> that should be used and the kind of "output" of the project?
>>>>>> Components: On which components may the components rely: tokenizer, ...
>>>>>> parser, ... dict lookup?
>>>>>> "output": Should the project provide a pipeline or a single AE?
>>>>>> More comments below.
>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate work
>>>>>>>> and to coordinate the efforts ...
>>>>>>> I would like to help with translating JAPE to RUTA.
>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or
>>>>>> wait until I set up the project with Ruta integration.
>>>>>> If any questions arise, just ask :-)
>>>>>>>> Is there a development dataset which was utilized for the
>>>>>>>> development, and if yes, is it possible to contribute it?
>>>>>>> The data set is unfortunately not publicly available; i2b2
>>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases the data
>>>>>>> sets 12 months after a given challenge; this is done on an individual basis
>>>>>>> and involves a Data Use Agreement.
>>>>>>> However, I will be able to conduct and coordinate the validation.
>>>>>> Ok, I'll investigate whether we already have access to the dataset here.
>>>>>>>> My first step would be:
>>>>>>>> - set up a maven project
>>>>>>>> - set up a development pipeline in a test (with cTAKES components
>>>>>>>> replacing the previous ANNIE preprocessing)
>>>>>>>> But one item that we need to review is the 3rd party libs jars that
>>>>>>>> were included to ensure compatibility.  I’ll be sure to take a look at
>>>>>>>> that over the next few weeks.
>>>>>>>> —Pei
>>>>>>> @Pei - once the ANNIE components are replaced, there should not be a need to
>>>>>>> worry about the 3rd party libs.
>>>>>>> Also, just a thought: we may want to create an independent component for
>>>>>>> the Two Pass recognition (TwoPass.java), as this method has proven useful
>>>>>>> for general NER on longitudinal data and is surely useful independent of the
>>>>>>> deid component.
>>>>>>> Cheers,
>>>>>>> Azad
