ctakes-dev mailing list archives

From Pei Chen <pei.c...@wiredinformatics.com>
Subject Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Wed, 20 Jan 2016 18:35:43 GMT
Sorry I was swamped recently.
But yeah, we can even create an extended type system to store these items temporarily and
add them into the main/core type system afterwards.
There was an existing item to upgrade UIMA, but agreed, it will require much more testing.
If it works, we can upgrade it in our sandbox area or create a branch if necessary.


> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> Hi,
> a new patch is attached.
> @Pei:
> are there suitable annotation types in the cTAKES type system? Some
> project in cTAKES uses something like OntologyMatch... I map it to
> IdentifiedAnnotation right now, but there are many empty features...
> @Azad:
> I changed the rules a bit, especially the capitalization like I use it
> in ruta normally. The wordlists are compiled to a trie by the maven
> plugin. I also added the two regexes for url and email. I extended the
> regex for the url. I also changed the evaluation order of some rules
> (with @). Feel free to add simple examples to examples.csv for the unit
> tests.
> Let me know if you need more information about the changes.
> Do you wanna have help with the other rule sets? Or should we split them up?
> Best,
> Peter
> On 18.01.2016 at 11:04, Peter Klügl wrote:
>> Hi,
>> great. I will integrate them in the project and in the next patch.
>> Best,
>> Peter
>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
>>> Three NERs translated and uploaded.
>>> PS. I will validate all NERs once we have them all completed.
>>> Cheers,
>>> Azad
>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehghan@gmail.com> wrote:
>>>> This is on my todo list for Dec. as well. If there are any more volunteers
>>>> for translating JAPE to RUTA, please get in touch.
>>>> Cheers,
>>>> Azad
>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
>>>>> Hi,
>>>>> I just wanted to mention that I haven't forgotten about it. Unfortunately,
>>>>> there is just no spare time right now. I hope I will be able to provide
>>>>> the patches in December.
>>>>> Best,
>>>>> Peter
>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
>>>>>> Hi Peter,
>>>>>> I think the ctakes-examples is probably a good starting point, at least
>>>>>> in terms of maven modules, etc.  I think it would be good if we use the
>>>>>> uimaFIT style as the primary approach to wiring components together and
>>>>>> generate desc's as secondary...
>>>>>> I think the actual components that would be required are probably
>>>>>> left up to what is actually required for the best performing c-deid. The
>>>>>> output would be interesting; I'm not sure if we should treat this as
>>>>>> an independent preprocessing component or part of a pipeline (in which
>>>>>> case, we may need to propose a change to the type system or perhaps an
>>>>>> alternative JCas view.  You can probably open up that discussion with
>>>>>> the dev group as you see fit.)
>>>>>> My 2 cents...
>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>> Hi,
>>>>>>> Is there a cTAKES project that may serve as an example of how the cTAKES
>>>>>>> community develops or what a project should look like?
>>>>>>> I learned that different people set up UIMA projects in quite different
>>>>>>> ways, and I do not want to get inspired by "some sort of out-dated"
>>>>>>> approach in the cTAKES repo.
>>>>>>> Are there restrictions or preferences about the preprocessing components
>>>>>>> that should be used and the kind of "output" of the project?
>>>>>>> Components: On which components may the components rely: tokenizer, ...
>>>>>>> parser, ... dict lookup?
>>>>>>> "output": Should the project provide a pipeline or a single AE?
>>>>>>> More comments below.
>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate work
>>>>>>>>> and to coordinate the efforts ...
>>>>>>>> I would like to help with the translating JAPE to RUTA.
>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or
>>>>>>> wait until I set up the project with ruta integration.
>>>>>>> If any questions arise, just ask :-)
>>>>>>>>> Is there a development dataset which was utilized for the initial
>>>>>>>>> development, and if yes, is it possible to contribute it too?
>>>>>>>> The data set is unfortunately not publicly available; i2b2
>>>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases the data
>>>>>>>> sets 12 months after a given challenge; this is done on an individual basis
>>>>>>>> and involves a Data Use Agreement.
>>>>>>>> However, I will be able to conduct and coordinate the validation.
>>>>>>> Ok, I'll investigate if we already have access to the dataset.
>>>>>>>>> My first step would be:
>>>>>>>>> - set up a maven project
>>>>>>>>> - set up a development pipeline in a test (with cTAKES
>>>>>>>>> replacing the previous ANNIE preprocessing)
>>>>>>>>> But one item that we need to review is the 3rd party libs jars that
>>>>>>>>> were included to ensure compatibility.  I’ll be sure to take a look at
>>>>>>>>> that over the next few weeks.
>>>>>>>>> —Pei
>>>>>>>> @Pei - once ANNIE components are replaced there should not be a need to
>>>>>>>> worry about the 3rd party libs.
>>>>>>>> Also, just a thought: we may want to create an independent component for
>>>>>>>> the Two Pass recognition (TwoPass.java), as this method has been useful
>>>>>>>> for general NER on longitudinal data and is surely useful independent of the
>>>>>>>> deid component.
>>>>>>>> Cheers,
>>>>>>>> Azad
