ctakes-dev mailing list archives

From Pei Chen <pei.c...@wiredinformatics.com>
Subject Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Sat, 30 Jan 2016 14:09:07 GMT
CTAKES-384-20160129.patch applied.

> On Jan 29, 2016, at 4:34 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> 
> Hi,
> 
> the problems were caused by the svn client in my Eclipse. Sorry for the
> trouble, I should have looked more closely at the complete patch.
> 
> I attached a new patch, created with command-line tools, which looks
> correct now.
> 
> Pei, can you apply the new patch?
> 
> Best,
> 
> Peter
> 
> On 28.01.2016 at 15:57, Peter Klügl wrote:
>> Thanks Pei.
>> 
>> I fear there was again a problem with the patch. All new files are
>> missing (and also the svn-ignore settings).
>> 
>> Can you take a look?
>> 
>> Best,
>> 
>> Peter
>> 
>> On 28.01.2016 at 14:43, Pei Chen wrote:
>>> patch applied.
>>> Thanks,
>>> Pei
>>> 
>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>> Hi Pei,
>>>> 
>>>> can you commit the recent patch for us?
>>>> 
>>>> CTAKES-384-20160120.patch
>>>> 
>>>> Best,
>>>> 
>>>> Peter
>>>> 
>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
>>>>> Hi,
>>>>> Sorry, I was swamped recently.
>>>>> But yeah, we could even create an extended type system to store these
>>>>> items temporarily and add them to the main/core type system afterwards.
>>>>> There was an existing item to upgrade UIMA, but agreed, it will require
>>>>> much more testing. If it works, we can upgrade it in our sandbox area or
>>>>> create a branch if necessary.
>>>>> 
>>>>> —Pei
>>>>> 
>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> a new patch is attached.
>>>>>> 
>>>>>> @Pei:
>>>>>> Are there suitable annotation types in the cTAKES type system? One
>>>>>> project in cTAKES uses something like OntologyMatch... Right now I map
>>>>>> it to IdentifiedAnnotation, but many of its features stay empty...
>>>>>> 
>>>>>> @Azad:
>>>>>> I changed the rules a bit, especially the capitalization, which now
>>>>>> follows how I normally use it in Ruta. The word lists are compiled to
>>>>>> a trie by the Maven plugin. I also added the two regexes for URL and
>>>>>> email and extended the URL regex. In addition, I changed the
>>>>>> evaluation order of some rules (with @). Feel free to add simple
>>>>>> examples to examples.csv for the unit tests.
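Peter notes that the word lists are compiled to a trie by the Maven plugin. The following Java sketch illustrates only the general idea: each word-list entry is split into tokens and stored in a token trie, so one left-to-right walk finds the longest entry starting at a given token. It is a hypothetical illustration (the class `WordListTrie` and its methods are invented here); it does not reproduce the actual trie format, case handling, or matching options of UIMA Ruta or its Maven plugin.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a token-level trie built from a word list, used for longest-match lookup. */
public class WordListTrie {
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        boolean terminal; // true if some word-list entry ends at this node
    }

    private final Node root = new Node();

    /** Adds one (possibly multi-token) entry; tokens are split on whitespace. */
    public void add(String entry) {
        Node node = root;
        for (String token : entry.trim().split("\\s+")) {
            node = node.children.computeIfAbsent(token, t -> new Node());
        }
        node.terminal = true;
    }

    /** Returns the token length of the longest entry starting at tokens[start], or 0 if none. */
    public int longestMatch(List<String> tokens, int start) {
        Node node = root;
        int best = 0;
        for (int i = start; i < tokens.size(); i++) {
            node = node.children.get(tokens.get(i));
            if (node == null) {
                break; // no entry continues with this token
            }
            if (node.terminal) {
                best = i - start + 1; // remember the longest complete entry seen so far
            }
        }
        return best;
    }

    public static void main(String[] args) {
        WordListTrie trie = new WordListTrie();
        trie.add("St Mary");
        trie.add("St Mary Hospital");
        List<String> tokens = Arrays.asList("Admitted", "to", "St", "Mary", "Hospital", "today");
        System.out.println(trie.longestMatch(tokens, 2)); // prints 3
    }
}
```

Storing entries token by token lets "St Mary Hospital" share a prefix with "St Mary", and the single walk returns the longer of the two matches.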
>>>>>> 
>>>>>> Let me know if you need more information about the changes.
>>>>>> 
>>>>>> Do you want help with the other rule sets? Or should we split them up?
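The actual URL and email regexes from the patch are not shown in this thread. As a rough, hypothetical illustration of what such patterns can look like in Java, here is a simplified sketch; the patterns and the class name `ContactPatterns` are assumptions made for this example and are much looser than what a de-identification rule set would need (no IP addresses, ports, or internationalized domains).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified URL and email patterns; not the regexes from the CTAKES-384 patch. */
public class ContactPatterns {
    // Local part, "@", then dot-separated domain labels ending in a top-level domain of 2+ letters.
    static final Pattern EMAIL = Pattern.compile(
            "[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*\\.[A-Za-z]{2,}");

    // "http://", "https://" or "www.", a host with a top-level domain, then an optional path up to whitespace.
    static final Pattern URL = Pattern.compile(
            "(?:https?://|www\\.)[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*\\.[A-Za-z]{2,}(?:/\\S*)?");

    /** Returns every non-overlapping match of the pattern, left to right. */
    static List<String> findAll(Pattern pattern, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String note = "Contact jdoe@example.org or see https://www.example.org/clinic for details.";
        System.out.println(findAll(EMAIL, note)); // prints [jdoe@example.org]
        System.out.println(findAll(URL, note));   // prints [https://www.example.org/clinic]
    }
}
```

A real rule would also have to decide how to treat trailing punctuation: when a URL with a path ends a sentence, this sketch includes the final period in the match.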
>>>>>> 
>>>>>> Best,
>>>>>> 
>>>>>> Peter
>>>>>> 
>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> great. I will integrate them in the project and in the next patch.
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Peter
>>>>>>> 
>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
>>>>>>>> Three NERs translated and uploaded.
>>>>>>>> 
>>>>>>>> PS. I will validate all NERs once we have them all completed.
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Azad
>>>>>>>> 
>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehghan@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> This is on my todo list for Dec. as well. If there are any more
>>>>>>>>> volunteers for translating JAPE to RUTA, please get in touch.
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> Azad
>>>>>>>>> 
>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I just wanted to mention that I haven't forgotten about it.
>>>>>>>>>> Unfortunately, there is just no spare time right now. I hope I will
>>>>>>>>>> be able to provide the patches in December.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> 
>>>>>>>>>> Peter
>>>>>>>>>> 
>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
>>>>>>>>>>> Hi Peter,
>>>>>>>>>>> I think ctakes-examples is probably a good starting point, at least
>>>>>>>>>>> in terms of Maven modules, etc. I think it would be good if we use
>>>>>>>>>>> the uimaFIT style as the primary approach to wiring components
>>>>>>>>>>> together and generate descriptors as a secondary one...
>>>>>>>>>>> Which components are actually required is probably best left up to
>>>>>>>>>>> what the best-performing c-deid actually needs. The output question
>>>>>>>>>>> is interesting: I'm not sure whether we should treat this as an
>>>>>>>>>>> independent preprocessing component or as part of a pipeline (in
>>>>>>>>>>> which case we may need to propose a change to the type system or
>>>>>>>>>>> perhaps an alternative JCas view). You can probably open up that
>>>>>>>>>>> discussion to the dev group as you see fit.
>>>>>>>>>>> 
>>>>>>>>>>> My 2 cents...
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> Is there a cTAKES project that may serve as an example of how the
>>>>>>>>>>>> cTAKES community develops, or of what a project should look like?
>>>>>>>>>>>> I learned that different people set up UIMA projects in quite
>>>>>>>>>>>> different ways, and I do not want to get inspired by "some sort of
>>>>>>>>>>>> out-dated" approach in the cTAKES repo.
>>>>>>>>>>>>
>>>>>>>>>>>> Are there restrictions or preferences about the preprocessing
>>>>>>>>>>>> components that should be used and the kind of "output" of the
>>>>>>>>>>>> project?
>>>>>>>>>>>> Components: Which components may the new components rely on:
>>>>>>>>>>>> tokenizer, ... parser, ... dict lookup?
>>>>>>>>>>>> "output": Should the project provide a pipeline or a single AE?
>>>>>>>>>>>> 
>>>>>>>>>>>> More comments below.
>>>>>>>>>>>> 
>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate
>>>>>>>>>>>>>> work and to coordinate the efforts ...
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
>>>>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or
>>>>>>>>>>>> wait until I set up the project with Ruta integration.
>>>>>>>>>>>> 
>>>>>>>>>>>> If any questions arise, just ask :-)
>>>>>>>>>>>> 
>>>>>>>>>>>>>> Is there a development dataset which was utilized for the initial
>>>>>>>>>>>>>> development, and if yes, is it possible to contribute it too?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> The data set is unfortunately not publicly available; i2b2
>>>>>>>>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases the
>>>>>>>>>>>>> data sets 12 months after a given challenge; this is done on an
>>>>>>>>>>>>> individual basis and involves a Data Use Agreement.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> However, I will be able to conduct and coordinate the validation.
>>>>>>>>>>>>> 
>>>>>>>>>>>> Ok, I'll investigate whether we already have access to the dataset here.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>>> My first step would be:
>>>>>>>>>>>>>> - set up a Maven project
>>>>>>>>>>>>>> - set up a development pipeline in a test (with cTAKES components
>>>>>>>>>>>>>>   replacing the previous ANNIE preprocessing)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> But one item that we need to review is the 3rd-party library jars that
>>>>>>>>>>>>>> were included to ensure compatibility. I’ll be sure to take a look at
>>>>>>>>>>>>>> that over the next few weeks.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> —Pei
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> @Pei - once the ANNIE components are replaced there should not be a
>>>>>>>>>>>>> need to worry about the 3rd-party libs.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Also, just a thought: we may want to create an independent component
>>>>>>>>>>>>> for the Two Pass recognition (TwoPass.java), as this method has proven
>>>>>>>>>>>>> useful for general NER on longitudinal data and would surely be useful
>>>>>>>>>>>>> independently of the deid component.
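The thread does not describe how TwoPass.java works. One reading of two-pass recognition, assumed here, is: a first pass (rules or a classifier) tags some mentions, and a second pass searches the text again for every surface string found in the first pass and tags the remaining occurrences with the same label. The Java sketch below implements only that assumed idea; `TwoPassSketch`, `Span`, and `secondPass` are hypothetical names, not the cTAKES or TwoPass.java API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of two-pass recognition; not the actual TwoPass.java. */
public class TwoPassSketch {

    /** A labelled character span [begin, end) in a text. */
    public static final class Span {
        public final int begin;
        public final int end;
        public final String label;

        public Span(int begin, int end, String label) {
            this.begin = begin;
            this.end = end;
            this.label = label;
        }

        boolean overlaps(Span other) {
            return begin < other.end && other.begin < end;
        }

        @Override
        public String toString() {
            return label + "[" + begin + "," + end + ")";
        }
    }

    /**
     * Second pass: every surface string tagged in the first pass is searched for again as a
     * whole word, and occurrences that do not overlap an existing span get the same label.
     */
    public static List<Span> secondPass(String text, List<Span> firstPass) {
        // Lexicon of first-pass surface strings; the first label seen for a string wins.
        Map<String, String> lexicon = new LinkedHashMap<>();
        for (Span s : firstPass) {
            lexicon.putIfAbsent(text.substring(s.begin, s.end), s.label);
        }

        List<Span> result = new ArrayList<>(firstPass);
        for (Map.Entry<String, String> entry : lexicon.entrySet()) {
            // \b assumes the string starts and ends with word characters.
            Pattern p = Pattern.compile("\\b" + Pattern.quote(entry.getKey()) + "\\b");
            Matcher m = p.matcher(text);
            while (m.find()) {
                Span candidate = new Span(m.start(), m.end(), entry.getValue());
                boolean clash = result.stream().anyMatch(s -> s.overlaps(candidate));
                if (!clash) {
                    result.add(candidate);
                }
            }
        }
        result.sort(Comparator.comparingInt(s -> s.begin));
        return result;
    }

    public static void main(String[] args) {
        String note = "Mr Smith was seen today. Smith reports less pain.";
        // Assume the first pass only tagged the first mention of "Smith".
        List<Span> firstPass = Arrays.asList(new Span(3, 8, "NAME"));
        System.out.println(secondPass(note, firstPass)); // prints [NAME[3,8), NAME[25,30)]
    }
}
```

For longitudinal data, the same idea could be applied across all notes of one patient by building the lexicon from the first pass over every note; this sketch handles a single text only. Whole-word matching keeps "Smith" from matching inside "Smithson".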
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Azad
>>>>>>>>>>>>> 
> 

