ctakes-dev mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Fri, 11 Mar 2016 15:37:58 GMT
Hi Pei,

the content of the new files is duplicated again, e.g., see
I2B2Evaluation.java

No idea what caused that...

Best,

Peter

On 11.03.2016 at 11:32, Peter Klügl wrote:
> Hi,
>
> thanks for the notes and links, Andy and Guergana. The software and
> articles are very interesting, but, as for my personal interest, we have
> our own clinical deidentification software solution at our company
> (which works well enough as far as I know). My focus is rather on
> helping out in translating the contribution from GATE/JAPE to UIMA/Ruta.
> Thus, I concentrate on the existing functionality for now.
>
> What is the final goal of the cTAKES community concerning clinical deid
> components? Will both sandbox projects be merged, what about statistical
> approaches?
>
> @Pei: there was again a problem with the patch (I also forgot to add
> some files). I attached a new one.
>
> @Azad: I am just curious which data exactly the rules rely on. I think
> I'll find the information in the article.
> I assume that the 521 documents have been utilized to develop the rules
> and the 269 documents to evaluate them. Did you correct the rules also
> using the second set? I need to reread the article :-)
>
> Best,
>
> Peter
>
>
> On 10.03.2016 at 23:22, andy mcmurry wrote:
>> *For cross-validation, you can evaluate de-identified notes data from the
>> i2b2 challenge:*
>> https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-scrubber-deid/data/models/
>>
>> *Methods for model generation of FeatureSet described here: *
>>
>> *Improved de-identification of physician notes through integrative modeling
>> of both public and private medical text*
>> http://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-13-112
>>
>> Major objective of that study was to help provide external examples to
>> cross train / retrain other methods.
>>
>> hope this helps,
>> --Andy
>>
>>
>>
>> On Thu, Mar 10, 2016 at 1:27 PM, Savova, Guergana <
>> Guergana.Savova@childrens.harvard.edu> wrote:
>>
>>> You can re-build the models that feed into MIST. I personally would not
>>> use the default model that MIST comes with as it is not trained on clinical
>>> data. In our previous work we found that hand-annotating about 200 docs for
>>> PHI (representative of the sample you are going to run the models on)
>>> results in building a pretty good model - in the 90's for p, r and f1.
>>> However, even with that high performance, the institution that owns the
>>> data might be still reluctant to share as it might pose a violation of
>>> HIPAA through some potential PHI leaks. In cTAKES our approach has been to
>>> de-couple the de-identification from the NLP/information extraction. If a
>>> user has the need for de-identified data, they could choose their method --
>>> manual or otherwise -- and then process through cTAKES. Our focus is the
>>> NLP/IE space, while de-identification is a blend of that plus policy....
>>>
>>> --Guergana
>>>
>>> -----Original Message-----
>>> From: Azad Dehghan [mailto:azad.dehghan@gmail.com]
>>> Sent: Thursday, March 10, 2016 4:19 PM
>>> To: dev@ctakes.apache.org
>>> Subject: RE: Combining Knowledge- and Data-driven Methods for
>>> De-identification of Clinical Narratives
>>>
>>> Thanks Guergana.
>>>
>>>> Yes, the current release of cTAKES has a module for the temporal
>>>> expressions which includes dates. The normalizer for the temporal
>>>> expressions is Steven Bethard's timenorm code.
>>> Great.
>>>
>>>> However, if you do de-identification of dates/temporal expressions, you
>>>> run the risk of creating incorrect timelines as many of the relative
>>>> temporal expressions (e.g. spring of this year, x-mas time, etc.) are
>>>> unlikely to be correctly shifted by any de-identification tool.
>>> Indeed, a reason I have not included the dates component.
>>>
>>>> One de-identification tool is MIST --
>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__mist-2Ddeid.sourceforge.net_&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=FlURWGr18rKbgM76o8Hxoo1rbC2D2h-kk611lbKnPik&s=5awdXn2I-hRE0-161tqFDGgmYgQQviQg360uHI4fs2s&e=
>>>> .
>>> I don't remember them doing well in the community-held evaluation in 2014.
>>> Hence, cDeid :)
>>>> Guergana Savova, PhD, FACMI
>>>> Associate Professor
>>>> PI Natural Language Processing Lab
>>>> Boston Children's Hospital and Harvard Medical School
>>>> 300 Longwood Avenue
>>>> Mailstop: BCH3092
>>>> Enders 144.1
>>>> Boston, MA 02115
>>>> Tel: (617) 919-2972
>>>> Fax: (617) 730-0817
>>>> Harvard Scholar:
>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__scholar.harvard.edu_guergana-5Fk-5Fsavova_biocv&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=FlURWGr18rKbgM76o8Hxoo1rbC2D2h-kk611lbKnPik&s=3taiTxFp55iQUnc6A6Yemg-XzFQrRjo5QZRQeKHQ29c&e=
>>>>
>>>> -----Original Message-----
>>>> From: Azad Dehghan [mailto:azad.dehghan@gmail.com]
>>>> Sent: Thursday, March 10, 2016 3:42 PM
>>>> To: dev@ctakes.apache.org
>>>> Subject: Re: Combining Knowledge- and Data-driven Methods for
>>>> De-identification of Clinical Narratives
>>>>> This means both training data folders? I have access to the data but not
>>>>> to the challenge description.
>>>>
>>>> Yes. Is there any specific information that you are missing?
>>>>>> It would be good to incorporate/refactor (basically, GATE API needs
>>>>>> to be replaced with UIMA API to generate annotation) the two-pass
>>>>>> recognition method for cTAKES - which has a wider application on
>>>>>> longitudinal data.
>>>>>> This method is used on top of a number of NERs.
>>>>> I'll take a look.
>>>>>
>>>>> I do not know how much time I can invest this month. Let's see how many
>>>>> phases I can translate.
>>>>> I added the rules for age. Are there jape rules for creating date
>>>>> annotations?
>>>> No. I believe cTAKES has existing component(s) to capture dates?
>>>>
>>>>> After all rules are translated, they need some major refactoring. Jape
>>>>> and Ruta are quite different in some aspects.
>>>> Ok.
>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Please let me know where I can help. I will be available again in April.
>>>>>> Cheers,
>>>>>> Azad
>>>>>>
>>>>>> On 10 March 2016 at 13:13, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> sorry, I was quite busy last month.
>>>>>>>
>>>>>>> I added a new patch, which needs to be applied.
>>>>>>>
>>>>>>> No new rules, but it's possible now to evaluate everything against
>>>>>>> the labelled data of the challenge.
>>>>>>>
>>>>>>> @Azad:
>>>>>>> Which documents exactly did you use to develop the rules?
>>>>>>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or testing-PHI-Gold-fixed?
>>>>>>> Best,
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> On 03.02.2016 at 09:05, Peter Klügl wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> the last patch fixed almost all problems.
>>>>>>>>
>>>>>>>> I added another one that adds the csv file for the unit test and extends
>>>>>>>> svn-ignore.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On 02.02.2016 at 09:16, Peter Klügl wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I added another patch. I forgot to manually add one test file to version
>>>>>>>>> control, and there are still duplicate lines.
>>>>>>>>> I hope this patch fixes the remaining problems.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 29.01.2016 at 10:34, Peter Klügl wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> the problems were caused by the svn client in my Eclipse. Sorry for the
>>>>>>>>>> trouble, I should have looked more closely at the complete patch.
>>>>>>>>>>
>>>>>>>>>> I attached a new patch created with commandline tools which looks correct
>>>>>>>>>> now.
>>>>>>>>>>
>>>>>>>>>> Pei, can you apply the new patch?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> On 28.01.2016 at 15:57, Peter Klügl wrote:
>>>>>>>>>>> Thanks Pei.
>>>>>>>>>>>
>>>>>>>>>>> I fear there was again a problem with the patch. All new files
>>>>>>>>>>> are missing (and also the svn-ignore settings).
>>>>>>>>>>>
>>>>>>>>>>> Can you take a look?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>> On 28.01.2016 at 14:43, Pei Chen wrote:
>>>>>>>>>>>> patch applied.
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Pei
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>>>>>>>> Hi Pei,
>>>>>>>>>>>>>
>>>>>>>>>>>>> can you commit the recent patch for us?
>>>>>>>>>>>>>
>>>>>>>>>>>>> CTAKES-384-20160120.patch
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> Sorry I was swamped recently.
>>>>>>>>>>>>>> But yeah, we can even create an extended type system to store these
>>>>>>>>>>>>>> items temporarily and add them into the main/core type system afterwards.
>>>>>>>>>>>>>> There was an existing item to upgrade UIMA, but agreed- it will require
>>>>>>>>>>>>>> much more testing.  If it works, we can upgrade it in our sandbox area
>>>>>>>>>>>>>> or create a branch if necessary.
>>>>>>>>>>>>>> —Pei
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> a new patch is attached.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @Pei:
>>>>>>>>>>>>>>> are there suitable annotation types in the cTAKES type system? Some
>>>>>>>>>>>>>>> project in cTAKES uses something like OntologyMatch... I map it to
>>>>>>>>>>>>>>> IdentifiedAnnotation right now, but there are many empty features...
>>>>>>>>>>>>>>> @Azad:
>>>>>>>>>>>>>>> I changed the rules a bit, especially the capitalization like I use it
>>>>>>>>>>>>>>> in ruta normally. The wordlists are compiled to a trie by the maven
>>>>>>>>>>>>>>> plugin. I also added the two regexes for url and email. I extended the
>>>>>>>>>>>>>>> regex for the url. I also changed the evaluation order of some rules
>>>>>>>>>>>>>>> (with @). Feel free to add simple examples to examples.csv for the unit
>>>>>>>>>>>>>>> tests.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let me know if you need more information about the changes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you wanna have help with the other rule sets? Or should we split them up?
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> great. I will integrate them in the project and in the next patch.
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
>>>>>>>>>>>>>>>>> Three NERs translated and uploaded.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> PS. I will validate all NERs once we have them all completed.
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehghan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> This is on my todo list for Dec. as well. If there are any more volunteers
>>>>>>>>>>>>>>>>>> for translating JAPE to RUTA, please get in touch.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten about it. Unfortunately,
>>>>>>>>>>>>>>>>>>> there is just no spare time right now. I hope I will be able to provide
>>>>>>>>>>>>>>>>>>> the patches in December.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
>>>>>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>>>>>> I think the ctakes-examples is probably a good starting point at least
>>>>>>>>>>>>>>>>>>>> in terms of maven modules, etc.  I think it would be good if we use
>>>>>>>>>>>>>>>>>>>> uimaFIT style as primary approach to wiring components together and
>>>>>>>>>>>>>>>>>>>> generate desc's as secondary...
>>>>>>>>>>>>>>>>>>>> I think the actual components that would be required is probably best
>>>>>>>>>>>>>>>>>>>> left up to what is actually required for best performing c-deid.  The
>>>>>>>>>>>>>>>>>>>> output would be interesting, I'm not sure if we should treat this as
>>>>>>>>>>>>>>>>>>>> an independent preprocessing component or part of a pipeline (in which
>>>>>>>>>>>>>>>>>>>> case, we may need to propose a change to the type system or perhaps an
>>>>>>>>>>>>>>>>>>>> alternative JCas view.  You can probably open up that discussion to
>>>>>>>>>>>>>>>>>>>> the dev group as you see fit.)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> My 2 cents...
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an example of how the cTAKES
>>>>>>>>>>>>>>>>>>>>> community develops or what a project should look like?
>>>>>>>>>>>>>>>>>>>>> I learned that different people set up UIMA projects in quite different
>>>>>>>>>>>>>>>>>>>>> manners and I do not want to get inspired by "some sort of out-dated"
>>>>>>>>>>>>>>>>>>>>> approach in the cTAKES repo.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Are there restrictions or preferences about the preprocessing components
>>>>>>>>>>>>>>>>>>>>> that should be used and the kind of "output" of the project?
>>>>>>>>>>>>>>>>>>>>> Components: On which components may the components rely: tokenizer, ...
>>>>>>>>>>>>>>>>>>>>> parser, ... dict lookup?
>>>>>>>>>>>>>>>>>>>>> "output": Should the project provide a pipeline or a single AE?
>>>>>>>>>>>>>>>>>>>>> More comments below.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
>>>>>>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate work
>>>>>>>>>>>>>>>>>>>>>>> and to coordinate the efforts ...
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
>>>>>>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or
>>>>>>>>>>>>>>>>>>>>> wait until I set up the project with ruta integration.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Is there a development dataset which was utilized for the initial
>>>>>>>>>>>>>>>>>>>>>>> development, and if yes, is it possible to contribute it too?
>>>>>>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly available; i2b2
>>>>>>>>>>>>>>>>>>>>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.i2b2.org_NLP_DataSets_Main.php&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=1Qpd4A2PgVD13w31PkkvmJf6I0PTCatCzgBgsnetPOg&s=aAEeORyMtz7NCv-6EEgiABVY_Rf6zLnJghQh2DA_CKQ&e=>
>>>>>>>>>>>>>>>>>>>>>> typically releases the data sets 12 months after a given challenge; this is
>>>>>>>>>>>>>>>>>>>>>> done on an individual basis and involves a Data Use Agreement.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate the validation.
>>>>>>>>>>>>>>>>>>>>> Ok, I'll investigate if we already have access to the dataset here.
>>>>>>>>>>>>>>>>>>>>>>> My first step would be:
>>>>>>>>>>>>>>>>>>>>>>> - set up a maven project
>>>>>>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with cTAKES components
>>>>>>>>>>>>>>>>>>>>>>> replacing the previous ANNIE preprocessing)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd party libs jars that
>>>>>>>>>>>>>>>>>>>>>>> were included to ensure compatibility.  I’ll be sure to take a look at
>>>>>>>>>>>>>>>>>>>>>>> that over the next few weeks.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> —Pei
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> @Pei - once ANNIE components are replaced there should not be a need to
>>>>>>>>>>>>>>>>>>>>>> worry about the 3rd party libs.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an independent component
>>>>>>>>>>>>>>>>>>>>>> for the Two Pass recognition (TwoPass.java) as this method has proven
>>>>>>>>>>>>>>>>>>>>>> useful for general NER on longitudinal data and is surely useful
>>>>>>>>>>>>>>>>>>>>>> independently of the deid component.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>>>>>>>>

