ctakes-dev mailing list archives

From andy mcmurry <mcmurry.a...@gmail.com>
Subject Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Thu, 10 Mar 2016 22:22:53 GMT
*For cross-validation, you can evaluate de-identified notes data from the
i2b2 challenge:*
https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-scrubber-deid/data/models/

*Methods for model generation of the FeatureSet are described here:*

*Improved de-identification of physician notes through integrative modeling
of both public and private medical text*
http://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-13-112

A major objective of that study was to help provide external examples to
cross-train / retrain other methods.
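
A minimal sketch of how such a cross-validation run could be scored, assuming
PHI annotations are available as exact (start, end, label) spans. The Span
type, the score function and the sample spans are hypothetical illustrations
and are not part of the scrubber code or the i2b2 data:

```python
from typing import NamedTuple, Set


class Span(NamedTuple):
    """One PHI annotation: character offsets plus a PHI category (assumed layout)."""
    start: int
    end: int
    label: str


def score(gold: Set[Span], predicted: Set[Span]) -> dict:
    """Exact-match precision, recall and F1 over sets of PHI spans."""
    tp = len(gold & predicted)  # spans found with identical offsets and label
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up spans for illustration only.
gold = {Span(0, 10, "NAME"), Span(25, 35, "DATE"), Span(50, 62, "PHONE")}
predicted = {Span(0, 10, "NAME"), Span(25, 35, "DATE"), Span(70, 75, "NAME")}
result = score(gold, predicted)
print(result)
```

Exact matching is the strictest choice; a token-level or overlap-based match
would give more lenient numbers for the same output.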

Hope this helps,
--Andy



On Thu, Mar 10, 2016 at 1:27 PM, Savova, Guergana <
Guergana.Savova@childrens.harvard.edu> wrote:

> You can re-build the models that feed into MIST. I personally would not
> use the default model that MIST comes with as it is not trained on clinical
> data. In our previous work we found that hand-annotating about 200 docs for
> PHI (representative of the sample you are going to run the models on)
> results in building a pretty good model - in the 90s for precision, recall
> and F1. However, even with that high performance, the institution that owns
> the data might still be reluctant to share, as it might pose a violation of
> HIPAA through some potential PHI leaks. In cTAKES our approach has been to
> de-couple the de-identification from the NLP/information extraction. If a
> user has the need for de-identified data, they could choose their method --
> manual or otherwise -- and then process through cTAKES. Our focus is the
> NLP/IE space, while de-identification is a blend of that plus policy....
>
> --Guergana
>
> -----Original Message-----
> From: Azad Dehghan [mailto:azad.dehghan@gmail.com]
> Sent: Thursday, March 10, 2016 4:19 PM
> To: dev@ctakes.apache.org
> Subject: RE: Combining Knowledge- and Data-driven Methods for
> De-identification of Clinical Narratives
>
> Thanks Guergana.
>
> > Yes, the current release of cTAKES has a module for the temporal
> > expressions which includes dates. The normalizer for the temporal
> > expressions is Steven Bethard's timenorm code.
> >
>
> Great.
>
> > However, if you do de-identification of dates/temporal expressions, you
> > run the risk of creating incorrect timelines, as many of the relative
> > temporal expressions (e.g. spring of this year, x-mas time, etc.) are
> > unlikely to be correctly shifted by any de-identification tool.
> >
> Indeed, a reason I have not included the dates component.
>
> > One de-identification tool is MIST --
> > http://mist-deid.sourceforge.net/
> >
> I don't remember them doing well in the community held evaluation in 2014.
> Hence, cDeid :)
> >
> > Guergana Savova, PhD, FACMI
> > Associate Professor
> > PI Natural Language Processing Lab
> > Boston Children's Hospital and Harvard Medical School
> > 300 Longwood Avenue
> > Mailstop: BCH3092
> > Enders 144.1
> > Boston, MA 02115
> > Tel: (617) 919-2972
> > Fax: (617) 730-0817
> > Harvard Scholar:
> > http://scholar.harvard.edu/guergana_k_savova/biocv
> >
> > -----Original Message-----
> > From: Azad Dehghan [mailto:azad.dehghan@gmail.com]
> > Sent: Thursday, March 10, 2016 3:42 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Combining Knowledge- and Data-driven Methods for
> > De-identification of Clinical Narratives
> >
> > > This means both training data folders? I have access to the data but
> > > not to the challenge description.
> >
> > Yes. Is there any specific information that you are missing?
> > >
> > >
> > >> It would be good to incorporate/refactor the two-pass recognition
> > >> method for cTAKES (basically, the GATE API needs to be replaced with
> > >> the UIMA API to generate annotations), since it has a wider
> > >> application on longitudinal data. This method is used on top of a
> > >> number of NERs.
> > >
> > >
> > > I'll take a look.
> > >
> > > I do not know how much time I can invest this month. Let's see how
> > > many phases I can translate.
> > >
> > > I added the rules for age. Are there JAPE rules for creating date
> > > annotations?
> > >
> >
> > No. I believe cTAKES has existing component(s) to capture dates?
> >
> > > After all rules are translated, they need some major refactoring.
> > > JAPE and Ruta are quite different in some aspects.
> > >
> > Ok.
> >
> > >
> > >
> > >
> > >
> > >
> > >> Please let me know where I can help. I will be available again in
> > >> April.
> > >>
> > >> Cheers,
> > >> Azad
> > >>
> > >> On 10 March 2016 at 13:13, Peter Klügl <peter.kluegl@averbis.com> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> sorry, I was quite busy last month.
> > >>>
> > >>> I added a new patch, which needs to be applied.
> > >>>
> > >>> No new rules, but it's possible now to evaluate everything against
> > >>> the labelled data of the challenge.
> > >>>
> > >>> @Azad:
> > >>> Which documents exactly did you use to develop the rules?
> > >>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or
> > >>> testing-PHI-Gold-fixed?
> > >>>
> > >>> Best,
> > >>>
> > >>> Peter
> > >>>
> > >>> On 03.02.2016 at 09:05, Peter Klügl wrote:
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>> the last patch fixed almost all problems.
> > >>>>
> > >>>> I added another one that adds the csv file for the unit test and
> > >>>> extends svn-ignore.
> > >>>>
> > >>>> Best,
> > >>>>
> > >>>> Peter
> > >>>>
> > >>>> On 02.02.2016 at 09:16, Peter Klügl wrote:
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> I added another patch. I forgot to manually add one test file to
> > >>>>> version control, and there are still duplicate lines.
> > >>>>> I hope this patch fixes the remaining problems.
> > >>>>>
> > >>>>> Best,
> > >>>>>
> > >>>>> Peter
> > >>>>>
> > >>>>>
> > >>>>> On 29.01.2016 at 10:34, Peter Klügl wrote:
> > >>>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> the problems were caused by the svn client in my Eclipse. Sorry
> > >>>>>> for the trouble, I should have looked more closely at the complete
> > >>>>>> patch.
> > >>>>>>
> > >>>>>> I attached a new patch created with command-line tools, which
> > >>>>>> looks correct now.
> > >>>>>>
> > >>>>>> Pei, can you apply the new patch?
> > >>>>>>
> > >>>>>> Best,
> > >>>>>>
> > >>>>>> Peter
> > >>>>>>
> > >>>>>> On 28.01.2016 at 15:57, Peter Klügl wrote:
> > >>>>>>>
> > >>>>>>> Thanks Pei.
> > >>>>>>>
> > >>>>>>> I fear there was again a problem with the patch. All new files
> > >>>>>>> are missing (and also the svn-ignore settings).
> > >>>>>>>
> > >>>>>>> Can you take a look?
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>>
> > >>>>>>> Peter
> > >>>>>>>
> > >>>>>>> On 28.01.2016 at 14:43, Pei Chen wrote:
> > >>>>>>>>
> > >>>>>>>> patch applied.
> > >>>>>>>> Thanks,
> > >>>>>>>> Pei
> > >>>>>>>>
> > >>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hi Pei,
> > >>>>>>>>>
> > >>>>>>>>> can you commit the recent patch for us?
> > >>>>>>>>>
> > >>>>>>>>> CTAKES-384-20160120.patch
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>>
> > >>>>>>>>> Peter
> > >>>>>>>>>
> > >>>>>>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Hi,
> > >>>>>>>>>> Sorry I was swamped recently.
> > >>>>>>>>>> But yeah, we can even create an extended type system to store
> > >>>>>>>>>> these items temporarily and add them into the main/core type
> > >>>>>>>>>> system afterwards.
> > >>>>>>>>>>
> > >>>>>>>>>> There was an existing item to upgrade UIMA, but agreed, it will
> > >>>>>>>>>> require much more testing.  If it works, we can upgrade it in
> > >>>>>>>>>> our sandbox area or create a branch if necessary.
> > >>>>>>>>>>
> > >>>>>>>>>> —Pei
> > >>>>>>>>>>
> > >>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hi,
> > >>>>>>>>>>>
> > >>>>>>>>>>> a new patch is attached.
> > >>>>>>>>>>>
> > >>>>>>>>>>> @Pei:
> > >>>>>>>>>>> are there suitable annotation types in the cTAKES type system?
> > >>>>>>>>>>> Some project in cTAKES uses something like OntologyMatch... I
> > >>>>>>>>>>> map it to IdentifiedAnnotation right now, but there are many
> > >>>>>>>>>>> empty features...
> > >>>>>>>>>>>
> > >>>>>>>>>>> @Azad:
> > >>>>>>>>>>> I changed the rules a bit, especially the capitalization, to
> > >>>>>>>>>>> match how I normally use it in Ruta. The wordlists are compiled
> > >>>>>>>>>>> to a trie by the maven plugin. I also added the two regexes for
> > >>>>>>>>>>> url and email. I extended the regex for the url. I also changed
> > >>>>>>>>>>> the evaluation order of some rules (with @). Feel free to add
> > >>>>>>>>>>> simple examples to examples.csv for the unit tests.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Let me know if you need more information about the changes.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Do you want help with the other rule sets? Or should we split
> > >>>>>>>>>>> them up?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Best,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Peter
> > >>>>>>>>>>>
> > >>>>>>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> great. I will integrate them in the project and in the next
> > >>>>>>>>>>>> patch.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Peter
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Three NERs translated and uploaded.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> PS. I will validate all NERs once we have them all completed.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>> Azad
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehghan@gmail.com> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> This is on my todo list for Dec. as well. If there are any
> > >>>>>>>>>>>>>> more volunteers for translating JAPE to RUTA, please get in
> > >>>>>>>>>>>>>> touch.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>> Azad
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten about it.
> > >>>>>>>>>>>>>>> Unfortunately, there is just no spare time right now. I
> > >>>>>>>>>>>>>>> hope I will be able to provide the patches in December.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Peter
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Hi Peter,
> > >>>>>>>>>>>>>>>> I think the ctakes-examples is probably a good starting
> > >>>>>>>>>>>>>>>> point at least in terms of maven modules, etc.  I think
> > >>>>>>>>>>>>>>>> it would be good if we use uimaFIT style as primary
> > >>>>>>>>>>>>>>>> approach to wiring components together and generate
> > >>>>>>>>>>>>>>>> desc's as secondary...
> > >>>>>>>>>>>>>>>> I think the actual components that would be required are
> > >>>>>>>>>>>>>>>> probably best left up to what is actually required for
> > >>>>>>>>>>>>>>>> best performing c-deid.  The output would be interesting,
> > >>>>>>>>>>>>>>>> I'm not sure if we should treat this as an independent
> > >>>>>>>>>>>>>>>> preprocessing component or part of a pipeline (in which
> > >>>>>>>>>>>>>>>> case, we may need to propose a change to the type system
> > >>>>>>>>>>>>>>>> or perhaps an alternative JCas view.  You can probably
> > >>>>>>>>>>>>>>>> open up that discussion to the dev group as you see fit.)
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> My 2 cents...
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an example
> > >>>>>>>>>>>>>>>>> of how the cTAKES community develops or what a project
> > >>>>>>>>>>>>>>>>> should look like? I learned that different people set up
> > >>>>>>>>>>>>>>>>> UIMA projects in quite different ways, and I do not want
> > >>>>>>>>>>>>>>>>> to get inspired by "some sort of out-dated" approach in
> > >>>>>>>>>>>>>>>>> the cTAKES repo.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Are there restrictions or preferences about the
> > >>>>>>>>>>>>>>>>> preprocessing components that should be used and the
> > >>>>>>>>>>>>>>>>> kind of "output" of the project?
> > >>>>>>>>>>>>>>>>> Components: On which components may the components rely:
> > >>>>>>>>>>>>>>>>> tokenizer, ... parser, ... dict lookup?
> > >>>>>>>>>>>>>>>>> "output": Should the project provide a pipeline or a
> > >>>>>>>>>>>>>>>>> single AE?
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> More comments below.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to
> > >>>>>>>>>>>>>>>>>>> avoid duplicate work and to coordinate the efforts ...
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench
> > >>>>>>>>>>>>>>>>> if you want, or wait until I set up the project with
> > >>>>>>>>>>>>>>>>> Ruta integration.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Is there a development dataset which was utilized for
> > >>>>>>>>>>>>>>>>>>> the initial development, and if yes, is it possible
> > >>>>>>>>>>>>>>>>>>> to contribute it too?
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly available;
> > >>>>>>>>>>>>>>>>>> i2b2 <https://www.i2b2.org/NLP/DataSets/Main.php>
> > >>>>>>>>>>>>>>>>>> typically releases the data sets 12 months after a
> > >>>>>>>>>>>>>>>>>> given challenge; this is done on an individual basis
> > >>>>>>>>>>>>>>>>>> and involves a Data Use Agreement.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate the
> > >>>>>>>>>>>>>>>>>> validation.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Ok, I'll investigate if we already have access to the
> > >>>>>>>>>>>>>>>>> dataset here.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> My first step would be:
> > >>>>>>>>>>>>>>>>>>> - set up a maven project
> > >>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with
> > >>>>>>>>>>>>>>>>>>>   cTAKES components replacing the previous ANNIE
> > >>>>>>>>>>>>>>>>>>>   preprocessing)
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd party
> > >>>>>>>>>>>>>>>>>>> libs jars that were included to ensure compatibility.
> > >>>>>>>>>>>>>>>>>>> I’ll be sure to take a look at that over the next few
> > >>>>>>>>>>>>>>>>>>> weeks.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> —Pei
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> @Pei - once ANNIE components are replaced there should
> > >>>>>>>>>>>>>>>>>> not be a need to worry about the 3rd party libs.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an
> > >>>>>>>>>>>>>>>>>> independent component for the Two Pass recognition
> > >>>>>>>>>>>>>>>>>> (TwoPass.java), as this method has proven useful for
> > >>>>>>>>>>>>>>>>>> general NER on longitudinal data and is surely useful
> > >>>>>>>>>>>>>>>>>> independent of the deid component.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>> Azad
> > >>>>>>>>>>>>>>>>>>
> > >>>
> > >
>
