ctakes-dev mailing list archives

From Azad Dehghan <azad.dehg...@gmail.com>
Subject Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Fri, 11 Mar 2016 10:46:04 GMT
> My focus is rather on
> helping out in translating the contribution from GATE/JAPE to UIMA/Ruta.
> Thus, I concentrate on the existing functionality for now.
>

Do we have a tracker; which ones remain?


>
> What is the final goal of the cTAKES community concerning clinical deid
> components? Will both sandbox projects be merged, what about statistical
> approaches?
>

Does cTAKES provide APIs/integration of any statistical approaches?


>
> @Azad: I am just curious which data exactly the rules rely on. I think
> I'll find the information in the article.
> I assume that the 521 documents have been utilized to develop the rules
> and the 269 documents to evaluate them. Did you correct the rules also
> using the second set? I need to reread the article :-)
>


The entire training dataset was used: 521+269.

Azad


>
> On 10.03.2016 at 23:22, andy mcmurry wrote:
> > *For cross-validation, you can evaluate de-identified notes data from
> > the i2b2 challenge:*
> >
> https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-scrubber-deid/data/models/
> >
> > *Methods for model generation of FeatureSet described here: *
> >
> > *Improved de-identification of physician notes through integrative modeling
> > of both public and private medical text*
> >
> http://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-13-112
> >
> > Major objective of that study was to help provide external examples to
> > cross train / retrain other methods.
> >
> > hope this helps,
> > --Andy
> >
> >
> >
> > On Thu, Mar 10, 2016 at 1:27 PM, Savova, Guergana <
> > Guergana.Savova@childrens.harvard.edu> wrote:
> >
> >> You can re-build the models that feed into MIST. I personally would not
> >> use the default model that MIST comes with as it is not trained on clinical
> >> data. In our previous work we found that hand-annotating about 200 docs for
> >> PHI (representative of the sample you are going to run the models on)
> >> results in building a pretty good model - in the 90's for p, r and f1.
> >> However, even with that high performance, the institution that owns the
> >> data might be still reluctant to share as it might pose a violation of
> >> HIPAA through some potential PHI leaks. In cTAKES our approach has been to
> >> de-couple the de-identification from the NLP/information extraction. If a
> >> user has the need for de-identified data, they could choose their method --
> >> manual or otherwise -- and then process through cTAKES. Our focus is the
> >> NLP/IE space, while de-identification is a blend of that plus policy....
> >>
> >> --Guergana
> >>
> >> -----Original Message-----
> >> From: Azad Dehghan [mailto:azad.dehghan@gmail.com]
> >> Sent: Thursday, March 10, 2016 4:19 PM
> >> To: dev@ctakes.apache.org
> >> Subject: RE: Combining Knowledge- and Data-driven Methods for
> >> De-identification of Clinical Narratives
> >>
> >> Thanks Guergana.
> >>
> >>> Yes, the current release of cTAKES has a module for the temporal
> >>> expressions which includes dates. The normalizer for the temporal
> >>> expressions is Steven Bethard's timenorm code.
> >> Great.
> >>
> >>> However, if you do de-identification of dates/temporal expressions, you
> >>> run the risk of creating incorrect timelines as many of the relative
> >>> temporal expressions (e.g. spring of this year, x-mas time, etc.) are
> >>> unlikely to be correctly shifted by any de-identification tool.
> >> Indeed, a reason I have not included the dates component.
> >>
> >>> One de-identification tool is MIST --
> >>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__mist-2Ddeid.sourceforge.net_&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=FlURWGr18rKbgM76o8Hxoo1rbC2D2h-kk611lbKnPik&s=5awdXn2I-hRE0-161tqFDGgmYgQQviQg360uHI4fs2s&e=
> >> .
> >> I don't remember them doing well in the community-held evaluation in 2014.
> >> Hence, cDeid :)
> >>> Guergana Savova, PhD, FACMI
> >>> Associate Professor
> >>> PI Natural Language Processing Lab
> >>> Boston Children's Hospital and Harvard Medical School
> >>> 300 Longwood Avenue
> >>> Mailstop: BCH3092
> >>> Enders 144.1
> >>> Boston, MA 02115
> >>> Tel: (617) 919-2972
> >>> Fax: (617) 730-0817
> >>> Harvard Scholar:
> >>> https://urldefense.proofpoint.com/v2/url?u=http-3A__scholar.harvard.ed
> >>> u_guergana-5Fk-5Fsavova_biocv&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14J
> >>> ZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGm
> >>> RCJNAr-rCmP&m=FlURWGr18rKbgM76o8Hxoo1rbC2D2h-kk611lbKnPik&s=3taiTxFp55
> >>> iQUnc6A6Yemg-XzFQrRjo5QZRQeKHQ29c&e=
> >>>
> >>> -----Original Message-----
> >>> From: Azad Dehghan [mailto:azad.dehghan@gmail.com]
> >>> Sent: Thursday, March 10, 2016 3:42 PM
> >>> To: dev@ctakes.apache.org
> >>> Subject: Re: Combining Knowledge- and Data-driven Methods for
> >>> De-identification of Clinical Narratives
> >>>> This means both training data folders? I have access to the data but not
> >>>> to the challenge description.
> >>>
> >>> Yes. Is there any specific information that you are missing?
> >>>>
> >>>>> It would be good to incorporate/refactor (basically, GATE API needs
> >>>>> to be replaced with UIMA API to generate annotation) the two-pass
> >>>>> recognition method for cTAKES - which has a wider application on
> >>>>> longitudinal data.
> >>>>> This method is used on top of a number of NERs.
> >>>>
> >>>> I'll take a look.
> >>>>
> >>>> I do not know how much time I can invest this month. Let's see how many
> >>>> phases I can translate.
> >>>> I added the rules for age. Are there JAPE rules for creating date
> >>>> annotations?
> >>> No. I believe cTAKES has existing component(s) to capture dates?
> >>>
> >>>> After all rules are translated, they need some major refactoring. JAPE
> >>>> and Ruta are quite different in some aspects.
> >>> Ok.
> >>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> Please let me know where I can help. I will be available again in
> >>>>> April.
> >>>>> Cheers,
> >>>>> Azad
> >>>>>
> >>>>> On 10 March 2016 at 13:13, Peter Klügl <peter.kluegl@averbis.com> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> sorry, I was quite busy last month.
> >>>>>>
> >>>>>> I added a new patch, which needs to be applied.
> >>>>>>
> >>>>>> No new rules, but it's possible now to evaluate everything against
> >>>>>> the labelled data of the challenge.
> >>>>>>
> >>>>>> @Azad:
> >>>>>> Which documents exactly did you use to develop the rules?
> >>>>>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or
> >>>>>> testing-PHI-Gold-fixed?
> >>>>>> Best,
> >>>>>>
> >>>>>> Peter
> >>>>>>
> >>>>>> On 03.02.2016 at 09:05, Peter Klügl wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> the last patch fixed almost all problems.
> >>>>>>>
> >>>>>>> I added another one that adds the csv file for the unit test and
> >>>>>>> extends svn-ignore.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Peter
> >>>>>>>
> >>>>>>> On 02.02.2016 at 09:16, Peter Klügl wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I added another patch. I forgot to manually add one test file to
> >>>>>>>> version control, and there are still duplicate lines.
> >>>>>>>> I hope this patch fixes the remaining problems.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> Peter
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 29.01.2016 at 10:34, Peter Klügl wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> the problems were caused by the svn client in my Eclipse. Sorry for
> >>>>>>>>> the trouble, I should have looked more closely at the complete patch.
> >>>>>>>>>
> >>>>>>>>> I attached a new patch created with command-line tools which looks
> >>>>>>>>> correct now.
> >>>>>>>>>
> >>>>>>>>> Pei, can you apply the new patch?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>
> >>>>>>>>> Peter
> >>>>>>>>>
> >>>>>>>>> On 28.01.2016 at 15:57, Peter Klügl wrote:
> >>>>>>>>>> Thanks Pei.
> >>>>>>>>>>
> >>>>>>>>>> I fear there was again a problem with the patch. All new files
> >>>>>>>>>> are missing (and also the svn-ignore settings).
> >>>>>>>>>>
> >>>>>>>>>> Can you take a look?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>>
> >>>>>>>>>> Peter
> >>>>>>>>>>
> >>>>>>>>>> On 28.01.2016 at 14:43, Pei Chen wrote:
> >>>>>>>>>>> patch applied.
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Pei
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> >>>>>>>>>>>> Hi Pei,
> >>>>>>>>>>>>
> >>>>>>>>>>>> can you commit the recent patch for us?
> >>>>>>>>>>>>
> >>>>>>>>>>>> CTAKES-384-20160120.patch
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Peter
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>> Sorry I was swamped recently.
> >>>>>>>>>>>>> But yeah, we can even create an extended type system to store these
> >>>>>>>>>>>>> items temporarily and add them into the main/core type system afterwards.
> >>>>>>>>>>>>> There was an existing item to upgrade UIMA, but agreed - it will require
> >>>>>>>>>>>>> much more testing. If it works, we can upgrade it in our sandbox area or
> >>>>>>>>>>>>> create a branch if necessary.
> >>>>>>>>>>>>> —Pei
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> a new patch is attached.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> @Pei:
> >>>>>>>>>>>>>> are there suitable annotation types in the cTAKES type system? Some
> >>>>>>>>>>>>>> project in cTAKES uses something like OntologyMatch... I map it to
> >>>>>>>>>>>>>> IdentifiedAnnotation right now, but there are many empty features...
> >>>>>>>>>>>>>> @Azad:
> >>>>>>>>>>>>>> I changed the rules a bit, especially the capitalization like I use it
> >>>>>>>>>>>>>> in Ruta normally. The wordlists are compiled to a trie by the maven
> >>>>>>>>>>>>>> plugin. I also added the two regexes for url and email. I extended the
> >>>>>>>>>>>>>> regex for the url. I also changed the evaluation order of some rules
> >>>>>>>>>>>>>> (with @). Feel free to add simple examples to examples.csv for the unit
> >>>>>>>>>>>>>> tests.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Let me know if you need more information about the changes.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Do you wanna have help with the other rule sets? Or should we split them up?
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> great. I will integrate them in the project and in the next patch.
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
> >>>>>>>>>>>>>>>> Three NERs translated and uploaded.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> PS. I will validate all NERs once we have them all completed.
> >>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>> Azad
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehghan@gmail.com> wrote:
> >>>>>>>>>>>>>>>>> This is on my todo list for Dec. as well. If there are any more volunteers
> >>>>>>>>>>>>>>>>> for translating JAPE to RUTA, please get in touch.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>> Azad
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
> >>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten about it. Unfortunately,
> >>>>>>>>>>>>>>>>>> there is just no spare time right now. I hope I will be able to provide
> >>>>>>>>>>>>>>>>>> the patches in December.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
> >>>>>>>>>>>>>>>>>>> Hi Peter,
> >>>>>>>>>>>>>>>>>>> I think the ctakes-examples is probably a good starting point at least
> >>>>>>>>>>>>>>>>>>> in terms of maven modules, etc. I think it would be good if we use
> >>>>>>>>>>>>>>>>>>> uimaFIT style as primary approach to wiring components together and
> >>>>>>>>>>>>>>>>>>> generate desc's as secondary...
> >>>>>>>>>>>>>>>>>>> I think the actual components that would be required is probably best
> >>>>>>>>>>>>>>>>>>> left up to what is actually required for best performing c-deid. The
> >>>>>>>>>>>>>>>>>>> output would be interesting, I'm not sure if we should treat this as
> >>>>>>>>>>>>>>>>>>> an independent preprocessing component or part of a pipeline (in which
> >>>>>>>>>>>>>>>>>>> case, we may need to propose a change to the type system or perhaps an
> >>>>>>>>>>>>>>>>>>> alternative JCas view. You can probably open up that discussion to
> >>>>>>>>>>>>>>>>>>> the dev group as you see fit.)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> My 2 cents...
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> >>>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an example of how the cTAKES
> >>>>>>>>>>>>>>>>>>>> community develops or what a project should look like?
> >>>>>>>>>>>>>>>>>>>> I learned that different people set up UIMA projects in quite different
> >>>>>>>>>>>>>>>>>>>> manners and I do not want to get inspired by "some sort of out-dated"
> >>>>>>>>>>>>>>>>>>>> approach in the cTAKES repo.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Are there restrictions or preferences about the preprocessing components
> >>>>>>>>>>>>>>>>>>>> that should be used and the kind of "output" of the project?
> >>>>>>>>>>>>>>>>>>>> Components: On which components may the components rely: tokenizer, ...
> >>>>>>>>>>>>>>>>>>>> parser, ... dict lookup?
> >>>>>>>>>>>>>>>>>>>> "output": Should the project provide a pipeline or a single AE?
> >>>>>>>>>>>>>>>>>>>> More comments below.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
> >>>>>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate work
> >>>>>>>>>>>>>>>>>>>>>> and to coordinate the efforts ...
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I would like to help with the translating JAPE to RUTA.
> >>>>>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or
> >>>>>>>>>>>>>>>>>>>> wait until I set up the project with ruta integration.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Is there a development dataset which was utilized for the initial
> >>>>>>>>>>>>>>>>>>>>>> development, and if yes, is it possible to contribute it too?
> >>>>>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly available; i2b2
> >>>>>>>>>>>>>>>>>>>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.i2b2.org_NLP_DataSets_Main.php&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=1Qpd4A2PgVD13w31PkkvmJf6I0PTCatCzgBgsnetPOg&s=aAEeORyMtz7NCv-6EEgiABVY_Rf6zLnJghQh2DA_CKQ&e=>
> >>>>>>>>>>>>>>>>>>>>> typically releases the data sets 12 months after a given challenge; this
> >>>>>>>>>>>>>>>>>>>>> is done on an individual basis and involves a Data Use Agreement.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate the validation.
> >>>>>>>>>>>>>>>>>>>> Ok, I'll investigate if we have already access to the dataset here.
> >>>>>>>>>>>>>>>>>>>>>> My first step would be:
> >>>>>>>>>>>>>>>>>>>>>> - set up a maven project
> >>>>>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with cTAKES components
> >>>>>>>>>>>>>>>>>>>>>> replacing the previous ANNIE preprocessing)
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd party libs jars that
> >>>>>>>>>>>>>>>>>>>>>> were included to ensure compatibility. I'll be sure to take a look at
> >>>>>>>>>>>>>>>>>>>>>> that over the next few weeks.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> --Pei
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> @Pei - once ANNIE components are replaced there should not be a need to
> >>>>>>>>>>>>>>>>>>>>> worry about the 3rd party libs.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an independent component for
> >>>>>>>>>>>>>>>>>>>>> the Two Pass recognition (TwoPass.java) as this method has proven useful
> >>>>>>>>>>>>>>>>>>>>> for general NER on longitudinal data and is surely useful independent of
> >>>>>>>>>>>>>>>>>>>>> the deid component.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>> Azad
> >>>>>>>>>>>>>>>>>>>>>
>
>
