ctakes-dev mailing list archives

From Girivaraprasad Nambari <girinamb...@gmail.com>
Subject Re: Next cTAKES release (3.1)?
Date Fri, 28 Jun 2013 02:58:09 GMT
Hi Vijay and Andy,

Thanks for sharing those examples.

"Trouble is, privacy requires that these examples be made up by hand"

I agree with this statement; it is a very valid concern.

For the "getting started" examples, I think we should have just a couple of
entries (5-10 small entries), not more than that, with an explicit statement
like "ONLY AN EXAMPLE, NOT FOR REAL USAGE". I understand handcrafting these
may not be easy because we are not medical domain experts, but I feel it is
worth the time, because it brings in a larger user community.

Thank you,
Giri





On Thu, Jun 27, 2013 at 10:25 PM, Andy McMurry <mcmurry.andy@gmail.com> wrote:

> GREAT!
>
> The i2b2 data isn't publicly distributable, though; you still need to
> request access to it since it is "semi-private".
>
>
> On Jun 27, 2013, at 9:52 PM, vijay garla <vngarla@gmail.com> wrote:
>
> > We released code that uses cTAKES to annotate clinical text, along with
> > SVMs that use the annotations to classify clinical text from the CMC 2007
> > and i2b2 2008 challenges:
> >
> > We did CMC 2007 with cTAKES 2.5:
> >
> > https://code.google.com/p/ytex/wiki/WordSenseDisambiguation_V08#Reproducing_results_on_CMC_2007_challenge
> > <https://code.google.com/p/ytex/downloads/list>
> >
> >
> > And the i2b2 2008 with the version of cTAKES distributed with the first
> > version of ARC:
> > https://code.google.com/p/ytex/wiki/FeatEng_V05#i2b2_2008
> >
> > These are both publicly available datasets, and represent real-world
> > problems (in general I believe when publishing a paper the code should be
> > reproducible and made publicly available, but that's a different issue).
> >
> > When we get around to upgrading YTEX to cTAKES 3.1, we would like to
> > upgrade these samples as well.
> >
> > Best,
> >
> > VJ
> >
> >
> >
> > On Thu, Jun 27, 2013 at 8:32 PM, Andy McMurry <mcmurry.andy@gmail.com>
> > wrote:
> >
> >> +1 to the suggestion of documenting many examples of "getting started"
> >> NLP datasets.
> >>
> >> I have at least one we can use that was created by our lead pathologist:
> >>
> >> https://open.med.harvard.edu/svn/scrubber/releases/3.0/data/input/cases/train/traincase.xml
> >>
> >> We should provide at least one sample for each domain.
> >> Trouble is, privacy requires that these examples be made up by hand and
> >> not copy-pasted from EMR systems.
> >>
> >> --Andy
> >>
> >> On Jun 27, 2013, at 5:32 PM, Girivaraprasad Nambari
> >> <girinambari@gmail.com> wrote:
> >>
> >>> +1 for this observation, Andy!
> >>>
> >>> Lowering the time to get started will motivate users to write blog posts
> >>> about features, how-tos, etc., which reduces the core team's workload on
> >>> documentation.
> >>>
> >>> I have been trying to write a small "how to write a standalone client
> >>> for cTAKES" guide based on my experience (I have seen at least 4 users
> >>> post similar questions in the last 2 months), but I am not getting
> >>> enough time, because cTAKES depends on a lot of other frameworks
> >>> (uimaFIT, ClearTK, the UIMA framework, etc.) and most of my spare time
> >>> is spent juggling between these frameworks, posting on and browsing
> >>> their forums, and relating my observations back to the cTAKES code. I
> >>> think we need some high-level documentation about these frameworks
> >>> (with links to the corresponding forums).
> >>>
> >>> The above case is for developers (I think they will become a larger part
> >>> of the user base as cTAKES progresses); for users, I think the
> >>> documentation is a lot better, though some improvements still need to
> >>> be made.
> >>>
> >>> As a developer, I found the lack of sample training data tough (I am
> >>> still struggling in this area even though I have browsed all the
> >>> relevant code), although the training classes are there. I understand
> >>> that there are licensing issues with REAL data, but we could at least
> >>> provide some handmade example sentences, which may not be real but
> >>> would help developers understand the type/structure of input the
> >>> TRAINING classes expect. This way, people who browse the code can
> >>> reverse engineer it and develop their own models. Sorry if you feel
> >>> this is a novice issue, but I feel most developers are novices when
> >>> they adopt a system, and Machine Learning/NLP is an ocean. Some
> >>> documentation in this area would save us a lot of time.
> >>>
> >>> I hope there will be some activity in this area from the cTAKES core team.
> >>>
> >>> Thank you,
> >>> Giri
> >>>
> >>>
> >>>
> >>> On Thu, Jun 27, 2013 at 5:11 PM, Andy McMurry <mcmurry.andy@gmail.com>
> >>> wrote:
> >>>
> >>>> cTAKES is at a point where we have a LOT of features, but it is still
> >>>> hard to get started.
> >>>>
> >>>> Judging from the mailing lists, a lot of how cTAKES works is not
> >>>> obvious and requires hand-holding.
> >>>> This is very typical of early FOSS projects.
> >>>>
> >>>> Lowering the time it takes to get invested in cTAKES gets us more users
> >>>> AND better bug reports, FAQs, etc.
> >>>>
> >>>> thoughts?
> >>>> --Andy
> >>>>
> >>>>
> >>>> On Apr 11, 2013, at 8:55 PM, "Chen, Pei"
> >>>> <Pei.Chen@childrens.harvard.edu> wrote:
> >>>>
> >>>>> Hi,
> >>>>> I just wanted to gauge interest in creating the next release of
> >>>>> cTAKES (3.1), which is currently targeted for May in Jira.
> >>>>>
> >>>>> 22 of the 53 issues [1] have already been marked as fixed or closed,
> >>>>> covering plenty of bug fixes and new components, including:
> >>>>> - New CEM Instance Template population
> >>>>> - New Dependency Parser/Semantic Role Labeler
> >>>>> - New optional Clear POSTagger
> >>>>> - New regression testing component
> >>>>>
> >>>>> Should we wait for the Temporal component?
> >>>>>
> >>>>> [1] https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%223.1%22%20AND%20project%20%3D%20CTAKES
> >>>>>
> >>>>
> >>>>
> >>
> >>
>
>
