ctakes-dev mailing list archives

From Girivaraprasad Nambari <girinamb...@gmail.com>
Subject Re: Next cTAKES release (3.1)?
Date Thu, 04 Jul 2013 00:19:17 GMT
I think we are close to solving the sample data issue. I could help annotate
text (with help from John and other team members when terminology
clarification is required) if someone can provide a template or some notes
on how to do it.

I think this leaves the core team free to concentrate on fine-tuning the
documentation.

Thank you,
Giri



On Jul 3, 2013 7:59 PM, "John Green" <john.travis.green@gmail.com> wrote:

> I see. It's a pretty random collection of formats.
>
> Sent from my iPhone
>
> On Jul 3, 2013, at 18:25, andy mcmurry <mcmurry.andy@gmail.com> wrote:
>
> > Mtsamples has lots of free public examples already but we aren't using them
> > yet.  This is probably because mtsamples don't have the annotations we need
> > to use them as training examples.
> > On Jul 3, 2013 2:46 PM, "Hephaestus Studio" <hephaestus.studio@gmail.com>
> > wrote:
> >
> >> @Andy - Not a doctor yet, but soon! Thanks for the promotion though, one
> >> more year!
> >>
> >> - Apropos meds or clinical type questions: any developer on here can feel
> >> free to shoot me a quick question via the list anytime, I'd be happy to
> >> confirm that a drug or anything else makes sense given a particular
> >> clinical/note context.
> >>
> >> - "I wonder if there is some way in which you could guide us in making
> >> better use of the medical knowledge sources (ontologies) that are
> >> available." - I'd be happy to brainstorm about using existing resources to
> >> help in decision making. We use these all the time in the clinic.
> >>
> >> @ Tim+Andy+Chen - I haven't had a chance to really start chewing into the
> >> code, though I hope to over the next year; so, what kind of examples would
> >> be most helpful?
> >>    - Any particular disease processes?
> >>    - Are you all familiar with the ubiquitous SOAP style presentation
> >> that doctors use to write free notes? The few examples I clicked through in
> >> the repository that Chen pointed me to are very sparse. Would we want
> >> gradations? E.g., a scale for "well done" notes to "very quick
> >> I-don't-care-because-I'm-in-a-rush" notes?
> >>
> >> @ Chen - Thank you for the kind words. It's nice to be welcomed by a
> >> community in which you hope to integrate. And thank you for pointing me to
> >> the directory with the current sample notes. This was very helpful in
> >> determining where those are in their development. I know that each of
> >> your hospitals has a wealth of HIPAA-closed notes, but I'll see what I can
> >> do to make some "stereotypical" open-notes for common disease
> >> presentations. Again: maybe a scale, not necessarily just on brevity but
> >> some other metric, whose continuum represented various permutations of
> >> degrees of something, maybe of difficulty in processing? Apropos code,
> >> Chen: I will help where I can but where I want to be is elbow deep in the
> >> code :)
> >>
> >> Finally: I haven't had a chance to look into some of the links from
> >> earlier in this thread regarding open access repositories of free text
> >> clinical notes: what do you all feel the quality of these resources is?
> >> Abundant but low quality? Few, but those that are there are high quality?
> >>
> >> Bottom line: no problem answering contextual questions (can afib be
> >> associated with a lower GI bleed??) and no problem writing some notes. The
> >> only question would be, before I put in any time: what disease/specialty
> >> domain? And would we want some system that put them on a continuum of some
> >> variable, say, brevity or "readability"?
> >>
> >> Just thinking before leaping,
> >>
> >> Thanks,
> >> JG
> >>
> >> Sent from my iPhone
> >>
> >> On Jul 2, 2013, at 21:23, "Chen, Pei" <Pei.Chen@childrens.harvard.edu>
> >> wrote:
> >>
> >>> Hi John,
> >>> Welcome!  There are actually many ways to contribute and it's not
> >>> limited to just code.  It's always great to hear new ideas and suggestions
> >>> on how to improve the software.  Therefore, even things like user feedback,
> >>> documentation, new use cases, essentially anything that will make things
> >>> better would be awesome!
> >>>
> >>> To get started, I would suggest subscribing to the email lists.  If you
> >>> would like to contribute anything, just create a Jira account (anyone
> >>> should be able to do this), and add/review Jira items (add attachments if
> >>> you like) and we can even help integrate it.
> >>>
> >>> We normally use Jira to keep track of issues:
> >>> [1] https://issues.apache.org/jira/browse/ctakes
> >>>
> >>> Current collection of sample test notes that have been collected over
> >>> the years:
> >>> https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-regression-test/testdata/input/plaintext/
> >>>
> >>> ________________________________________
> >>> From: Tim Miller [timothy.miller@childrens.harvard.edu]
> >>> Sent: Tuesday, July 02, 2013 6:31 PM
> >>> To: dev@ctakes.apache.org
> >>> Subject: Re: Next cTAKES release (3.1)?
> >>>
> >>> Agreed that you could definitely help out, and that would be a great way
> >>> to do so. We don't really have "examples" right now, more like just
> >>> short test sentences for showing simple results and verifying that
> >>> nothing has been broken by changes. I think regular length fake but
> >>> realistic notes would be very useful.
> >>> Tim
> >>>
> >>> On 07/02/2013 05:19 PM, John Green wrote:
> >>>> Hi all,
> >>>>
> >>>> I've been following this mail list for a couple of months. I'm a third
> >>>> year medical student rounding the bend toward my MD. I used to be a
> >>>> computer programmer, however, and continue my own projects. I'm very
> >>>> interested in contributing eventually to cTAKES development. In the
> >>>> meantime, given the current talk of examples, if any domain specific
> >>>> examples need to be generated, I am domain knowledgeable enough that I
> >>>> could pound out a few free text notes made to order.
> >>>>
> >>>> Let me know, you all may already have docs on hand willing to do this,
> >>>> but if not...
> >>>>
> >>>> John Green
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>> On Jun 28, 2013, at 8:59, "Chen, Pei" <Pei.Chen@childrens.harvard.edu>
> >>>> wrote:
> >>>>
> >>>>> I completely agree with making cTAKES easier to use.  I think it is
> >>>>> exciting to hear the different use cases here and understanding where some
> >>>>> of the areas that need improvements are (which we haven't thought about
> >>>>> earlier).
> >>>>> I think Tim's suggestions and the 3 concrete actionable items make a
> >>>>> lot of sense.  Hopefully it should attract new users, adopters, and perhaps
> >>>>> more committers.
> >>>>>
> >>>>>> i) Make the typesystem forefront in documentation -- generate javadocs and
> >>>>>> have as a link on the ctakes frontpage/sidebar
> >>>>>> ii) Similar to the way that we are aiming to have tests in every module, also
> >>>>>> have clearly labeled examples in every module that set up a pipeline, run on
> >>>>>> sample notes (could be the same sample notes from the tests), and do
> >>>>>> something with the results.
> >>>>>> iii) Follow Giri's recommendation to have example training data for people
> >>>>>> who want to take the next step and train their own models
> >>>>> I think Java developers are accustomed to including a library as a
> >>>>> dependency/jar, having an API to pass input, and getting the results via
> >>>>> POJOs; so the examples could initially shield the complexity of wiring a
> >>>>> pipeline together etc.
> >>>>> If we can improve the APIs and how they get integrated with other
> >>>>> apps, we can add any GUI/CLI tools on top of this afterwards.
> >>>>>
> >>>>> --Pei
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
> >>>>>> Sent: Friday, June 28, 2013 8:00 AM
> >>>>>> To: dev@ctakes.apache.org
> >>>>>> Subject: Re: Next cTAKES release (3.1)?
> >>>>>>
> >>>>>> Very interesting discussion. I think Giri is right about giving example
> >>>>>> training data in the format that our training code can read. While our
> >>>>>> ultimate goal would be to build and release models that are completely
> >>>>>> domain-independent, in the real world it is almost always better to use
> >>>>>> some domain-specific data and we should think more about how to
> >>>>>> facilitate that.
> >>>>>>
> >>>>>> As for making it easier to get started, it is not totally clear to me what
> >>>>>> this means/how to do it so it might be useful to get specific about what
> >>>>>> this means. I think our biggest hurdle is
> >>>>>>
> >>>>>> 1) Prerequisite of understanding UIMA/UIMAFit
> >>>>>>
> >>>>>> Since UIMAFit is officially becoming part of UIMA that will be easier, and
> >>>>>> hopefully people will just learn the easier (in my opinion) UIMAFit way than
> >>>>>> the standard UIMA way of doing things. Is there something we can be doing
> >>>>>> to make understanding UIMA easier? Or do we just need to say upfront that
> >>>>>> this is a prerequisite and hope that people don't give up due to this thing
> >>>>>> that is out of our control?
> >>>>>>
> >>>>>> Another hurdle is:
> >>>>>>
> >>>>>> 2) cTAKES is a multi-purpose developer-aimed tool
> >>>>>>
> >>>>>> So it's not just a matter of hiding complexity -- at some point people have to
> >>>>>> understand their problem, understand cTAKES' capabilities, and start coding.
> >>>>>> Pei's GUI will help for some common use cases but will not remove the
> >>>>>> requirement that someone at the organization knows cTAKES.
> >>>>>> I think one part of this problem is the fact that the typesystem is not well
> >>>>>> documented. A developer needs to know what the output is (objects from
> >>>>>> the typesystem), how to get them (which modules/pipelines), and what
> >>>>>> information is in them. So maybe on this end my recommendation would be:
> >>>>>> i) Make the typesystem forefront in documentation -- generate javadocs and
> >>>>>> have as a link on the ctakes frontpage/sidebar
> >>>>>> ii) Similar to the way that we are aiming to have tests in every module, also
> >>>>>> have clearly labeled examples in every module that set up a pipeline, run on
> >>>>>> sample notes (could be the same sample notes from the tests), and do
> >>>>>> something with the results.
> >>>>>> iii) Follow Giri's recommendation to have example training data for people
> >>>>>> who want to take the next step and train their own models
> >>>>>>
> >>>>>> This is quite a bit of developer overhead, so it's worth asking whether you
> >>>>>> agree with my "diagnosis" and "treatment" or whether you think there are
> >>>>>> different problems/solutions that should be higher priority.
> >>>>>>
> >>>>>> Tim
> >>>>>>
> >>>>>> On 06/27/2013 10:59 PM, Girivaraprasad Nambari wrote:
> >>>>>>> Hi Vijay and Andy,
> >>>>>>>
> >>>>>>> Thanks for sharing those examples.
> >>>>>>>
> >>>>>>> "Trouble is, privacy requires that these examples be made up by hand"
> >>>>>>>
> >>>>>>> Agree with this statement and this is a very valid concern.
> >>>>>>>
> >>>>>>> In "getting started examples", I think we should just have a couple of
> >>>>>>> entries (5-10 small entries), not more than that (with an explicit
> >>>>>>> statement like "ONLY EXAMPLE, NOT GOOD FOR REAL USAGE"). I understand
> >>>>>>> handcrafting these may not be easy because we are not medical domain
> >>>>>>> experts, but I feel it is worth the time, because it brings in a larger
> >>>>>>> user community.
> >>>>>>>
> >>>>>>> Thank you,
> >>>>>>> Giri
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Jun 27, 2013 at 10:25 PM, Andy McMurry
> >>>>>>> <mcmurry.andy@gmail.com> wrote:
> >>>>>>>> GREAT !
> >>>>>>>>
> >>>>>>>> The i2b2 data though isn't publicly distributable, you still need to
> >>>>>>>> request access to it since it is "semi private"
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Jun 27, 2013, at 9:52 PM, vijay garla <vngarla@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> We released code on using cTAKES to annotate clinical text and SVMs
> >>>>>>>>> that use the annotations to classify clinical text from the CMC 2007
> >>>>>>>>> and I2B2 2008 challenges:
> >>>>>>>>>
> >>>>>>>>> We did the CMC 2007 with cTAKES 2.5:
> >>>>>>>>> https://code.google.com/p/ytex/wiki/WordSenseDisambiguation_V08#Reproducing_results_on_CMC_2007_challenge
> >>>>>>>>> <https://code.google.com/p/ytex/downloads/list>
> >>>>>>>>> And the i2b2 2008 with the version of cTAKES distributed with the
> >>>>>>>>> first version of ARC:
> >>>>>>>>> https://code.google.com/p/ytex/wiki/FeatEng_V05#i2b2_2008
> >>>>>>>>>
> >>>>>>>>> These are both publicly available datasets, and represent real-world
> >>>>>>>>> problems (in general I believe when publishing a paper the code
> >>>>>>>>> should be reproducible and made publicly available, but that's a
> >>>>>>>>> different issue).
> >>>>>>>>> When we get around to upgrading YTEX to cTAKES 3.1, we would like to
> >>>>>>>>> upgrade these samples as well.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>
> >>>>>>>>> VJ
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Thu, Jun 27, 2013 at 8:32 PM, Andy McMurry
> >>>>>>>>> <mcmurry.andy@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> +1 suggestion for documenting many examples of "getting started"
> >>>>>>>>>> +NLP datasets.
> >>>>>>>>>>
> >>>>>>>>>> I have at least one we can use that was created by our lead
> >>>>>>>>>> Pathologist
> >>>>>>>>>> https://open.med.harvard.edu/svn/scrubber/releases/3.0/data/input/cases/train/traincase.xml
> >>>>>>>>>> We should provide at least one sample for each domain.
> >>>>>>>>>> Trouble is, privacy requires that these examples be made up by hand
> >>>>>>>>>> and not copy-pasted from EMR systems.
> >>>>>>>>>>
> >>>>>>>>>> --Andy
> >>>>>>>>>>
> >>>>>>>>>> On Jun 27, 2013, at 5:32 PM, Girivaraprasad Nambari
> >>>>>>>>>> <girinambari@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> +1 for this observation Andy!
> >>>>>>>>>>>
> >>>>>>>>>>> Lowering time will motivate users to write blogs about features,
> >>>>>>>>>>> how-tos, etc., which reduces the core team's workload on
> >>>>>>>>>>> documentation.
> >>>>>>>>>>>
> >>>>>>>>>>> I have been trying to write a small "how to write a standalone
> >>>>>>>>>>> client for ctakes" with my experience (I saw at least 4 users
> >>>>>>>>>>> post similar questions in the last 2 months), but I am not getting
> >>>>>>>>>>> enough time because ctakes depends on a lot of other frameworks
> >>>>>>>>>>> (UimaFit, cleartk, UIMA Framework, etc.); most of my spare time is
> >>>>>>>>>>> being spent juggling between these frameworks, posting and browsing
> >>>>>>>>>>> those forums, and relating observations to ctakes code. I think we
> >>>>>>>>>>> need to have some high level documentation about these (with links
> >>>>>>>>>>> to corresponding forums).
> >>>>>>>>>>>
> >>>>>>>>>>> The above case is for developers (I think this will be the larger
> >>>>>>>>>>> user base as ctakes progresses); for users I think the documentation
> >>>>>>>>>>> is a lot better, though some improvements need to be done.
> >>>>>>>>>>>
> >>>>>>>>>>> As a developer I struggled with the lack of sample training data (I
> >>>>>>>>>>> am still struggling in this area even though I browsed all the
> >>>>>>>>>>> relevant code), though the training classes are there. I understand
> >>>>>>>>>>> that there are licensing issues with REAL data, but at least some
> >>>>>>>>>>> handmade example sentences, which may not be real, would help
> >>>>>>>>>>> developers understand the type/structure of input the TRAINING
> >>>>>>>>>>> classes expect. This way people who browse the code can reverse
> >>>>>>>>>>> engineer and develop their own models. Sorry if you guys feel this
> >>>>>>>>>>> is a novice issue, but I feel most developers will be novices when
> >>>>>>>>>>> they adopt a system, and Machine Learning/NLP is an ocean. Some
> >>>>>>>>>>> documentation in this area will save a lot of time for us.
> >>>>>>>>>>>
> >>>>>>>>>>> I hope there will be some activity in this area from the ctakes
> >>>>>>>>>>> core team.
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you,
> >>>>>>>>>>> Giri
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Jun 27, 2013 at 5:11 PM, Andy McMurry
> >>>>>>>>>>> <mcmurry.andy@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> ctakes is at a point where we have a LOT of features but it is
> >>>>>>>>>>>> still hard to get started.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Judging from the mailing lists a lot of how cTakes works is not
> >>>>>>>>>>>> obvious and requires hand holding.
> >>>>>>>>>>>> This is very typical in early FOSS projects.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Lowering the time to get invested in ctakes gets more users AND
> >>>>>>>>>>>> better bug reports, FAQ, etc.
> >>>>>>>>>>>>
> >>>>>>>>>>>> thoughts?
> >>>>>>>>>>>> --Andy
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Apr 11, 2013, at 8:55 PM, "Chen, Pei"
> >>>>>>>>>>>> <Pei.Chen@childrens.harvard.edu> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>> I just wanted to gauge the interest of creating the next release
> >>>>>>>>>>>>> of cTAKES (3.1) which is currently marked for May in Jira-
> >>>>>>>>>>>>> There have already been 22/53 issues [1] marked as fixed or closed.
> >>>>>>>>>>>>> Plenty of bug fixes and new components including:
> >>>>>>>>>>>>> - New CEM Instance Template population
> >>>>>>>>>>>>> - New Dependency Parser/Semantic Role Labeler
> >>>>>>>>>>>>> - New optional Clear POSTagger
> >>>>>>>>>>>>> - New regression testing component
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Should we wait for the Temporal component?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>>> https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%223.1%22%20AND%20project%20%3D%20CTAKES
> >>
>
