ctakes-dev mailing list archives

From Andy McMurry <mcmurry.a...@gmail.com>
Subject Re: Next cTAKES release (3.1)?
Date Fri, 28 Jun 2013 00:32:06 GMT
+1 suggestion for documenting many examples of "getting started" NLP datasets. 

I have at least one we can use that was created by our lead Pathologist 
https://open.med.harvard.edu/svn/scrubber/releases/3.0/data/input/cases/train/traincase.xml

We should provide at least one sample for each domain. 
Trouble is, privacy requires that these examples be made up by hand and not copy-pasted from
EMR systems. 

--Andy 

On Jun 27, 2013, at 5:32 PM, Girivaraprasad Nambari <girinambari@gmail.com> wrote:

> +1 for this observation Andy!
> 
> Lowering the time to get started will motivate users to write blog posts
> about features, how-tos, etc., which reduces the core team's documentation
> workload.
> 
> I have been trying to write a small "how to write a standalone client for
> cTAKES" guide from my own experience (I have seen at least 4 users post
> similar questions in the last 2 months), but I am not finding enough time.
> Because cTAKES depends on a lot of other frameworks (uimaFIT, ClearTK, the
> UIMA framework, etc.), most of my spare time is spent juggling between
> these frameworks, posting on and browsing their forums, and relating what I
> find to the cTAKES code. I think we need some high-level documentation
> about these dependencies (with links to the corresponding forums).
> 
> The above case is for developers (I think this will become the larger user
> base as cTAKES progresses). For users, I think the documentation is a lot
> better, though some improvements still need to be made.
> 
> As a developer, I found the lack of sample training data tough (I am still
> struggling in this area even though I have browsed all the relevant code),
> even though the training classes are there. I understand that there are
> licensing issues with REAL data, but at least some handmade example
> sentences, which may not be real, would help developers understand the
> type/structure of input the TRAINING classes expect. This way, people who
> browse the code can reverse engineer it and develop their own models. Sorry
> if you feel this is a novice issue, but I think most developers are novices
> when they adopt a system, and machine learning/NLP is an ocean. Some
> documentation in this area would save us a lot of time.
> 
> I hope there will be some activity in this area from the cTAKES core team.
> 
> Thank you,
> Giri
> 
> 
> 
> On Thu, Jun 27, 2013 at 5:11 PM, Andy McMurry <mcmurry.andy@gmail.com> wrote:
> 
>> ctakes is at a point where we have a LOT of features but it is still hard
>> to get started.
>> 
>> Judging from the mailing lists, a lot of how cTAKES works is not obvious
>> and requires hand-holding.
>> This is very typical in early FOSS projects.
>> 
>> Lowering the time to get invested in ctakes gets more users AND better bug
>> reports, FAQ, etc.
>> 
>> thoughts?
>> --Andy
>> 
>> 
>> On Apr 11, 2013, at 8:55 PM, "Chen, Pei" <Pei.Chen@childrens.harvard.edu>
>> wrote:
>> 
>>> Hi,
>>> I just wanted to gauge interest in creating the next release of cTAKES
>>> (3.1), which is currently marked for May in Jira.
>>> 
>>> There have already been 22/53 issues [1] marked as fixed or closed.
>>> Plenty of bug fixes and new components including:
>>> - New CEM Instance Template population
>>> - New Dependency Parser/Semantic Role Labeler
>>> - New optional Clear POSTagger
>>> - New regression testing component
>>> 
>>> Should we wait for the Temporal component?
>>> 
>>> [1] https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%223.1%22%20AND%20project%20%3D%20CTAKES
>>> 
>> 
>> 

