ctakes-dev mailing list archives

From Andy McMurry <mcmurry.a...@gmail.com>
Subject Re: Next cTAKES release (3.1)?
Date Fri, 28 Jun 2013 02:25:54 GMT
GREAT!

The i2b2 data isn't publicly distributable, though; you still need to request access to it,
since it is "semi-private".


On Jun 27, 2013, at 9:52 PM, vijay garla <vngarla@gmail.com> wrote:

> We released code on using cTAKES to annotate clinical text and SVMs that
> use the annotations to classify clinical text from the CMC 2007 and I2B2
> 2008 challenges:
> 
> We did the CMC 2007 with cTAKES 2.5:
> https://code.google.com/p/ytex/wiki/WordSenseDisambiguation_V08#Reproducing_results_on_CMC_2007_challenge<https://code.google.com/p/ytex/downloads/list>
> 
> 
> And the i2b2 2008 with the version of cTAKES distributed with the first
> version of ARC:
> https://code.google.com/p/ytex/wiki/FeatEng_V05#i2b2_2008
> 
> These are both publicly available datasets, and represent real-world
> problems (in general I believe when publishing a paper the code should be
> reproducible and made publicly available, but that's a different issue).
> 
> When we get around to upgrading YTEX to cTAKES 3.1, we would like to
> upgrade these samples as well.
> 
> Best,
> 
> VJ
> 
> 
> 
> On Thu, Jun 27, 2013 at 8:32 PM, Andy McMurry <mcmurry.andy@gmail.com> wrote:
> 
>> +1 suggestion for documenting many examples of "getting started" NLP
>> datasets.
>> 
>> I have at least one we can use that was created by our lead Pathologist
>> 
>> https://open.med.harvard.edu/svn/scrubber/releases/3.0/data/input/cases/train/traincase.xml
>> 
>> We should provide at least one sample for each domain.
>> Trouble is, privacy requires that these examples be made up by hand and
>> not copy-pasted from EMR systems.
>> 
>> --Andy
>> 
>> On Jun 27, 2013, at 5:32 PM, Girivaraprasad Nambari <girinambari@gmail.com>
>> wrote:
>> 
>>> +1 for this observation Andy!
>>> 
>>> Lowering the time to get started will motivate users to write blogs about
>>> features, how-tos, etc., which reduces the core team's documentation workload.
>>> 
>>> I have been trying to write a small "how to write a standalone client for
>>> cTAKES" guide from my experience (I saw at least 4 users post similar
>>> questions in the last 2 months), but I am not finding enough time. Because
>>> cTAKES depends on a lot of other frameworks (uimaFIT, ClearTK, the UIMA
>>> framework, etc.), most of my spare time is spent juggling these frameworks,
>>> posting to and browsing their forums, and relating observations to the cTAKES
>>> code. I think we need some high-level documentation about these (with links
>>> to the corresponding forums).
>>> 
>>> The above applies to developers (I think they will make up more of the user
>>> base as cTAKES progresses); for users I think the documentation is a lot
>>> better, though some improvements still need to be made.
>>> 
>>> As a developer I found the lack of sample training data tough (I am still
>>> struggling in this area even though I have browsed all the relevant code),
>>> though the training classes are there. I understand that there are licensing
>>> issues with REAL data, but at least some hand-made example sentences, which
>>> may not be real, would help developers understand the type/structure of input
>>> the TRAINING classes expect. That way, people who browse the code can reverse
>>> engineer it and develop their own models. Sorry if you feel this is a novice
>>> issue, but I think most developers are novices when they adopt a system, and
>>> machine learning/NLP is an ocean. Some documentation in this area would save
>>> us a lot of time.
>>> 
>>> I hope there will be some activity in this area from the cTAKES core team.
>>> 
>>> Thank you,
>>> Giri
>>> 
>>> 
>>> 
>>> On Thu, Jun 27, 2013 at 5:11 PM, Andy McMurry <mcmurry.andy@gmail.com>
>>> wrote:
>>> 
>>>> cTAKES is at a point where we have a LOT of features but it is still hard
>>>> to get started.
>>>> 
>>>> Judging from the mailing lists, a lot of how cTAKES works is not obvious
>>>> and requires hand-holding.
>>>> This is very typical in early FOSS projects.
>>>> 
>>>> Lowering the time to get invested in cTAKES gets more users AND better bug
>>>> reports, FAQ, etc.
>>>> 
>>>> thoughts?
>>>> --Andy
>>>> 
>>>> 
>>>> On Apr 11, 2013, at 8:55 PM, "Chen, Pei" <Pei.Chen@childrens.harvard.edu>
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> I just wanted to gauge the interest in creating the next release of
>>>>> cTAKES (3.1), which is currently marked for May in Jira.
>>>>> 
>>>>> There have already been 22/53 issues [1] marked as fixed or closed.
>>>>> Plenty of bug fixes and new components including:
>>>>> - New CEM Instance Template population
>>>>> - New Dependency Parser/Semantic Role Labeler
>>>>> - New optional Clear POSTagger
>>>>> - New regression testing component
>>>>> 
>>>>> Should we wait for the Temporal component?
>>>>> 
>>>>> [1] https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%223.1%22%20AND%20project%20%3D%20CTAKES
>>>>> 
>>>> 
>>>> 
>> 
>> 

