incubator-ctakes-dev mailing list archives

From "Lin, Chen" <Chen....@childrens.harvard.edu>
Subject Re: new type: document label?
Date Tue, 20 Nov 2012 16:21:35 GMT
How about DocumentLabel or DocumentClassLabel?

Sent from my iPhone

On Nov 20, 2012, at 11:01 AM, "Chen, Pei" <Pei.Chen@childrens.harvard.edu> wrote:

> If we do decide to create a new type, could we call it something like DocumentClass or DocumentClassification and have some attribute(s) called "label"?  Otherwise we may need a WSD component to disambiguate "Label" from general doc metadata... :)
> 
> 
> 
> On Nov 19, 2012, at 11:25 AM, "Dmitriy Dligach" <dmitriy.dligach@childrens.harvard.edu> wrote:
> 
>> This is a very good point. Also, it seems that documents with multiple labels are not a scenario that we face every day, so why don't we just create a new type (e.g. DocumentLabel) that derives from TOP and use it for a while to see if it satisfies our document classification needs?
>> 
>> Thanks,
>> 
>> Dima
>> 
>> On 11/16/2012 09:21 PM, Wu, Stephen T., Ph.D. wrote:
>>> Well, I think the downside to using Pair is that it is not self-documenting.
>>> In other words, everyone who did not see this thread will be faced with the
>>> same problem Dima and I were faced with (and have dealt with in different
>>> ways).  Everyone trying to use the common type system outside of UIMA would
>>> probably be completely lost.
>>> 
>>> Multiple labels... DocumentClass = List<Pair>?  Or you could just
>>> instantiate multiple DocumentClass types, and not necessarily have the
>>> Document refer to them...
>>> 
>>> stephen
>>> 
>>> 
>>> 
>>>> I suggest using the current types.
>>>> 
>>>> I think if we add a new one, we would still want to handle multiple
>>>> classifications, and would still have the downside of having to iterate
>>>> through the classifications to find the one of interest.  So I'm not sure how
>>>> much we gain by adding a new type.
>>>> 
>>>> But you are closer to this than I am so I would go with whatever you recommend
>>>> or others doing classification recommend.
>>>> 
>>>> -- James
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: ctakes-dev-return-874-Masanz.James=mayo.edu@incubator.apache.org
>>>>> [mailto:ctakes-dev-return-874-
>>>>> Masanz.James=mayo.edu@incubator.apache.org] On Behalf Of Dmitriy Dligach
>>>>> Sent: Thursday, November 15, 2012 3:49 PM
>>>>> To: ctakes-dev@incubator.apache.org
>>>>> Subject: Re: new type: document label?
>>>>> 
>>>>> James, thanks. This makes perfect sense.
>>>>> 
>>>>> So what's the conclusion? Can we do with the current types, or do we
>>>>> still need to create a new one?
>>>>> 
>>>>> Dima
>>>>> 
>>>>> On 11/15/2012 03:43 PM, Masanz, James J. wrote:
>>>>>> Yes, you can put multiple Pair annotations in the CAS.
>>>>>> There is a Pairs (plural) annotation type which is a list (FSArray) of Pair annotations.
>>>>>> You could have two Pair annotations with
>>>>>> attribute="at_risk_for_early_brca"
>>>>>> value="T"
>>>>>> 
>>>>>> attribute="alchohol_use"
>>>>>> value="heavy_drinker"
>>>>>> 
>>>>>> The downside:
>>>>>> You have to iterate through the Pair annotations to find the one with the attribute name you want.
>>>>>> The upside: we don't have to create new Annotation types for everything that might be imagined.
>>>>>> As Stephen points out, not everything in Pairs needs to be a document
>>>>>> class or related to the text within the document. It can be used, for
>>>>>> example, to keep version information about a pipeline or anything any
>>>>>> annotator wants. A totally made-up example could be
>>>>>> attribute="dictionary_lookup_version"
>>>>>> value="3.2.1"
>>>>>> 
>>>>>> -- James
>>>>>> 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From:
>>>>>>> ctakes-dev-return-869-Masanz.James=mayo.edu@incubator.apache.org
>>>>>>> [mailto:ctakes-dev-return-869-
>>>>>>> Masanz.James=mayo.edu@incubator.apache.org] On Behalf Of Dmitriy
>>>>>>> Dligach
>>>>>>> Sent: Thursday, November 15, 2012 1:03 PM
>>>>>>> To: ctakes-dev@incubator.apache.org
>>>>>>> Subject: Re: new type: document label?
>>>>>>> 
>>>>>>> Chen brings up a good point. But can't we solve this problem by
>>>>>>> creating multiple Pair annotations in the CAS?
>>>>>>> 
>>>>>>> Dima
>>>>>>> 
>>>>>>> On 11/15/2012 01:52 PM, Lin, Chen wrote:
>>>>>>>> I am curious to know if Pair allows multiple document level labels for a single doc. It is possible that a single set of documents be used in multiple classification tasks.
>>>>>>>> For example, in one task a document may be labeled as "positive" or "negative", in another task this same doc may be labeled as "high", "moderate" or "low".  Many thanks!
>>>>>>>> Best,
>>>>>>>> Chen
>>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: Dmitriy Dligach [mailto:dmitriy.dligach@childrens.harvard.edu]
>>>>>>>> Sent: Thursday, November 15, 2012 1:46 PM
>>>>>>>> To: ctakes-dev@incubator.apache.org
>>>>>>>> Subject: Re: new type: document label?
>>>>>>>> 
>>>>>>>> Thank you, James.
>>>>>>>> 
>>>>>>>> So, in general did you envision this type of use for Pair:
>>>>>>>> 
>>>>>>>> Pair.attribute -> "document_label"
>>>>>>>> Pair.value -> "positive"
>>>>>>>> 
>>>>>>>> I think this may work.
>>>>>>>> 
>>>>>>>> Dima
>>>>>>>> 
>>>>>>>> On 11/15/2012 10:22 AM, Masanz, James J. wrote:
>>>>>>>>> Pair (org.apache.ctakes.typesystem.type.util.Pair) is intended for such document-level properties.
>>>>>>>>> Would that suit your need?
>>>>>>>>> 
>>>>>>>>> -- James
>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From:
>>>>>>>>>> ctakes-dev-return-854-Masanz.James=mayo.edu@incubator.apache.org
>>>>>>>>>> [mailto:ctakes-dev-return-854-
>>>>>>>>>> Masanz.James=mayo.edu@incubator.apache.org] On Behalf Of Dmitriy Dligach
>>>>>>>>>> Sent: Thursday, November 15, 2012 9:16 AM
>>>>>>>>>> To: cTAKES Dev list @ ASF
>>>>>>>>>> Subject: new type: document label?
>>>>>>>>>> 
>>>>>>>>>> We've recently been using cTAKES more and more for document-level
>>>>>>>>>> classification (e.g. phenotyping). Would it make sense to add a
>>>>>>>>>> new type (that would derive from TOP) to store the label for a document?
>>>>>>>>>> I know we currently have a doc id for each document, but having
>>>>>>>>>> the label type would simplify a lot of things (e.g. debugging).
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> 
>>>>>>>>>> Dima
>> 
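
The trade-off debated in this thread (generic attribute/value pairs versus a dedicated label type) can be sketched roughly as follows. This is only an illustration under stated assumptions: the Pair and DocumentLabel classes below are hypothetical plain-Java stand-ins, not the real cTAKES or UIMA types, and the "task" field and the findValue/findLabel helpers are invented for the example. It shows the point made above that, either way, a document carrying several labels still requires a scan to find the one of interest; the dedicated type mainly makes the intent self-documenting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class DocumentLabelSketch {

    // Hypothetical stand-in for a generic attribute/value pair.
    // NOT the real org.apache.ctakes.typesystem.type.util.Pair class.
    static final class Pair {
        final String attribute;
        final String value;

        Pair(String attribute, String value) {
            this.attribute = attribute;
            this.value = value;
        }
    }

    // Hypothetical stand-in for a dedicated label type as proposed in the thread.
    // The "task" field is an assumption added to cover the multi-task case.
    static final class DocumentLabel {
        final String task;
        final String label;

        DocumentLabel(String task, String label) {
            this.task = task;
            this.label = label;
        }
    }

    // Generic pairs: scan for the attribute name you want.
    static Optional<String> findValue(List<Pair> pairs, String attribute) {
        for (Pair p : pairs) {
            if (p.attribute.equals(attribute)) {
                return Optional.of(p.value);
            }
        }
        return Optional.empty();
    }

    // Dedicated type: a scan is still needed when several tasks label one document.
    static Optional<String> findLabel(List<DocumentLabel> labels, String task) {
        for (DocumentLabel l : labels) {
            if (l.task.equals(task)) {
                return Optional.of(l.label);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Pairs can mix document labels with unrelated metadata.
        List<Pair> pairs = new ArrayList<>();
        pairs.add(new Pair("document_label", "positive"));
        pairs.add(new Pair("dictionary_lookup_version", "3.2.1"));
        System.out.println(findValue(pairs, "document_label").orElse("(none)"));

        // Dedicated labels hold only classification results, one per task.
        List<DocumentLabel> labels = new ArrayList<>();
        labels.add(new DocumentLabel("polarity", "positive"));
        labels.add(new DocumentLabel("severity", "moderate"));
        System.out.println(findLabel(labels, "severity").orElse("(none)"));
    }
}
```

In a real pipeline these objects would live in the CAS and be retrieved through the UIMA APIs rather than from plain lists; the lookup logic would have the same shape.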
