incubator-ctakes-dev mailing list archives

From Dmitriy Dligach <dmitriy.dlig...@childrens.harvard.edu>
Subject Re: new type: document label?
Date Mon, 19 Nov 2012 16:24:50 GMT
This is a very good point. Also it seems like documents with multiple 
labels is not a scenario that we face every day, so why don't we just 
create a new type (e.g. DocumentLabel) that derives from TOP and use it 
for a while to see if it satisfies our document classification needs?

Thanks,

Dima

On 11/16/2012 09:21 PM, Wu, Stephen T., Ph.D. wrote:
> Well, I think the downside to using Pair is that it is not self-documenting.
> In other words, everyone who did not see this thread will be faced with the
> same problem Dima and I were faced with (and have dealt with in different
> ways).  Everyone trying to use the common type system outside of UIMA would
> probably be completely lost.
>
> Multiple labels... DocumentClass = List<Pair>?  Or you could just
> instantiate multiple DocumentClass types, and not necessarily have the
> Document refer to them...
>
> stephen
>
>
>
>> I suggest using the current types.
>>
>> I think if we add a new one, we would still want to handle multiple
>> classifications, and would still have the downside of having to iterate
>> through the classifications to find the one of interest.  So I'm not sure how
>> much we gain by adding a new type.
>>
>> But you are closer to this than I am so I would go with whatever you recommend
>> or others doing classification recommend.
>>
>> -- James
>>
>>
>>> -----Original Message-----
>>> From: ctakes-dev-return-874-Masanz.James=mayo.edu@incubator.apache.org
>>> [mailto:ctakes-dev-return-874-
>>> Masanz.James=mayo.edu@incubator.apache.org] On Behalf Of Dmitriy Dligach
>>> Sent: Thursday, November 15, 2012 3:49 PM
>>> To: ctakes-dev@incubator.apache.org
>>> Subject: Re: new type: document label?
>>>
>>> James, thanks. This makes perfect sense.
>>>
>>> So what's the conclusion? Can we do with the current types, or do we
>>> still need to create a new one?
>>>
>>> Dima
>>>
>>> On 11/15/2012 03:43 PM, Masanz, James J. wrote:
>>>> Yes, you can put multiple Pair annotations in the CAS.
>>>> There is a Pairs (plural) annotation type which is a list (FSArray) of
>>>> Pair annotations.
>>>> You could have two Pair annotations with
>>>> attribute="at_risk_for_early_brca"
>>>> value="T"
>>>>
>>>> attribute="alchohol_use"
>>>> value="heavy_drinker"
>>>>
>>>> The downside:
>>>> You have to iterate through the Pair annotations to find the one
>>>> with the attribute name you want.
>>>> The upside: we don't have to create new Annotation types for
>>>> everything that might be imagined.
>>>> As Stephen points out, not everything in Pairs needs to be a document
>>>> class or related to the text within the document. It can be used, for
>>>> example, to keep version information about a pipeline or anything any
>>>> annotator wants. A totally made-up example could be
>>>> attribute="dictionary_lookup_version"
>>>> value="3.2.1"
>>>>
>>>> -- James
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From:
>>>>> ctakes-dev-return-869-Masanz.James=mayo.edu@incubator.apache.org
>>>>> [mailto:ctakes-dev-return-869-
>>>>> Masanz.James=mayo.edu@incubator.apache.org] On Behalf Of Dmitriy
>>>>> Dligach
>>>>> Sent: Thursday, November 15, 2012 1:03 PM
>>>>> To: ctakes-dev@incubator.apache.org
>>>>> Subject: Re: new type: document label?
>>>>>
>>>>> Chen brings up a good point. But can't we solve this problem by
>>>>> creating multiple Pair annotations in the CAS?
>>>>>
>>>>> Dima
>>>>>
>>>>> On 11/15/2012 01:52 PM, Lin, Chen wrote:
>>>>>> I am curious to know if Pair allows multiple document level labels
>>>>>> for a single doc. It is possible that a single set of documents be used
>>>>>> in multiple classification tasks.
>>>>>> For example, in one task a document may be labeled as "positive" or
>>>>>> "negative", in another task this same doc may be labeled as "high",
>>>>>> "moderate" or "low".  Many thanks!
>>>>>> Best,
>>>>>> Chen
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Dmitriy Dligach [mailto:dmitriy.dligach@childrens.harvard.edu]
>>>>>> Sent: Thursday, November 15, 2012 1:46 PM
>>>>>> To: ctakes-dev@incubator.apache.org
>>>>>> Subject: Re: new type: document label?
>>>>>>
>>>>>> Thank you, James.
>>>>>>
>>>>>> So, in general did you envision this type of use for Pair:
>>>>>>
>>>>>> Pair.attribute -> "document_label"
>>>>>> Pair.value -> "positive"
>>>>>>
>>>>>> I think this may work.
>>>>>>
>>>>>> Dima
>>>>>>
>>>>>> On 11/15/2012 10:22 AM, Masanz, James J. wrote:
>>>>>>> Pair (org.apache.ctakes.typesystem.type.util.Pair) is intended for
>>>>>>> such document-level properties.
>>>>>>> Would that suit your need?
>>>>>>>
>>>>>>> -- James
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From:
>>>>>>>> ctakes-dev-return-854-Masanz.James=mayo.edu@incubator.apache.org
>>>>>>>> [mailto:ctakes-dev-return-854-
>>>>>>>> Masanz.James=mayo.edu@incubator.apache.org] On Behalf Of Dmitriy
>>>>>>>> Dligach
>>>>>>>> Sent: Thursday, November 15, 2012 9:16 AM
>>>>>>>> To: cTAKES Dev list @ ASF
>>>>>>>> Subject: new type: document label?
>>>>>>>>
>>>>>>>> We've recently been using cTAKES more and more for document-level
>>>>>>>> classification (e.g. phenotyping). Would it make sense to add a
>>>>>>>> new type (that would derive from TOP) to store the label for a
>>>>>>>> document?
>>>>>>>> I know we currently have a doc id for each document, but having
>>>>>>>> the label type would simplify a lot of things (e.g. debugging).
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Dima
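
The Pair-based approach discussed in this thread (several attribute/value
pairs per document, found by iterating over them) might look roughly like
the sketch below. It is only an illustration in plain Java, not cTAKES or
UIMA code: the Pair class here is a stand-in that mimics the attribute and
value fields of org.apache.ctakes.typesystem.type.util.Pair, and the
findValue helper and the "severity_label" attribute are hypothetical names
made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class DocumentLabelSketch {

    /** Plain-Java stand-in for an attribute/value pair (NOT the real UIMA type). */
    static final class Pair {
        final String attribute;
        final String value;

        Pair(String attribute, String value) {
            this.attribute = attribute;
            this.value = value;
        }
    }

    /**
     * Hypothetical helper: linear scan over the pairs, returning the value of
     * the first pair whose attribute matches. This is the "iterate to find
     * the one you want" cost mentioned in the thread.
     */
    static Optional<String> findValue(List<Pair> pairs, String attribute) {
        for (Pair p : pairs) {
            if (p.attribute.equals(attribute)) {
                return Optional.of(p.value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Pair> pairs = new ArrayList<>();
        // Pairs are generic: some hold pipeline metadata, some hold labels.
        pairs.add(new Pair("dictionary_lookup_version", "3.2.1"));
        // Two labels for the same document, one per classification task.
        pairs.add(new Pair("document_label", "positive"));
        pairs.add(new Pair("severity_label", "moderate")); // hypothetical attribute

        System.out.println(findValue(pairs, "document_label").orElse("<none>")); // prints "positive"
        System.out.println(findValue(pairs, "severity_label").orElse("<none>")); // prints "moderate"
        System.out.println(findValue(pairs, "no_such_label").orElse("<none>"));  // prints "<none>"
    }
}
```

The sketch also shows the trade-off raised above: nothing in the Pair type
itself says which attribute names carry document labels, so callers have to
know the naming convention in advance.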

