opennlp-dev mailing list archives

From Olivier Grisel <olivier.gri...@ensta.org>
Subject Re: OpenNLP Annotations Proposal
Date Fri, 24 Jun 2011 16:42:52 GMT
2011/6/24 Jörn Kottmann <kottmann@gmail.com>:
> On 6/24/11 11:54 AM, Olivier Grisel wrote:
>>
>> but we need to agree on a CAS type system first. I don't
>> know opennlp-uima myself and won't have time to invest more effort
>> in this project before mid-July, unfortunately.
>
> I suggest that there are two classes of types in the type system.
>
> The first class contains annotations which describe the input we collect
> from our annotators and are also suitable for documenting comments and
> disagreements between annotators.
>
> The second class contains standard linguistic annotations
> such as sentences, tokens, entities, chunks, parses, etc.
>
> The idea is that the annotations in the second class can be automatically
> derived from the annotations in the first class. In case the article is
> not completely labeled, the statistical models could fill the gaps.
>
> For example, we could ask the annotators to label token splits; from these
> token splits we can derive the actual token annotations. For English texts
> the annotation UI could make use of the alphanumeric optimization and only
> ask the user about questionable token splits.
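
A minimal sketch of the token-split derivation described above, in Python rather than OpenNLP's Java API. The names `tokens_from_splits` and `needs_review` are hypothetical, and so is the convention that a split is a character offset at which a new token begins. Whitespace is assumed to always separate tokens, and the "questionable" check is only a rough stand-in for the alphanumeric optimization:

```python
def tokens_from_splits(text, splits):
    """Derive half-open (start, end) token spans from whitespace plus
    annotator-supplied split offsets (assumed convention: a new token
    starts at each offset)."""
    split_set = set(splits)
    spans = []
    start = None
    for i, ch in enumerate(text):
        if ch.isspace():
            # Whitespace always closes the current token.
            if start is not None:
                spans.append((start, i))
                start = None
            continue
        if start is None:
            start = i
        elif i in split_set:
            # An annotated split closes the current token and opens a new one.
            spans.append((start, i))
            start = i
    if start is not None:
        spans.append((start, len(text)))
    return spans


def needs_review(chunk):
    """Assumed heuristic: a whitespace-delimited chunk made only of letters
    and digits needs no split, so the UI would not ask about it."""
    return not chunk.isalnum()


text = "Hello, world."
spans = tokens_from_splits(text, [5, 12])
tokens = [text[s:e] for s, e in spans]
```

With the two annotated splits at offsets 5 and 12, `tokens` holds the four tokens "Hello", ",", "world" and "."; only the chunks "Hello," and "world." would be shown to the annotator.
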
>
> A similar approach could be used for sentence annotations.
>
> For named entity annotations the user could do BIO style token labeling
> through a special UI, similar to the one in Walter. The BIO labels can
> then be used to compute the name spans.
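
The step from BIO labels to name spans can be sketched as follows. This is an illustrative Python sketch, not OpenNLP code: the function name `bio_to_spans`, the `B-type` / `I-type` / `O` label strings, and the lenient handling of a stray `I-` label are all assumptions. Spans are returned as half-open token-index ranges with the entity type attached:

```python
def bio_to_spans(labels):
    """Convert per-token BIO labels into (start, end, type) spans,
    where start is inclusive and end is exclusive (token indices)."""
    spans = []
    start, kind = None, None
    for i, label in enumerate(labels):
        prefix, _, name = label.partition("-")
        # An I- label extends the open span only if the type matches.
        if prefix == "I" and start is not None and name == kind:
            continue
        # Anything else closes the span that is currently open.
        if start is not None:
            spans.append((start, i, kind))
            start, kind = None, None
        # B- opens a new span; a stray I- is treated like B- (assumed, lenient).
        if prefix in ("B", "I"):
            start, kind = i, name
    if start is not None:
        spans.append((start, len(labels), kind))
    return spans


labels = ["B-person", "I-person", "O", "B-location", "O"]
name_spans = bio_to_spans(labels)
```

For the labels above, `name_spans` contains one person span covering tokens 0 and 1 and one location span covering token 3.
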
>
> Our models can either be trained directly on the derived annotations, or we
> add a sentence level annotation where users need to confirm that the entire
> sentence is labeled correctly, for example that all person annotations are
> marked in this sentence.

I like the ability to move the UI focus from one sentence to another
and to mark a complete sentence as validated. +1 for the
rest of your proposal.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
