opennlp-dev mailing list archives

From Jason Baldridge <jasonbaldri...@gmail.com>
Subject Re: Host stock models in maven central
Date Wed, 08 Aug 2012 14:31:56 GMT
Sorry if I missed something along the way -- who did the annotation of the
Wikipedia data?

BTW, the OANC will soon release version 3.0 of MASC (the
Manually Annotated Sub-Corpus), with about 800k tokens of English text
(multiple domains, including Twitter, blogs, transcribed speech, and more)
labeled with several different levels of analysis, including chunks (noun
and verb), entities, tokens, POS tags, sentence boundaries, and logical
forms.

http://www.americannationalcorpus.org/MASC/Home.html

On Wed, Aug 8, 2012 at 2:47 AM, Jörn Kottmann <kottmann@gmail.com> wrote:

> On 08/08/2012 06:16 AM, Michael Schmitz wrote:
>
>> Hi, here are some models trained on Wikipedia data.  They have similar
>> performance.  Is this useful?
>>
>
> Yes, people who do not have access to our MUC-based training
> data can just use the wiki data instead and combine it with their data.
>
> Thanks for sharing.
>
> Now all we need is a way to get label corrections from the community :-)
>
> Jörn
>



-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
