opennlp-dev mailing list archives

From Olivier Grisel <olivier.gri...@ensta.org>
Subject Re: OpenNLP Annotations Proposal
Date Fri, 10 Jun 2011 15:29:45 GMT
2011/6/10 Jason Baldridge <jasonbaldridge@gmail.com>:
> This looks great! I don't have time to look at this in great detail right
> now, but am happy to give feedback on particular issues and questions.
>
> Active learning would be nice to add eventually, but it has to be done with
> great care, e.g. using uncertainty alone doesn't really work that well and
> care needs to be taken with class imbalance etc. Random sampling is a good
> starting point, and can be used while ironing out the details.

Acknowledged. I wasn't planning to implement this part myself anyway.

> I can't remember if this has been discussed before, but does there need to
> be a non-OpenNLP group which has a primary purpose of creating open
> standardized datasets and annotation interfaces, etc?
>
> It seems also we might be able to get some corporate sponsorship for
> annotation, improvements to models, creation of resources for specific
> languages, etc.

No idea. I think Jacob Perkins (and possibly others), who works with
NLTK, was also interested in such open corpora. See for instance this
thread on metaoptimize.com/qa:

  http://metaoptimize.com/qa/questions/4650/what-licenses-cover-a-nltk-tagger-trained-on-treebank

> BTW, there is a lot that can be done to bootstrap POS-taggers from raw data
> and the tags in Wiktionary, so if folks are interested in that I'm happy to
> provide pointers.

As mentioned by Tommaso, I think we should start to structure the wiki
for this effort. Do you want me to create sub-pages of [1] for
POS-tagging and NE detection? I could write the NE detection page
with a description of the current effort on corpus-refiner / Walter
and let you add pointers for the POS tags case.

[1] https://cwiki.apache.org/OPENNLP/opennlp-annotations.html

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
