opennlp-dev mailing list archives

From Jason Baldridge <jasonbaldri...@gmail.com>
Subject Re: Coreference almost dead?
Date Tue, 05 Jul 2011 02:39:08 GMT
The OpenNLP one is maxent-based and builds on Tom Morton's dissertation work. If
I'm not mistaken, the Stanford implementation requires good parser output,
which in turn requires good training data. We can do that for English, but it
obviously creates an additional bottleneck for other languages where we
can't get training data for the parser. And, in all likelihood, there would
need to be some effort adapting the rules for another language.

FWIW, I think it is cool that so much can be gotten out of a rule-based system,
but it is not *strictly* rule-based, since it relies on a great deal of
machine-learning-based preprocessing. In other words, there is a lot more
going on under the hood.

-Jason

On Mon, Jul 4, 2011 at 4:55 AM, Olivier Grisel <olivier.grisel@ensta.org> wrote:

> 2011/7/4 Jörn Kottmann <kottmann@gmail.com>:
> > On 7/1/11 10:04 PM, James Kosin wrote:
> >>
> >> +1. Coref is key to understanding relationships that are referenced
> >> later in sentences using pronouns. I'll go check on the data and how to
> >> integrate it into the correct format.
> >
> > That would be nice. We need to get this data set through LDC; at least it is
> > free. Afterward we need to define a format for the coref component, write
> > some training code, etc., so it is really a bit more work in this case.
>
> Out of curiosity, is the existing OpenNLP coref implementation
> MaxEnt-based or is it rule-based like the state-of-the-art StanfordNLP
> implementation?
>
>  http://nlp.stanford.edu/software/dcoref.shtml
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>



-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
