From: Jason Baldridge
To: opennlp-dev@incubator.apache.org
Reply-To: jbaldrid@mail.utexas.edu
Date: Mon, 4 Jul 2011 21:39:08 -0500
Subject: Re: Coreference almost dead?

The OpenNLP one is maxent-based and follows Tom Morton's dissertation
work. If I'm not mistaken, the Stanford implementation requires good
parser output, which in turn requires good training data. We can do that
for English, but it obviously creates an additional bottleneck for other
languages where we can't get training data for the parser. And, in all
likelihood, some effort would be needed to adapt the rules to another
language.

FWIW, I think it is cool that so much can be gotten out of a rule-based
system, but it is not *strictly* rule-based since it relies on a great
deal of machine-learning-based preprocessing. In other words, there is a
lot more going on under the hood.

-Jason

On Mon, Jul 4, 2011 at 4:55 AM, Olivier Grisel wrote:

> 2011/7/4 Jörn Kottmann:
> > On 7/1/11 10:04 PM, James Kosin wrote:
> >>
> >> +1 coref is key to understanding of relationships that are referenced
> >> later in sentences using pronouns. I'll go check on the data and how to
> >> integrate it into the correct format.
> >
> > That would be nice, we need to get this data set through LDC, at least
> > it is free. Afterward we need to define
> > a format for the coref component, write some training code, etc. so it is
> > really a bit more work in this
> > case.
>
> Out of curiosity is the existing OpenNLP coref implementation
> MaxEnt-based or is it rule based like the state of the art StanfordNLP
> implementation?
>
> http://nlp.stanford.edu/software/dcoref.shtml
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>

-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge