From: Joern Kottmann
Date: Mon, 15 May 2017 10:39:23 +0200
Subject: Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2
To: "dev@opennlp.apache.org"

Hello Richard,

thanks for reporting this.

For 1.8.0 we replaced a Heap with a SortedSet [1]. That commit contains a loop [2] which iterates through the parses that will be advanced. The order of the parses in the Heap was not well defined, so we decided to sort them by probability.

We also noticed that this change alters the output of the parser with the existing models in our SourceForge model eval test [3]. After running the evaluation on the OntoNotes4 data set I saw only a very small change and decided it was OK to do this. I do not know exactly how big the change is, but it was less than the delta of 0.001 used in test case [4].

What do you think? Should this be rolled back?

That covers the parser. I still need to understand what happened with the lemmatizer.
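To make the ordering point concrete, here is a minimal sketch of iterating candidates from a SortedSet ordered by probability. It is not the code from the commit: the Candidate class, its fields, and the insertion-order tie-break are assumptions made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class ParseOrderSketch {

    // Hypothetical stand-in for a parse: a label, a probability, an insertion index.
    static final class Candidate {
        final String label;
        final double prob;
        final int seq;

        Candidate(String label, double prob, int seq) {
            this.label = label;
            this.prob = prob;
            this.seq = seq;
        }
    }

    // Highest probability first. Ties fall back to insertion order (an assumption
    // of this sketch) so equally probable candidates are not dropped by the set.
    static final Comparator<Candidate> BY_PROB_DESC =
        Comparator.comparingDouble((Candidate c) -> c.prob).reversed()
                  .thenComparingInt(c -> c.seq);

    // Returns the labels in the order a loop over the sorted set would visit them.
    static List<String> orderedLabels(String[] labels, double[] probs) {
        SortedSet<Candidate> set = new TreeSet<>(BY_PROB_DESC);
        for (int i = 0; i < labels.length; i++) {
            set.add(new Candidate(labels[i], probs[i], i));
        }
        List<String> out = new ArrayList<>();
        for (Candidate c : set) {
            out.add(c.label);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [b, d, c, a]
        System.out.println(orderedLabels(
            new String[] {"a", "b", "c", "d"},
            new double[] {0.10, 0.70, 0.20, 0.70}));
    }
}
```

The iteration order here depends only on the comparator, so it is the same on every run. With a comparator that looks at probability alone, a TreeSet would treat two candidates with equal probability as duplicates and keep only one, which is why the sketch adds a tie-break.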
Jörn

[1] https://github.com/apache/opennlp/commit/3df659b9bfb02084e782f1e8b6ec716f56e0611c
[2] https://github.com/apache/opennlp/blob/3df659b9bfb02084e782f1e8b6ec716f56e0611c/opennlp-tools/src/main/java/opennlp/tools/parser/AbstractBottomUpParser.java#L285
[3] https://github.com/apache/opennlp/commit/3df659b9bfb02084e782f1e8b6ec716f56e0611c#diff-a5834f32b8a41b76a336126e4b13d4f7L349
[4] https://github.com/apache/opennlp/blob/3df659b9bfb02084e782f1e8b6ec716f56e0611c/opennlp-tools/src/test/java/opennlp/tools/eval/OntoNotes4ParserEval.java#L70

On Sat, May 13, 2017 at 10:35 PM, Richard Eckart de Castilho wrote:

> Hi all,
>
> > On 11.05.2017, at 18:37, Joern Kottmann wrote:
> >
> > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP
> > 1.8.0 Release Candidate 2.
>
> Should OpenNLP 1.8.0 yield identical results as 1.7.2 when the same
> models are used during classification?
>
> E.g. the English parser model seems to create different POS tags now
> for the sentence "We need a very complicated example sentence ,
> which contains as many constituents and dependencies as possible .".
> "a" is now wrongly tagged as "," whereas 1.7.2 tagged it correctly as "DT".
>
> Should OpenNLP 1.8.0 yield identical results as 1.7.2 when the same
> training data is used during training?
>
> I have a test that trains a lemmatizer model on GUM 3.0.0. With 1.7.2,
> this model reached an f-score of ~0.96. With 1.8.0, I only get ~0.84.
>
> Cheers,
>
> -- Richard