From: Shad Storhaug
To: "Van Den Berghe, Vincent", "dev@lucenenet.apache.org"
Subject: RE: TestAgainstBrzozowski weirdness
Date: Fri, 24 Mar 2017 02:22:46 +0000

Vincent,

I put the GetHashCode() fix into both ValueList and ValueHashSet. You unwittingly reverted it back to its original Java state.

Thanks,
Shad Storhaug (NightOwl888)

From: Van Den Berghe, Vincent [mailto:Vincent.VanDenBerghe@bvdinfo.com]
Sent: Thursday, March 23, 2017 9:57 PM
To: dev@lucenenet.apache.org
Cc: Shad Storhaug
Subject: TestAgainstBrzozowski weirdness

Here's a fun fact about TestAgainstBrzozowski.
This is the original test, which fails every time:

    Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
    AutomatonTestUtil.MinimizeSimple(a);
    Automaton b = (Automaton)a.Clone();
    MinimizationOperations.Minimize(b);
    Assert.IsTrue(BasicOperations.SameLanguage(a, b));
    Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
    Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

If we replace the call to AutomatonTestUtil.MinimizeSimple(a) with MinimizationOperations.Minimize(a), the test succeeds:

    Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
    MinimizationOperations.Minimize(a);
    Automaton b = (Automaton)a.Clone();
    MinimizationOperations.Minimize(b);
    Assert.IsTrue(BasicOperations.SameLanguage(a, b));
    Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
    Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

Before you say "big deal": if we instead take the original test and replace the call to MinimizationOperations.Minimize(b) with AutomatonTestUtil.MinimizeSimple(b), the test fails every time:

    Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
    AutomatonTestUtil.MinimizeSimple(a);
    Automaton b = (Automaton)a.Clone();
    AutomatonTestUtil.MinimizeSimple(b);
    Assert.IsTrue(BasicOperations.SameLanguage(a, b));
    Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
    Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

Not only that, but if we re-order the clone operation so that it is executed on the un-minimized automaton (to make sure we really have 2 identical automatons), like this:

    Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
    Automaton b = (Automaton)a.Clone();
    AutomatonTestUtil.MinimizeSimple(a);
    AutomatonTestUtil.MinimizeSimple(b);
    Assert.IsTrue(BasicOperations.SameLanguage(a, b));
    Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
    Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

... the test fails as well.
This is contrary to what is claimed in the big giant comment.

To make sure that cloning works, this:

    Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
    Automaton b = (Automaton)a.Clone();
    Assert.IsTrue(BasicOperations.SameLanguage(a, b));
    Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
    Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());
    AutomatonTestUtil.MinimizeSimple(a);
    AutomatonTestUtil.MinimizeSimple(b);
    Assert.IsTrue(BasicOperations.SameLanguage(a, b));
    Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
    Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

... never fails in the first 3 assertions. It always fails in one of the last ones.

Preliminary conclusion: if AutomatonTestUtil.MinimizeSimple can't give the same results for identical automatons, it's not a good basis for a test.

As to why... I searched for random factors in AutomatonTestUtil.MinimizeSimple, but found none. It looks like the algorithm is deterministic.

The only strange thing is the GetHashCode() implementation on State: it's bad form to define a hash code without an Equals method, especially if you're putting States in hash sets or dictionaries. But it seems like reference comparison is all that's needed, and since the hash code is a different one for every instance, removing the GetHashCode method or adding the Equals method has no effect.

However, I can make the last 3 failing tests cited above work by correcting a bug in ValueHashSet.GetHashCode. The current definition is this:

    public override int GetHashCode()
    {
        int h = 0;
        var i = GetEnumerator();
        while (i.MoveNext())
        {
            T obj = i.Current;
            if (!EqualityComparer<T>.Default.Equals(obj, default(T)))
            {
                h = HashHelpers.CombineHashCodes(h, obj.GetHashCode());
            }
        }
        return h;
    }

This definition is wrong, since it relies on the incorrect assumption that sets containing identical elements will enumerate them in identical order.
There is no order defined in a HashSet, and if you have 2 sets to which you add items a after b in one, and b after a in another, the sets are identical but their enumeration orders will not necessarily be. The operation HashHelpers.CombineHashCodes(h, obj.GetHashCode()) is noncommutative, causing equal sets to generate different hash codes.

Using:

    public override int GetHashCode()
    {
        int h = 0;
        foreach (var obj in this)
        {
            if (!EqualityComparer<T>.Default.Equals(obj, default(T)))
            {
                h += obj.GetHashCode();
            }
        }
        return h;
    }

will make the last 3 failing tests work correctly. But not the original one!

More weirdness, I suppose.

Vincent
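
For anyone who wants to see the order-dependence in isolation, here is a minimal sketch. It is not the Lucene.NET code: SetHashDemo and its Combine method are hypothetical stand-ins (the real HashHelpers.CombineHashCodes may mix bits differently), and plain arrays are used in place of sets so that the enumeration order is explicit and repeatable.

```csharp
using System;
using System.Collections.Generic;

public static class SetHashDemo
{
    // Hypothetical stand-in for HashHelpers.CombineHashCodes. The real helper
    // may differ; any order-sensitive mixing function shows the same effect.
    public static int Combine(int h1, int h2)
    {
        unchecked { return ((h1 << 5) + h1) ^ h2; }
    }

    // Folds the element hashes in enumeration order, so the result depends
    // on the order in which the elements are visited.
    public static int OrderDependentHash(IEnumerable<int> items)
    {
        int h = 0;
        foreach (int x in items)
        {
            h = Combine(h, x.GetHashCode());
        }
        return h;
    }

    // Sums the element hashes. Wrapping integer addition is commutative and
    // associative, so the visiting order does not matter.
    public static int OrderIndependentHash(IEnumerable<int> items)
    {
        int h = 0;
        foreach (int x in items)
        {
            unchecked { h += x.GetHashCode(); }
        }
        return h;
    }

    public static void Main()
    {
        // Same elements, visited in two different orders.
        int[] ab = { 1, 2 };
        int[] ba = { 2, 1 };

        Console.WriteLine(OrderDependentHash(ab) == OrderDependentHash(ba));     // False
        Console.WriteLine(OrderIndependentHash(ab) == OrderIndependentHash(ba)); // True
    }
}
```

Summing gives up some bit mixing compared with a combining function, but it guarantees that two sets with the same elements produce the same hash code regardless of how they were built, which is the property an Equals/GetHashCode pair on a set needs.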