lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Van Den Berghe, Vincent" <Vincent.VanDenBer...@bvdinfo.com>
Subject TestAgainstBrzozowski weirdness
Date Thu, 23 Mar 2017 14:56:49 GMT
Here's a fun fact about TestAgainstBrzozowski.
This is the original test which fails every time:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                AutomatonTestUtil.MinimizeSimple(a);
                Automaton b = (Automaton)a.Clone();
                MinimizationOperations.Minimize(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

If we change the call to AutomatonTestUtil.MinimizeSimple(a) by MinimizationOperations.Minimize(a),
the test succeeds:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                MinimizationOperations.Minimize(a);
                Automaton b = (Automaton)a.Clone();
                MinimizationOperations.Minimize(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

Before you say "big deal", if we replace the call to MinimizationOperations.Minimize(b) by
AutomatonTestUtil.MinimizeSimple(b), the test fails every time:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                AutomatonTestUtil.MinimizeSimple(a);
                Automaton b = (Automaton)a.Clone();
                AutomatonTestUtil.MinimizeSimple(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

Not only that, but if we re-order the clone operation to be executed on the un-minimized automaton
(to make sure we really have 2 identical automatons) like this:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                Automaton b = (Automaton)a.Clone();
                AutomatonTestUtil.MinimizeSimple(a);
                AutomatonTestUtil.MinimizeSimple(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

... the test fails as well. This is contrary to what is claimed in the big giant comment.
To make sure that cloning works, this:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                Automaton b = (Automaton)a.Clone();
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());
                AutomatonTestUtil.MinimizeSimple(a);
                AutomatonTestUtil.MinimizeSimple(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

... never fails in the first 3 assertions. Always fails in one of the last ones.

Preliminary conclusion:  if AutomatonTestUtil.MinimizeSimple can't give the same results for
identical automatons, it's not a good basis for a test.
As to why... I searched for random factors in the AutomatonTestUtil.MinimizeSimple, but found
none.  It looks like the algorithm is deterministic.
The only strange thing is the GetHashCode() implementation on State: it's bad form to define
a hash code without an Equals method, especially if you're putting States in hash sets or
dictionaries. But it seems like reference comparison is all that's needed, and since the hashcode
is a different one for every instance, removing the GetHashCode method or adding the Equals
method has no effect.

However, I can make the last 3 failing tests cited above work by correcting a bug in ValueHashSet<T>.GetHashCode.
The current definition is this:


        public override int GetHashCode()
        {
            int h = 0;
            var i = GetEnumerator();
            while (i.MoveNext())
            {
                T obj = i.Current;
                if (!EqualityComparer<T>.Default.Equals(obj, default(T)))
                {
                    h = HashHelpers.CombineHashCodes(h, obj.GetHashCode());
                }
            }
            return h;
        }


This definition is wrong, since it relies on the incorrect assumption that sets containing
identical elements will enumerate them in identical order. There is no order defined in a
HashSet<T>, and if you have 2 sets to which you add items a after b in one, and b after
a in another, the sets are identical but their enumerators will not necessarily be. The operation
HashHelpers.CombineHashCodes(h, obj.GetHashCode()) is noncommutative, causing equal set to
generate different hashcodes. Using:

        public override int GetHashCode()
        {
            int h = 0;
            foreach(var obj in this)
            {
                if (!EqualityComparer<T>.Default.Equals(obj, default(T)))
                {
                                      h += obj.GetHashCode();
                }
            }
            return h;
        }


Will make the last 3 failing tests work correct. But not the original one!



More weirdness, I suppose.

Vincent

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message