mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent
Date Sat, 17 Apr 2010 12:30:24 GMT

     [ https://issues.apache.org/jira/browse/MAHOUT-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sean Owen updated MAHOUT-379:
-----------------------------

      Status: Patch Available  (was: Open)
    Assignee: Sean Owen

Here's another patch, which builds on the previous changes, which did not directly relate
to this issue but go along with them.

This patch does include the test case, and more, which passes.

it resolves the issue by removing Vector names, and removing both equivalent() and strictEquivalence().
equals() now compares only values.

Further, equals() and hashCode() are paired up in AbstractVector to do more to ensure they
are consistent, and correct. We had, for instance, a transitivity problem with VectorView.equals().
Subclasses may specialize for performance; DenseVector still does this for instance and we
can revisit this with care later.

Good news is this patch net-net removes 170 lines of code, even with new unit tests.

It builds on the previous patch's refactoring of iterator methods.

It's a big patch so deserves some looking.

> SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent
> ---------------------------------------------------------------------------------
>
>                 Key: MAHOUT-379
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-379
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.4
>            Reporter: Danny Leshem
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.3
>
>         Attachments: MAHOUT-379.patch, MAHOUT-379.patch
>
>
> When a SequentialAccessSparseVector is serialized and deserialized using VectorWritable,
the result vector and the original vector are equivalent, yet equals returns false.
> The following unit-test reproduces the problem:
> {code}
> @Test
> public void testSequentialAccessSparseVectorEquals() throws Exception {
>     final Vector v = new SequentialAccessSparseVector(1);
>     final VectorWritable vectorWritable = new VectorWritable(v);
>     final VectorWritable vectorWritable2 = new VectorWritable();
>     writeAndRead(vectorWritable, vectorWritable2);
>     final Vector v2 = vectorWritable2.get();
>     assertTrue(AbstractVector.equivalent(v, v2));
>     assertEquals(v, v2); // This line fails!
> }
> private void writeAndRead(Writable toWrite, Writable toRead) throws IOException {
>     final ByteArrayOutputStream baos = new ByteArrayOutputStream();
>     final DataOutputStream dos = new DataOutputStream(baos);
>     toWrite.write(dos);
>     final ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
>     final DataInputStream dis = new DataInputStream(bais);
>     toRead.readFields(dis);
> }
> {code}
> The problem seems to be that the original vector name is null, while the new vector's
name is an empty string. The same issue probably also happens with RandomAccessSparseVector.
> SequentialAccessSparseVectorWritable (line 40):
> {code}
> dataOutput.writeUTF(getName() == null ? "" : getName());
> {code}
> RandomAccessSparseVectorWritable (line 42):
> {code}
> dataOutput.writeUTF(this.getName() == null ? "" : this.getName());
> {code}
> The simplest fix is probably to change the default Vector's name from null to the empty
string.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message