mahout-dev mailing list archives

From Sean Owen
Subject Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent
Date Sat, 17 Apr 2010 18:23:21 GMT
At the moment I'm already overreaching on the way to fix MAHOUT-379
with this patch, as I've expanded to address some mildly related
issues (equals, iterators).

So I personally am not trying to change serialization formats in
MAHOUT-379 / my current patch, no. The issue uncovered by removing
name relates to serialization format (since that becomes a vector's
new 'name') but is not a problem with the GSON format per se.

I also don't really want to rip up Writable too much, no. I have other
pet issues to foist on the project first.

At the moment I want to understand how to patch up the fuzzy k-means
code in this regard -- will probably switch to something slightly less
state-dependent than asFormatString() as a key and be done with it for
the moment.
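
To illustrate what I mean by a state-dependent key, here is a minimal sketch. It assumes the trouble is that a formatted string built from a vector's contents changes whenever the contents change; the Cluster class, formatString() and the integer id below are hypothetical stand-ins, not Mahout's actual classes or the eventual fix.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class StableKeySketch {

    /** Hypothetical stand-in for a cluster; NOT a Mahout class. */
    static final class Cluster {
        final int id;           // assigned once, never changes
        final double[] center;  // updated in place as the algorithm iterates

        Cluster(int id, double[] center) {
            this.id = id;
            this.center = center;
        }

        /** Plays the role of a formatted-string key: it depends on mutable state. */
        String formatString() {
            return "C" + id + Arrays.toString(center);
        }
    }

    /** Keys the map by the formatted string, then moves the center and looks it up again. */
    static boolean foundByFormatString() {
        Cluster c = new Cluster(0, new double[] {1.0, 2.0});
        Map<String, Cluster> byString = new HashMap<>();
        byString.put(c.formatString(), c);
        c.center[0] = 1.5;  // state changes, so the formatted string changes too
        return byString.containsKey(c.formatString());
    }

    /** Keys the map by the immutable id, then moves the center and looks it up again. */
    static boolean foundById() {
        Cluster c = new Cluster(0, new double[] {1.0, 2.0});
        Map<Integer, Cluster> byId = new HashMap<>();
        byId.put(c.id, c);
        c.center[0] = 1.5;  // state changes, but the key does not
        return byId.containsKey(c.id);
    }

    public static void main(String[] args) {
        System.out.println("format-string key found: " + foundByFormatString());
        System.out.println("id key found: " + foundById());
    }
}
```

The lookup by formatted string misses once the center moves, while the lookup by id still succeeds. Any key that does not depend on the vector's current contents would behave the same way; an integer id is just the simplest choice for the sketch.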

On Sat, Apr 17, 2010 at 6:39 PM, Drew Farris wrote:
> it is worth some investigation to determine if there is merit to
> adapting Mahout's MR jobs to use avro. Doug has recently committed a
> patch to avro ( that
> involves considerably less complexity than what I had originally
> proposed in, based on
> the initial proposed avro/mapreduce integration in MAPREDUCE-815.
> I'm half waiting for avro 1.4 to be released (which will include
> AVRO-493) before I dig into further proofs-of-concept of avro usage in
> Mahout, but I think there is something there worth seriously
> exploring. (half procrastinating otherwise)
> Drew
> On Sat, Apr 17, 2010 at 12:43 PM, Jeff Eastman wrote:
>> Seems like a major rewrite to replace Writable within our MR jobs.
