mahout-dev mailing list archives

From Ted Dunning <>
Subject Re: [GSOC] Matrix Operations on HDFS
Date Mon, 31 May 2010 01:30:03 GMT
The Distributed Row Matrix should be ideal for this. When you run mappers
against this data structure, each call to the mapper gets a different row.
You can use assign to compute your function on each element of a row in the
mapper. Set the number of reducers to 0 and you are set.
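A minimal single-process sketch of that map-only pattern, written in Python rather than against Mahout's Java API. Rows are modeled as sparse dicts keyed by row index (standing in for the row-index/vector pairs a Distributed Row Matrix holds), and `assign`, `map_row`, and `run_map_only_job` are hypothetical helpers, not Mahout calls:

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical sparse row: column index -> value; absent columns are implicit zeros.
SparseRow = Dict[int, float]


def assign(row: SparseRow, f: Callable[[float], float], num_cols: int) -> SparseRow:
    """Apply f to every element of the row, keeping only non-zero results."""
    if f(0.0) == 0.0:
        # f preserves zeros, so visiting only the stored entries is enough.
        candidates = ((col, f(value)) for col, value in row.items())
    else:
        # f maps 0 to something else, so every column has to be visited.
        candidates = ((col, f(row.get(col, 0.0))) for col in range(num_cols))
    return {col: value for col, value in candidates if value != 0.0}


def map_row(key: int, row: SparseRow, f: Callable[[float], float],
            num_cols: int) -> Iterator[Tuple[int, SparseRow]]:
    # One map() call per row: emit the same row key with the transformed row.
    yield key, assign(row, f, num_cols)


def run_map_only_job(rows: Iterable[Tuple[int, SparseRow]],
                     f: Callable[[float], float],
                     num_cols: int) -> List[Tuple[int, SparseRow]]:
    # With zero reducers there is no shuffle or sort: the job output is
    # simply every mapper's output, written as-is.
    output: List[Tuple[int, SparseRow]] = []
    for key, row in rows:
        output.extend(map_row(key, row, f, num_cols))
    return output
```

For example, `run_map_only_job([(0, {0: 1.0, 2: -3.0})], lambda x: x * x, 3)` squares the stored entries of row 0. The zero-preserving check matters for sparse data: a function such as `x + 1` turns every implicit zero into a stored value, so the output is no longer sparse.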

Are you sure that you don't need some kind of reduction function, however?
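If a reduction is needed, the job gains a reduce phase keyed on whatever is being aggregated. A minimal sketch, assuming (hypothetically; the thread does not say what would be aggregated) that the goal is a per-column sum of f over the stored elements: the mapper emits (column, f(value)) pairs, a simulated shuffle groups them by column, and the reducer sums each group. The names `mapper`, `reducer`, and `run_job` are illustrative only:

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical sparse row: column index -> value; absent columns are implicit zeros.
SparseRow = Dict[int, float]


def mapper(row_index: int, row: SparseRow,
           f: Callable[[float], float]) -> Iterator[Tuple[int, float]]:
    # Emit (column index, f(value)) for each stored element. Skipping the
    # implicit zeros is only correct for a sum when f(0) == 0.
    for col, value in row.items():
        yield col, f(value)


def reducer(col: int, values: Iterable[float]) -> Tuple[int, float]:
    # Every value for one column reaches the same reducer call; sum them.
    return col, sum(values)


def run_job(rows: Iterable[Tuple[int, SparseRow]],
            f: Callable[[float], float]) -> Dict[int, float]:
    # Simulated shuffle: group all mapper output by key (the column index).
    groups: Dict[int, List[float]] = defaultdict(list)
    for row_index, row in rows:
        for col, value in mapper(row_index, row, f):
            groups[col].append(value)
    # Reduce each group, visiting keys in sorted order.
    return dict(reducer(col, groups[col]) for col in sorted(groups))
```

Compared with the map-only version, the extra cost is the shuffle, which moves every emitted pair across the network to its reducer.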

You might also look at the k-means clustering code, which is probably related
in some sense to what you are doing.
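K-means is a useful comparison because one iteration splits naturally into a map step (assign each row to its nearest centroid) and a reduce step (average the rows in each cluster). The following is a single-process Python sketch of one iteration in that shape, not Mahout's implementation; rows are dense lists, and `kmeans_map`, `kmeans_reduce`, and `kmeans_iteration` are hypothetical names:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_map(point: Vector, centroids: List[Vector]) -> Tuple[int, Vector]:
    # Map: key the row by the index of its nearest centroid
    # (squared Euclidean distance).
    nearest = min(range(len(centroids)),
                  key=lambda k: squared_distance(point, centroids[k]))
    return nearest, point


def kmeans_reduce(points: List[Vector]) -> Vector:
    # Reduce: the new centroid is the coordinate-wise mean of the cluster's rows.
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans_iteration(rows: List[Vector], centroids: List[Vector]) -> List[Vector]:
    # Simulated shuffle: group rows by assigned cluster index.
    groups: Dict[int, List[Vector]] = defaultdict(list)
    for point in rows:
        k, p = kmeans_map(point, centroids)
        groups[k].append(p)
    # A cluster that received no rows keeps its previous centroid.
    return [kmeans_reduce(groups[k]) if groups[k] else list(centroids[k])
            for k in range(len(centroids))]
```

Repeating `kmeans_iteration` until the centroids stop moving gives the usual algorithm; in a distributed setting each iteration would be one map/reduce job.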

On Sun, May 30, 2010 at 3:24 PM, Sisir Koppaka <> wrote:

> I think I need the sort of operation Jake described above, wherein I can
> call a function f on each vector of the whole matrix (the dataset here,
> which is sparse) in a distributed fashion. I'll look at this in detail
> tomorrow. But any other pointers on this issue with reference to the
> MAHOUT-375.diff update are very welcome.
