mahout-user mailing list archives

From Sigurd Spieckermann <sigurd.spieckerm...@gmail.com>
Subject Combiner applied on multiple map task outputs (like in Mahout SVD)
Date Wed, 26 Sep 2012 11:49:23 GMT
Hi guys,

I'm trying to understand how the combiner in Mahout SVD works
(https://cwiki.apache.org/MAHOUT/dimensional-reduction.html). As far as I
know from the Mahout math matrix-multiplication implementation, matrix A is
represented by column vectors, matrix B is represented by row vectors, and
an inner join pairs each column of A with the matching row of B so that
their outer product can be computed. All outer products are then summed by
the combiners and reducers. What I am wondering about is how a combiner can
actually combine multiple outer products on the same datanode, because the
join package requires the data to be partitioned into unsplittable files.
In this case, I understand that one file contains one column/row of its
corresponding matrix. Hence, each map task receives a single column-row
tuple, computes the outer product, and emits the result. My understanding
of Hadoop is that the combiner runs immediately after a map task, on that
task's output only, but here one map task produces only a single result, so
there is nothing to combine. If the combiner could accumulate the results
of multiple map tasks, I would understand the idea, but from my
understanding and tests, it does not.
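
To make the question concrete, here is a minimal plain-Python sketch of the data flow as I understand it. It is not Mahout or Hadoop code: the helper names (outer, add, combine) and the tiny example matrices are made up for illustration, and the "one column-row pair per map task" layout is my assumption from above.

```python
def outer(col, row):
    """Outer product of a column vector and a row vector."""
    return [[a * b for b in row] for a in col]


def add(m1, m2):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def combine(partials):
    """Sum a non-empty list of partial products (combiner/reducer logic)."""
    total = partials[0]
    for p in partials[1:]:
        total = add(total, p)
    return total


# A = [[1, 2, 3], [4, 5, 6]] stored as columns; B (3x2) stored as rows.
A_cols = [[1, 4], [2, 5], [3, 6]]
B_rows = [[7, 8], [9, 10], [11, 12]]

# Assumed layout: one map task per joined column-row pair,
# so each task emits exactly one outer product.
map_outputs = [[outer(c, r)] for c, r in zip(A_cols, B_rows)]

# A combiner sees the output of one map task only. With a single
# record per task it returns that record unchanged.
combined = [combine(task_output) for task_output in map_outputs]

# The reducer sums everything that reaches it.
result = combine(combined)
```

In this sketch the combiner step changes nothing, because every map task emits one record; all of the summation happens in the reducer. That is exactly the behaviour I am asking about.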

Could anyone clarify the process please?

Thanks a lot!
Sigurd
