hadoop-common-user mailing list archives

From Shi Yu <sh...@uchicago.edu>
Subject Re: providing the same input to more than one Map task
Date Mon, 25 Apr 2011 14:09:29 GMT
Then, what is the main difference between (1) storing the input in a shared 
directory on the cluster and loading it in the configure() stage of the 
mappers, and (2) using the distributed cache?
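As a minimal sketch of approach (2), assuming a Hadoop Streaming job with a Python mapper (the file name below is hypothetical): a side file shipped with the distributed cache is materialized in each task's working directory, so the mapper reads it once from local disk at startup, much like loading it in configure(), but without each task fetching it from shared storage over the network.

```python
import os
import tempfile

def load_vector(path):
    # With the distributed cache (e.g. Streaming's -files option), the
    # shipped file appears under its local name in the task's working
    # directory; the mapper loads it once before processing any input.
    with open(path) as f:
        return [float(line) for line in f if line.strip()]

# Stand-in for the cached file a real job would ship alongside the task.
tmp = os.path.join(tempfile.mkdtemp(), "vector.txt")
with open(tmp, "w") as f:
    f.write("5.0\n6.0\n")
vec = load_vector(tmp)
# vec == [5.0, 6.0]
```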


On 4/25/2011 8:17 AM, Kai Voigt wrote:
> Hi,
> I'd use the distributed cache to store the vector on every mapper machine locally.
> Kai
> On 22.04.2011 at 21:15, Alexandra Anghelescu wrote:
>> Hi all,
>> I am trying to perform matrix-vector multiplication using Hadoop.
>> So I have matrix M in a file, and vector v in another file. How can I make
>> it so that each Map task will get the whole vector v and a chunk of matrix
>> M?
>> Basically I want my map function to output key-value pairs (i, m[i,j]*v[j]),
>> where i is the row number and j the column number. And the reduce function
>> will sum up all the values with the same key i, and that will be the i-th
>> element of my result vector.
>> Or can you suggest another way to do it?
>> Thanks,
>> Alexandra Anghelescu
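The map and reduce functions described in the question can be simulated outside Hadoop in a few lines (a sketch of the logic only, assuming the matrix arrives as (i, j, m[i,j]) entries; the function names are illustrative, not part of any Hadoop API):

```python
from collections import defaultdict

def map_entries(entries, v):
    # entries: iterable of (i, j, m_ij) matrix cells; v: the full vector,
    # available to every map task. Emits (i, m_ij * v[j]) as proposed.
    for i, j, m_ij in entries:
        yield i, m_ij * v[j]

def reduce_sums(pairs):
    # Groups by key i and sums the values: the i-th entry of M*v.
    sums = defaultdict(float)
    for i, x in pairs:
        sums[i] += x
    return dict(sums)

# Example: M = [[1, 2], [3, 4]], v = [5, 6]
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
result = reduce_sums(map_entries(entries, [5.0, 6.0]))
# result == {0: 17.0, 1: 39.0}, i.e. M*v = [17, 39]
```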
