hadoop-common-user mailing list archives

From Gang Luo <lgpub...@yahoo.com.cn>
Subject Re: Questions About Passing Parameters to Hadoop Job
Date Sun, 22 Nov 2009 20:34:46 GMT
So, you want to read the sample file in main, add each line to the job with job.set, and
then read those lines back in the mapper with job.get?

I think it is better to make the data file the input to the mappers, have each mapper
instance read the whole sample file through the HDFS API, and then compare the two. This
is essentially how a map-side join works.
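
To make the idea concrete, here is a minimal sketch in plain Java, not real Hadoop code.
The class name SampleJoinSketch and the methods setup and map are hypothetical stand-ins
for the mapper's one-time initialization and its per-record map call, and the equality
test is only a placeholder for whatever comparison the algorithm needs. In an actual job
the Reader would presumably wrap a stream obtained from FileSystem.open on the samples
file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the pattern; no Hadoop classes are used here.
class SampleJoinSketch {
    // Samples kept in memory for the lifetime of one mapper instance.
    private final List<String> samples = new ArrayList<String>();

    // Stand-in for the mapper's one-time initialization: load every sample line.
    // Assumption: in a real job the Reader would wrap an HDFS input stream.
    void setup(Reader samplesSource) {
        try {
            BufferedReader in = new BufferedReader(samplesSource);
            String line;
            while ((line = in.readLine()) != null) {
                samples.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for map(): called once per line of the data file.
    // Hypothetical comparison (exact equality); returns the samples that match.
    List<String> map(String dataLine) {
        List<String> matches = new ArrayList<String>();
        for (String sample : samples) {
            if (sample.equals(dataLine)) {
                matches.add(sample);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SampleJoinSketch mapper = new SampleJoinSketch();
        mapper.setup(new StringReader("a\nb\nc\n"));
        System.out.println(mapper.map("b")); // one matching sample
        System.out.println(mapper.map("z")); // no matching sample
    }
}
```

The point of the design is that the 20MB samples file is read once per mapper instance
rather than being serialized line by line into the job configuration.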

Gang Luo
Department of Computer Science
Duke University

----- Original Message ----
From: Boyu Zhang <boyuzhang35@gmail.com>
To: common-user@hadoop.apache.org
Sent: 2009/11/22 (Sun) 3:21:23 PM
Subject: Questions About Passing Parameters to Hadoop Job

Dear All,

I am implementing an algorithm that reads a data file (a .txt file,
approximately 90MB) and compares each line of the data file with each line of a
specific samples file (a .txt file, approximately 20MB). To do this, I need to
pass each line of the samples file as a parameter to the map-reduce job, and
taken together these lines are fairly large.

My current approach is to use job.set and job.get to set and retrieve
these lines as configuration values, but it is not efficient at all!

Could anyone help me with an alternative solution? Thanks a million!

Boyu Zhang
University of Delaware

