hadoop-mapreduce-user mailing list archives

From Jeff Kubina <jeff.kub...@gmail.com>
Subject Re: How to make a MapReduce job with no input?
Date Fri, 01 Mar 2013 01:41:05 GMT

To do this for the more general case of creating N map tasks, with each task
receiving the single record <i, N>, where i ranges from 0 to N-1, I wrote
custom Hadoop InputFormat, InputSplit, and RecordReader classes. The sample
code is here <http://goo.gl/npKfP>. I think I wrote those for Hadoop 0.19, so
they may need some tweaking for later versions.
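As a dependency-free sketch of the idea (plain Java with no Hadoop on the
classpath; the class and method names below are illustrative, not taken from
the linked sample): the InputFormat's getSplits() fabricates N splits out of
thin air, with no backing file, and each task's reader hands its mapper
exactly one <i, N> record.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Models the split logic of an input-less InputFormat: for N map tasks,
// produce N "splits", each carrying exactly one record <i, N>. In real
// Hadoop code this lives in InputFormat.getSplits(), with a RecordReader
// that returns the single record once and then reports end-of-input.
public class SingleRecordSplits {

    static List<Map.Entry<Integer, Integer>> getSplits(int n) {
        List<Map.Entry<Integer, Integer>> splits = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // Record <i, N>: task index as key, task count as value.
            splits.add(new SimpleEntry<>(i, n));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four map tasks, each seeing one record: <0,4>, <1,4>, <2,4>, <3,4>.
        for (Map.Entry<Integer, Integer> rec : getSplits(4)) {
            System.out.println(rec.getKey() + "," + rec.getValue());
        }
    }
}
```

In the real classes, getSplits() would return N custom InputSplit objects
(each storing only its index), and the RecordReader would return its single
record on the first call to next() and false afterwards, so each mapper's
map() runs exactly once with no input files involved.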


On Thu, Feb 28, 2013 at 4:25 PM, Mike Spreitzer <mspreitz@us.ibm.com> wrote:

> On closer inspection, I see that of my two tasks, the first processes 1
> input record and the other processes 0 input records.  So I think this
> solution is correct.  But perhaps it is not the most direct way to get the
> job done?
> From:        Mike Spreitzer/Watson/IBM@IBMUS
> To:        user@hadoop.apache.org,
> Date:        02/28/2013 04:18 PM
> Subject:        How to make a MapReduce job with no input?
> ------------------------------
> I am using the mapred API of Hadoop 1.0.  I want to make a job that does
> not really depend on any input (the job conf supplies all the info needed
> in Mapper).  What is a good way to do this?
> What I have done so far is write a job in which MyMapper.configure(..)
> reads all the real input from the JobConf, and MyMapper.map(..) ignores the
> given key and value, writing the output implied by the JobConf.  I set the
> InputFormat to TextInputFormat and the input paths to be a list of one
> filename; the named file contains one line of text (the word "one"),
> terminated by a newline.  When I run this job (on Linux, hadoop-1.0.0), I
> find it has two map tasks --- one reads the first two bytes of my non-input
> file, and the other reads the last two bytes of my non-input file!  How can I
> make a job with just one map task?
> Thanks,
> Mike
