From: Jeff Kubina <jeff.kubina@gmail.com>
To: user@hadoop.apache.org
Date: Thu, 28 Feb 2013 20:41:05 -0500
Subject: Re: How to make a MapReduce job with no input?

Mike,

To do this for the more general case of creating N map tasks, with each task receiving the single record <i, N> where i ranges from 0 to N-1, I wrote custom InputFormat, InputSplit, and RecordReader Hadoop classes. The sample code is here. I think I wrote those for Hadoop 0.19, so they may need some tweaking for subsequent versions.
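In case the link goes stale, here is a rough sketch of the idea against the old org.apache.hadoop.mapred API that you are using. This is not the linked sample itself; the class name NSplitInputFormat and the "nsplit.count" property are invented for illustration.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.*;

public class NSplitInputFormat implements InputFormat<IntWritable, IntWritable> {

  /** A split that carries only its index; it maps to no file and no host. */
  public static class IndexSplit implements InputSplit {
    private int index;
    public IndexSplit() { }                        // needed for reflection
    public IndexSplit(int index) { this.index = index; }
    public int getIndex() { return index; }
    public long getLength() { return 0; }          // no bytes to read
    public String[] getLocations() { return new String[0]; } // no locality
    public void write(DataOutput out) throws IOException { out.writeInt(index); }
    public void readFields(DataInput in) throws IOException { index = in.readInt(); }
  }

  public InputSplit[] getSplits(JobConf job, int numSplits) {
    // Ignore the framework's hint; fabricate exactly N splits, no input files.
    int n = job.getInt("nsplit.count", 1);
    InputSplit[] splits = new InputSplit[n];
    for (int i = 0; i < n; i++) splits[i] = new IndexSplit(i);
    return splits;
  }

  public RecordReader<IntWritable, IntWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) {
    final int i = ((IndexSplit) split).getIndex();
    final int n = job.getInt("nsplit.count", 1);
    return new RecordReader<IntWritable, IntWritable>() {
      private boolean done = false;
      public boolean next(IntWritable key, IntWritable value) {
        if (done) return false;                    // each task sees one record
        key.set(i);
        value.set(n);
        done = true;
        return true;
      }
      public IntWritable createKey() { return new IntWritable(); }
      public IntWritable createValue() { return new IntWritable(); }
      public long getPos() { return done ? 1 : 0; }
      public float getProgress() { return done ? 1.0f : 0.0f; }
      public void close() { }
    };
  }
}

Set "nsplit.count" to N in the JobConf and the job runs exactly N map tasks, each seeing the one record <i, N>, with no input files at all.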
Jeff

On Thu, Feb 28, 2013 at 4:25 PM, Mike Spreitzer <mspreitz@us.ibm.com> wrote:

> On closer inspection, I see that of my two tasks, the first processes 1
> input record and the other processes 0 input records. So I think this
> solution is correct. But perhaps it is not the most direct way to get the
> job done?
>
> From: Mike Spreitzer/Watson/IBM@IBMUS
> To: user@hadoop.apache.org
> Date: 02/28/2013 04:18 PM
> Subject: How to make a MapReduce job with no input?
>
> I am using the mapred API of Hadoop 1.0. I want to make a job that does
> not really depend on any input (the job conf supplies all the info needed
> in the Mapper). What is a good way to do this?
>
> What I have done so far is write a job in which MyMapper.configure(..)
> reads all the real input from the JobConf, and MyMapper.map(..) ignores
> the given key and value, writing the output implied by the JobConf. I set
> the InputFormat to TextInputFormat and the input paths to a list of one
> filename; the named file contains one line of text (the word "one"),
> terminated by a newline. When I run this job (on Linux, hadoop-1.0.0), I
> find it has two map tasks: one reads the first two bytes of my non-input
> file, and the other reads the last two bytes of my non-input file! How
> can I make a job with just one map task?
>
> Thanks,
> Mike
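P.S. The two map tasks are most likely FileInputFormat's doing: in Hadoop 1.0 it aims for mapred.map.tasks splits (default 2), so your 4-byte file gets cut into two 2-byte splits, which matches exactly what you saw. If you stick with your one-line-file workaround, asking for a single map task should collapse it to one split. A minimal driver sketch, assuming your existing MyMapper and with placeholder paths:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class OneMapTaskDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(OneMapTaskDriver.class);
    conf.setJobName("no-real-input");
    conf.setInputFormat(TextInputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path("one-line-file.txt")); // placeholder
    FileOutputFormat.setOutputPath(conf, new Path("out"));              // placeholder
    // With the default hint of 2 map tasks, FileInputFormat computes
    // goalSize = totalSize / 2 = 2 bytes and produces two 2-byte splits.
    // A hint of 1 makes goalSize the whole file, yielding a single split.
    conf.setNumMapTasks(1);
    // conf.setMapperClass(MyMapper.class);  // your mapper goes here
    JobClient.runJob(conf);
  }
}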