hadoop-common-user mailing list archives

From Andreas Kostyrka <andr...@kostyrka.org>
Subject Re: access jobconf in streaming job
Date Fri, 08 Aug 2008 20:07:16 GMT
On Friday 08 August 2008 11:43:50 Rong-en Fan wrote:
> After looking into streaming source, the answer is via environment
> variables. For example, mapred.task.timeout is in
> the mapred_task_timeout environment variable.

Well, another typical way to deal with that is to pass the parameters via 
cmdline.
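
For the environment variable route quoted above, a mapper written as a shell script could look a setting up roughly like this. It is only a sketch: jobconf_env_name and jobconf_get are made-up helper names, and it assumes streaming exports each jobconf entry with every character that is not a letter or digit replaced by "_" (as in the mapred_task_timeout example).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Hadoop.

# jobconf_env_name NAME: turn a jobconf key into the environment
# variable name, e.g. mapred.task.timeout -> mapred_task_timeout.
jobconf_env_name() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9' '_'
}

# jobconf_get NAME FALLBACK: print the exported value of NAME,
# or FALLBACK when the variable is unset or empty.
jobconf_get() {
    var=$(jobconf_env_name "$1")
    # var contains only letters, digits and "_", so eval is safe here.
    eval "val=\${$var-}"
    if [ -n "$val" ]; then
        printf '%s\n' "$val"
    else
        printf '%s\n' "$2"
    fi
}
```

Inside the mapper, something like timeout=$(jobconf_get mapred.task.timeout "$1") would then use the jobconf value when it is exported and fall back to the first cmdline argument otherwise.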

I personally ended up putting all of our environment-related configuration 
into a config.ini file that gets served via HTTP, and I pass 
a -c http://host:port/config.ini parameter to all the jobs.
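
To illustrate the idea (this is not the actual llfp or lrp code), a script could pick up the -c parameter and read a value from the fetched file roughly as sketched below. The function names config_url and ini_get, and the section and key names in the usage note, are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# config_url ARGS...: print the value that follows the first -c option.
config_url() {
    while [ "$#" -gt 0 ]; do
        if [ "$1" = "-c" ] && [ "$#" -ge 2 ]; then
            printf '%s\n' "$2"
            return 0
        fi
        shift
    done
    return 1
}

# ini_get SECTION KEY: read ini text on stdin and print the value
# of KEY inside [SECTION]; prints nothing if it is not found.
ini_get() {
    awk -v section="$1" -v key="$2" '
        # A line starting with "[" opens a new section.
        /^[ \t]*\[/ {
            s = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", s)
            current = s
            next
        }
        # Inside the wanted section, split each line at the first "=".
        current == section {
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    '
}
```

With curl available, url=$(config_url "$@") && curl -fsS "$url" | ini_get database host would print the host entry of a [database] section, if the file has one.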

Configuration related to what I expect the job to do I still keep on the 
cmdline, e.g. the hadoop call looks something like this:

time $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
       -mapper "/home/hadoop/bin/llfp --s3fetch -K $AWS_ACCESS_KEY_ID -S $AWS_SECRET_ACCESS_KEY --stderr -d vmregistry -d frontpage -d papi2 -d gen_dailysites -d fb_memberfind -c $CONFIGURL" \
       -reducer "/home/hadoop/bin/lrp --stderr -c $CONFIGURL" \
       -jobconf mapred.reduce.tasks=22 \
       -input /user/hadoop/run-$JOBNAME-input -output /user/hadoop/run-$JOBNAME-output || exit 1

In our case the separate .ini file makes sense because it describes the 
environment (e.g. HTTP service URLs, SQL database connections, and so on) and 
is also used by other scripts that do not run inside Hadoop.

Andreas

>
> On Fri, Aug 8, 2008 at 4:26 PM, Rong-en Fan <grafan@gmail.com> wrote:
> > I'm using streaming with a mapper written in perl. However, an
> > issue is that I want to pass some arguments via command line.
> > In regular Java mapper, I can access JobConf in Mapper.
> > Is there a way to do this?
> >
> > Thanks,
> > Rong-En Fan


