hadoop-common-user mailing list archives

From Jason Venner <jason.had...@gmail.com>
Subject Re: sharing variables across chained jobs
Date Wed, 23 Dec 2009 14:55:53 GMT
If your jobs are launched by separate JVM instances, the only real
persistence framework they share is HDFS.

You have two basic choices:

   1. Write a summary of the data to a persistent store that your next job
   reads; an HDFS file is the simplest case.
   2. Write the data you need as job counters, via the Reporter object, and
   have the next job read the counters from the previous job via the
   JobClient.getJob(jobid) interface.


Case 2 requires that the counters still exist (they are usually discarded
within 24 hours) and that you can determine the jobid of the job you need
to interrogate.
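For choice 1, the worry about one file per variable goes away if all the totals go into a single small summary file. Below is a minimal sketch, assuming a hypothetical helper class `SummaryFile` (not part of Hadoop) and a tab-separated `name<TAB>value` line format, which is the same layout the default TextOutputFormat produces for a key/value pair. It only needs plain `java.io` streams, so the `FSDataOutputStream` and `FSDataInputStream` returned by Hadoop's `FileSystem.create(Path)` and `FileSystem.open(Path)` can be passed straight in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a Hadoop class): stores several named integer totals
 * in one text file, one "name<TAB>value" line per total.
 * Names must not contain tabs or newlines.
 */
final class SummaryFile {
    private SummaryFile() {}

    /** Writes every total as "name\tvalue\n"; flushes but leaves closing the stream to the caller. */
    static void write(Map<String, Long> totals, OutputStream out) {
        try {
            Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            for (Map.Entry<String, Long> entry : totals.entrySet()) {
                w.write(entry.getKey() + "\t" + entry.getValue() + "\n");
            }
            w.flush();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Parses the lines back into a map, keeping file order; blank lines are skipped. */
    static Map<String, Long> read(InputStream in) {
        Map<String, Long> totals = new LinkedHashMap<>();
        try {
            BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            String line;
            while ((line = r.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    throw new IllegalArgumentException("malformed summary line: " + line);
                }
                totals.put(line.substring(0, tab), Long.parseLong(line.substring(tab + 1).trim()));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return totals;
    }
}
```

In a driver, the first job's totals could be written through a stream from `fs.create(...)` and read back from `fs.open(...)` before the next job is configured; the path is yours to choose. If the reduce step writes the totals as its ordinary output instead, each reducer produces its own part file, so the next driver would read every part file and merge the maps.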

On Tue, Dec 22, 2009 at 11:51 PM, Himanshu <ll_oz_ll@yahoo.com.hk> wrote:

> Hi everyone,
> I run multiple map/reduce jobs which are
> chained together. The output of one map/reduce is the input of another.
> There are also some integer-valued variables which are output from
> one map/reduce job and used as input in the subsequent one. These
> variables are obtained by summing up integer-valued data having a certain
> key in the reduce step.
>
> My question is: what would be the
> best way to share these several integer-valued variables across
> multiple map/reduce iterations? I don't want to write each of them to a
> separate file in the reduce step and then read those files in
> the next iteration of map/reduce.
>
> Looking forward to all the suggestions
>
> H
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
