hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: sharing variables across chained jobs
Date Wed, 23 Dec 2009 15:08:24 GMT
On Wed, Dec 23, 2009 at 6:55 AM, Jason Venner <jason.hadoop@gmail.com>wrote:

> If your jobs are launched by separate jvm instances, the only real
> persistence framework you have is hdfs.
> You have two basic choices:
>   1. Write summary data to a persistent store (an hdfs file being the
>   simplest case) that your next job reads
>   2. Write the data you need as a job counter, via the Reporter object, and
>   have the next job read the counters from the previous job via the
>   JobClient.getJob(jobid) interface.
> Case 2 requires that the counters still exist (they are usually discarded
> within 24 hours) and that you can determine the jobid of the job you need
> to interrogate.

Or, with case 2, you can also get the counters from your "driver" class
that's submitting the code, and then dump them into the JobConf of the next
job in the chain.
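
[Editor's note: a minimal sketch of that driver-side pattern, again with the old mapred API — run job 1 synchronously, pull its counters in the same JVM, and hand the value to job 2 through its JobConf. The property name "prev.job.records" is made up for the example.]

```java
import java.io.IOException;

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class ChainDriver {
  public static void run(JobConf firstJob, JobConf secondJob)
      throws IOException {
    // runJob blocks until the job completes, so counters are final here
    // and no jobid lookup or counter retention window is involved.
    RunningJob completed = JobClient.runJob(firstJob);
    Counters counters = completed.getCounters();
    long records = counters.getGroup("Stats").getCounter("RECORDS_SEEN");

    // Pass the value along; tasks of job 2 can read it with
    // conf.getLong("prev.job.records", -1L) in their configure() method.
    secondJob.setLong("prev.job.records", records);
    JobClient.runJob(secondJob);
  }
}
```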

