hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: doubt on Hadoop job submission process
Date Mon, 13 Aug 2012 15:23:44 GMT
Hi Manoj,

As I had said before, Hadoop will auto-find the jar from your runtime
classpath and use that. All you need to do is set the right class
(driver) to use via JobConf.setJarByClass(…).
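
To give a feel for what "auto-find the jar from your runtime classpath"
means, here is a minimal sketch using only the JDK. It is not Hadoop's own
code: the class name JarLocator and its helper methods are made up for this
illustration, and it only assumes that a by-class lookup amounts to asking
the class loader where the driver's .class file was loaded from, then
keeping the jar part of that URL.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLDecoder;
import java.util.Enumeration;

// Hypothetical helper, for illustration only (not part of Hadoop).
public class JarLocator {

    // Turns a class into its classpath resource name,
    // e.g. java.lang.String -> "java/lang/String.class".
    public static String classResourceName(Class<?> clazz) {
        return clazz.getName().replace('.', '/') + ".class";
    }

    // Extracts the jar file path from a "jar:" URL, e.g.
    // "jar:file:/tmp/job.jar!/com/example/Driver.class" -> "/tmp/job.jar".
    // Returns null when the URL does not point inside a jar.
    public static String jarPathFromUrl(String url) {
        if (!url.startsWith("jar:")) {
            return null;
        }
        String path = url.substring("jar:".length());
        int bang = path.indexOf('!');
        if (bang >= 0) {
            path = path.substring(0, bang);   // drop the "!/entry" suffix
        }
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        try {
            return URLDecoder.decode(path, "UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Asks the class loader where the class file lives and returns the
    // first jar found, or null if the class was not loaded from a jar
    // (for example, when running from a plain classes directory).
    public static String findContainingJar(Class<?> clazz) {
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            // Bootstrap classes report a null loader; fall back to the system one.
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            Enumeration<URL> urls = loader.getResources(classResourceName(clazz));
            while (urls.hasMoreElements()) {
                String jar = jarPathFromUrl(urls.nextElement().toString());
                if (jar != null) {
                    return jar;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the jar holding this class, or "null" outside a jar.
        System.out.println(findContainingJar(JarLocator.class));
    }
}
```

The practical point is that the class you pass should live in the jar that
also carries your mapper and reducer classes; the rest of your driver logic
can run in the same JVM before the job is submitted.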

On Mon, Aug 13, 2012 at 5:50 PM, Manoj Babu <manoj444@gmail.com> wrote:
> Then I need to ship the jar that contains the non-Hadoop classes and their
> supporting libraries to all the nodes, since I can't create two jars.
> Is there any way to optimize this?
>
>
> Cheers!
> Manoj.
>
>
>
> On Mon, Aug 13, 2012 at 5:20 PM, Harsh J <harsh@cloudera.com> wrote:
>>
>> Sure, you may separate the logic as you want it to be, but just ensure
>> the configuration object has a proper setJar or setJarByClass done on
>> it before you submit the job.
>>
>> On Mon, Aug 13, 2012 at 4:43 PM, Manoj Babu <manoj444@gmail.com> wrote:
>> > Hi Harsh,
>> >
>> > Thanks for your reply.
>> >
>> > Consider that my main program does many other activities (reading,
>> > writing, and updating; all non-Hadoop work) before invoking
>> > JobClient.runJob(conf);
>> > Is there any way to separate the process flow programmatically instead
>> > of going for a workflow engine?
>> >
>> > Cheers!
>> > Manoj.
>> >
>> >
>> >
>> > On Mon, Aug 13, 2012 at 4:10 PM, Harsh J <harsh@cloudera.com> wrote:
>> >>
>> >> Hi Manoj,
>> >>
>> >> Reply inline.
>> >>
>> >> On Mon, Aug 13, 2012 at 3:42 PM, Manoj Babu <manoj444@gmail.com> wrote:
>> >> > Hi All,
>> >> >
>> >> > The normal Hadoop job submission process involves:
>> >> >
>> >> > 1. Checking the input and output specifications of the job.
>> >> > 2. Computing the InputSplits for the job.
>> >> > 3. Setting up the requisite accounting information for the
>> >> >    DistributedCache of the job, if necessary.
>> >> > 4. Copying the job's jar and configuration to the map-reduce system
>> >> >    directory on the distributed file-system.
>> >> > 5. Submitting the job to the JobTracker and optionally monitoring its
>> >> >    status.
>> >> >
>> >> > I have a doubt about the 4th point of the job execution flow. Could
>> >> > any of you explain it?
>> >> >
>> >> > What is the job's jar?
>> >>
>> >> The job.jar is the jar you supply via "hadoop jar <jar>". Technically
>> >> though, it is the jar pointed to by JobConf.getJar() (set via the
>> >> setJar or setJarByClass calls).
>> >>
>> >> > Is the job's jar the one we submitted to Hadoop, or will Hadoop
>> >> > build one based on the job configuration object?
>> >>
>> >> It is the former, as explained above.
>> >>
>> >> --
>> >> Harsh J
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J
