hadoop-mapreduce-user mailing list archives

From praveenesh kumar <praveen...@gmail.com>
Subject Re: Calling one MR job within another MR job
Date Wed, 04 Apr 2012 12:41:05 GMT
Try looking into the distributed cache; maybe it solves your problem?
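
To make the idea concrete, here is a minimal plain-Java sketch of the lookup pattern the distributed cache enables: load File2 once into an in-memory index, then resolve each File1 record against it instead of launching a second job per record. It deliberately uses no Hadoop classes so it runs on its own. The class and method names (`SideDataLookup`, `loadSideData`, `mapRecord`) and the tab-separated `key<TAB>value` record format are assumptions for illustration, not anything stated in this thread. In a real job the load step would typically happen once per task in the mapper's `setup()` method, reading the locally cached copy of File2. This only works if File2 (or the part of it that is needed) fits in each task's memory, which may not hold for two large files.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a map-side lookup; no Hadoop classes are used.
class SideDataLookup {

    // Build an in-memory index from "key<TAB>value" lines of File2.
    // Lines without a tab are skipped (assumed malformed).
    static Map<String, String> loadSideData(List<String> file2Lines) {
        Map<String, String> index = new HashMap<>();
        for (String line : file2Lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                index.put(parts[0], parts[1]);
            }
        }
        return index;
    }

    // Per-record work for File1: look up the record's key in the index.
    // Returns "key<TAB>payload<TAB>match", or null when File2 has no such key.
    static String mapRecord(String file1Line, Map<String, String> index) {
        String[] parts = file1Line.split("\t", 2);
        String match = index.get(parts[0]);
        if (match == null) {
            return null;
        }
        String payload = parts.length == 2 ? parts[1] : "";
        return parts[0] + "\t" + payload + "\t" + match;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for File2 and File1.
        Map<String, String> index =
                loadSideData(Arrays.asList("u1\tDelhi", "u2\tNoida"));
        for (String record : Arrays.asList("u1\tclick", "u9\tview")) {
            String joined = mapRecord(record, index);
            System.out.println(joined == null ? "no match for: " + record : joined);
        }
    }
}
```

If File2 is too large to hold in memory, this pattern does not apply as-is, and joining the two files on a common key in the reduce phase is the usual alternative.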

Regards,
Praveenesh

On Wed, Apr 4, 2012 at 6:01 PM, Ravi teja ch n v
<raviteja.chnv@huawei.com> wrote:

>  Hi Stuti,
>
>
>
> In that case, you can first run the job on the dependent file (File2), then
> run the job on File1.
>
> Your second mapper can then use the already processed output.
>
>
>
> I guess this will solve the problem you have mentioned.
>
>
>
> Thanks,
>
> Ravi Teja
>
>
>  ------------------------------
> *From:* Stuti Awasthi [stutiawasthi@hcl.com]
> *Sent:* 04 April 2012 17:25:02
>
> *To:* mapreduce-user@hadoop.apache.org
> *Subject:* RE: Calling one MR job within another MR job
>
>   Hi Ravi,
>
>
>
> There is no job dependency, so I cannot use chained MR jobs or JobControl as
> you suggested.
>
> I have 2 relatively big files. I start processing with File1 as input to the
> MR1 job, and this processing needs to look up data from File2. One way is to
> loop through File2 and get the data. The other way is to pass File2 to an
> MR2 job for parallel processing.
>
>
>
> The second option suggests calling an MR2 job from inside the MR1 job.
> I am sure this is a common problem that people face. What is the
> best way to resolve this kind of issue?
>
>
>
> Thanks
>
>
>
> *From:* Ravi teja ch n v [mailto:raviteja.chnv@huawei.com]
> *Sent:* Wednesday, April 04, 2012 4:35 PM
> *To:* mapreduce-user@hadoop.apache.org
> *Subject:* RE: Calling one MR job within another MR job
>
>
>
> Hi Stuti,
>
>
>
> If you want MRjob2 to run after MRjob1, i.e. a job dependency, you can use
> the JobControl API, which lets you manage the dependencies.
>
>
>
> Calling another Job from a Mapper is not a good idea.
>
>
>
> Thanks,
>
> Ravi Teja
>
>
>  ------------------------------
>
> *From:* Stuti Awasthi [stutiawasthi@hcl.com]
> *Sent:* 04 April 2012 16:04:19
> *To:* mapreduce-user@hadoop.apache.org
> *Subject:* Calling one MR job within another MR job
>
> Hi all,
>
>
>
> We have a use case in which I start a first job, MR1, with File1.txt as
> input and, from within this job, call another job, MR2, with File2.txt as
> input.
>
> So :
>
> MRjob1{
>
> Map(){
>
> MRJob2(File2.txt)
>
> }
>
> }
>
>
>
> MRJob2{
>
> Processing….
>
> }
>
>
>
> My questions are: is this kind of approach possible, and what are the
> implications from a performance perspective?
>
>
>
>
>
> Regards,
>
> *Stuti Awasthi*
>
> HCL Comnet Systems and Services Ltd
>
> F-8/9 Basement, Sec-3,Noida.
>
>
>
>
