hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Arun C Murthy <...@hortonworks.com>
Subject Re: Business logic in cleanup?
Date Fri, 18 Nov 2011 18:46:24 GMT

On Nov 18, 2011, at 10:44 AM, Harsh J wrote:
> 
> If you could follow up on that patch, and see it through, its wish granted for a lot
of us as well, as we move ahead with the newer APIs in the future Hadoop releases ;-)
> 

The plan is to support both mapred and mapreduce MR apis for the forseeable future.

Arun

> On 18-Nov-2011, at 10:32 PM, Something Something wrote:
> 
>> Thanks again.  Will look at Mapper.run to understand better.  Actually, just a few
minutes ago I got the AVROMapper to work (which will read from AVRO files). This will hopefully
improve performance even more.
>> 
>> Interesting, AVROMapper doesn't extend from Mapper, so it doesn't have the 'cleanup'
method.  Instead it provides a 'close' method, which seems to behave the same way.  Honestly,
I like the method name 'close' better than 'cleanup'.
>> 
>> Doug - Is there a reason you chose to not extend from org/apache/hadoop/mapreduce/Mapper?
>> 
>> Thank you all for your help.
>> 
>> 
>> On Fri, Nov 18, 2011 at 7:44 AM, Harsh J <harsh@cloudera.com> wrote:
>> Given that you are sure about it, and you also know why thats the
>> case, I'd definitely write inside the cleanup(…) hook. No harm at all
>> in doing that.
>> 
>> Take a look at mapreduce.Mapper#run(…) method in source and you'll
>> understand what I mean by it not being a stage or even an event, but
>> just a tail call after all map()s are called.
>> 
>> On Fri, Nov 18, 2011 at 8:58 PM, Something Something
>> <mailinglists19@gmail.com> wrote:
>> > Thanks again for the clarification.  Not sure what you mean by it's not a
>> > 'stage'!  Okay.. may be not a stage but I think of it as an 'Event', such as
>> > 'Mouseover', 'Mouseout'.  The 'cleanup' is really a 'MapperCompleted' event,
>> > right?
>> >
>> > Confusion comes with the name of this method.  The name 'cleanup' makes me
>> > think it should not be really used as 'mapperCompleted', but it appears
>> > there's no harm in using it that way.
>> >
>> > Here's our dilemma - when we use (local) caching in the Mapper & write in
>> > the 'cleanup', our job completes in 18 minutes.  When we don't write in
>> > 'cleanup' it takes 3 hours!!!  Knowing this if you were to decide, would you
>> > use 'cleanup' for this purpose?
>> >
>> > Thanks once again for your advice.
>> >
>> >
>> > On Thu, Nov 17, 2011 at 9:35 PM, Harsh J <harsh@cloudera.com> wrote:
>> >>
>> >> Hello,
>> >>
>> >> On Fri, Nov 18, 2011 at 10:44 AM, Something Something
>> >> <mailinglists19@gmail.com> wrote:
>> >> > Thanks for the reply.  Here's another concern we have.  Let's say Mapper
>> >> > has
>> >> > finished processing 1000 lines from the input file & then the machine
>> >> > goes
>> >> > down.  I believe Hadoop is smart enough to re-distribute the input
split
>> >> > that was assigned to this Mapper, correct?  After re-assigning will
it
>> >> > reprocess the 1000 lines that were processed successfully before &
start
>> >> > from line 1001  OR  would it reprocess ALL lines?
>> >>
>> >> Attempts of any task start afresh. That's the default nature of Hadoop.
>> >>
>> >> So, it would begin from start again and hence reprocess ALL lines.
>> >> Understand that cleanup is just a fancy API call here, thats called
>> >> after the input reader completes - not a "stage".
>> >>
>> >> --
>> >> Harsh J
>> >
>> >
>> 
>> 
>> 
>> --
>> Harsh J
>> 
> 


Mime
View raw message