Date: Fri, 18 Nov 2011 07:28:08 -0800
From: Something Something
To: mapreduce-user@hadoop.apache.org
Subject: Re: Business logic in cleanup?

Thanks again for the clarification. Not sure what you mean by it's not a 'stage'! Okay, maybe not a stage, but I think of it as an 'Event', such as 'Mouseover' or 'Mouseout'. The 'cleanup' is really a 'MapperCompleted' event, right?

The confusion comes from the name of this method. The name 'cleanup' makes me think it should not really be used as 'mapperCompleted', but it appears there's no harm in using it that way.

Here's our dilemma: when we use (local) caching in the Mapper and write in 'cleanup', our job completes in 18 minutes. When we don't write in 'cleanup', it takes 3 hours!!! Knowing this, if you were to decide, would you use 'cleanup' for this purpose?

Thanks once again for your advice.

On Thu, Nov 17, 2011 at 9:35 PM, Harsh J wrote:
> Hello,
>
> On Fri, Nov 18, 2011 at 10:44 AM, Something Something wrote:
> > Thanks for the reply. Here's another concern we have. Let's say the Mapper
> > has finished processing 1000 lines from the input file and then the machine
> > goes down. I believe Hadoop is smart enough to redistribute the input split
> > that was assigned to this Mapper, correct? After reassigning, will it
> > reprocess the 1000 lines that were processed successfully before and start
> > from line 1001, OR would it reprocess ALL lines?
>
> Attempts of any task start afresh. That's the default nature of Hadoop.
>
> So, it would begin from the start again and hence reprocess ALL lines.
> Understand that cleanup is just a fancy API call here, that's called
> after the input reader completes, not a "stage".
>
> --
> Harsh J
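A minimal plain-Java sketch of the pattern being discussed (caching in the Mapper and writing in cleanup, often called "in-mapper combining"). It assumes a simplified, hypothetical SimpleMapper base class that only mirrors the shape of Hadoop's org.apache.hadoop.mapreduce.Mapper, where run() calls setup() once, map() once per record in the split, and cleanup() once after the input is exhausted; it is not the real Hadoop API. The WordCountMapper and its maxCacheEntries bound are also hypothetical, added to show how the local cache can be flushed early so it does not grow without limit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class InMapperCombiningDemo {

    /** Hypothetical stand-in for Hadoop's Mapper lifecycle (not the real Hadoop API). */
    public abstract static class SimpleMapper {
        protected void setup() {}

        protected abstract void map(String line, BiConsumer<String, Integer> out);

        protected void cleanup(BiConsumer<String, Integer> out) {}

        // Drives one task attempt over one split: setup once, map per record, cleanup once.
        public final void run(List<String> split, BiConsumer<String, Integer> out) {
            setup();
            for (String line : split) {
                map(line, out);
            }
            cleanup(out);
        }
    }

    /** Hypothetical word-count mapper that aggregates locally and emits from cleanup(). */
    public static class WordCountMapper extends SimpleMapper {
        private final int maxCacheEntries; // hypothetical bound on the local cache size
        private final Map<String, Integer> cache = new HashMap<>();

        public WordCountMapper(int maxCacheEntries) {
            this.maxCacheEntries = maxCacheEntries;
        }

        @Override
        protected void map(String line, BiConsumer<String, Integer> out) {
            // Accumulate counts in memory instead of emitting one pair per word.
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    cache.merge(word, 1, Integer::sum);
                }
            }
            // Flush early if the cache gets too large, to bound task memory.
            if (cache.size() >= maxCacheEntries) {
                flush(out);
            }
        }

        @Override
        protected void cleanup(BiConsumer<String, Integer> out) {
            // Emit whatever is still cached once the whole split has been read.
            flush(out);
        }

        private void flush(BiConsumer<String, Integer> out) {
            cache.forEach(out);
            cache.clear();
        }
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        new WordCountMapper(1000).run(
                List.of("a b a", "b c"),
                (word, count) -> emitted.add(word + "\t" + count));
        emitted.forEach(System.out::println);
    }
}
```

With a large cache bound, each distinct word is emitted once per split instead of once per occurrence, which is the kind of reduction in map output that could explain a large speedup. Since a failed attempt restarts from the beginning of its split (as described above), the cache is rebuilt from scratch on the new attempt; as far as I understand Hadoop's default behavior, only the successful attempt's map output is used, so flushing in cleanup() should not double-count.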