Date: Sun, 17 Apr 2011 16:57:25 -0700
Subject: Re: Estimating Time required to compute M/Rjob
From: Lance Norskog
To: common-user@hadoop.apache.org

ROC Convex Hull is an analysis technique for optimizing parameters for
given outputs. For example, if a classification technique has tuning
knobs, ROCCH will find the settings that give a desired failure rate.

On Sun, Apr 17, 2011 at 12:07 PM, Matthew Foley wrote:
> Since general M/R jobs vary over a huge (Turing problem equivalent!)
> range of behaviors, a more tractable problem might be to characterize
> the descriptive parameters needed to answer the question: "If the
> following problem P runs in T0 amount of time on a certain benchmark
> platform B0, how long T1 will it take to run on a differently
> configured real-world platform B1?"
>
> Or are you only dealing with one particular M/R job? If so, the above
> is a good way to look at it: first identify the controlling
> parameters, then analyze how they co-vary with execution time. Now
> you've reduced it to a question that can be answered by a series of
> "make hypothesis" / "do experiment" steps :-) Pick a parameter you
> think is a likely candidate, and make a series of measurements of
> execution time for different values of the parameter. Repeat until
> you've fully characterized the problem space.
>
> Good luck,
> --Matt
>
> On Apr 16, 2011, at 6:39 AM, Sonal Goyal wrote:
>
> What is your MR job doing? What is the amount of data it is processing? What
> kind of a cluster do you have? Would you be able to share some details about
> what you are trying to do?
>
> If you are looking for metrics, you could look at the Terasort run ..
>
> Thanks and Regards,
> Sonal
> Hadoop ETL and Data Integration
> Nube Technologies
>
> On Sat, Apr 16, 2011 at 3:31 PM, real great.. wrote:
>
>> Hi,
>> As a part of my final year BE final project I want to estimate the time
>> required by a M/R job given an application and a base file system.
>> Can you folks please help me by posting some thoughts on this issue or
>> posting some links here.
>>
>> --
>> Regards,
>> R.V.

-- 
Lance Norskog
goksron@gmail.com
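
P.S. A minimal sketch of the "make hypothesis" / "do experiment" loop
Matt describes, under some loud assumptions: there is a single
controlling parameter (input size is picked here only as an example),
execution time is roughly linear in it, and the numbers below are
made-up placeholders, not measurements from any cluster. The helper
names fit_line and predict are hypothetical. It fits a least-squares
line to (parameter, runtime) pairs and extrapolates to an untried
parameter value.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ intercept + slope * xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the parameter values; zero means there is nothing to fit.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("parameter values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Estimated runtime at parameter value x under the linear hypothesis."""
    return intercept + slope * x


# PLACEHOLDER data for illustration only (assumption, not real runs):
# input size in GB and the runtime in seconds observed at each size.
input_gb = [10, 20, 40, 80]
runtime_s = [70, 110, 190, 350]

slope, intercept = fit_line(input_gb, runtime_s)
estimate_160 = predict(slope, intercept, 160)
print(f"slope={slope:.2f} s/GB, fixed cost={intercept:.2f} s, "
      f"estimate at 160 GB={estimate_160:.2f} s")
```

If the fitted line does not track the measurements, the linear
hypothesis is wrong for that parameter; try another functional form or
another candidate parameter and repeat the experiment.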