From: Jianmin Woo
Subject: Re: question about when shuffle/sort start working
To: core-user@hadoop.apache.org
Date: Wed, 3 Jun 2009 18:35:46 -0700 (PDT)
Message-ID: <59617.39491.qm@web111015.mail.gq1.yahoo.com>
In-Reply-To: <49efc3330906020122r3b2cad89ta04c90e4f88b7afd@mail.gmail.com>
Thanks a lot for your suggestions on the interplay between the job and the driver, Chuck. Yes, the job may hold some data, say training data, that is needed in each round of the job. I will check the link you provided.

Actually, I am thinking of some really lightweight map/reduce jobs. For example, each job consists of just summing up an integer, or an array of integers, sent from the map tasks. There may be a huge number of jobs of this kind in the algorithm, so efficiency is critical.

Thanks,
Jianmin

________________________________
From: Chuck Lam
To: core-user@hadoop.apache.org
Sent: Tuesday, June 2, 2009 4:22:46 PM
Subject: Re: question about when shuffle/sort start working

I'm pretty sure JVM reuse is only for tasks within the same job. You can't persist data from one job to the next this way. I don't think it's even possible to guarantee that the same set of nodes will be running one job as the next.

Let me see if I understand your situation correctly. You're implementing some iterative algorithm where each iteration can be neatly implemented as a MR job. It iterates until some convergence criterion is met (or some maximum number of iterations, to avoid an infinite loop). The challenge is that it's not clear how a convergence criterion can be communicated globally to the driver of this job chain, for the driver to decide when to stop.

I haven't tried this before, but if the convergence criterion is just a single value, one hack your job can do is output the convergence criterion as a Counter thru the Reporter object. At the end of each job (iteration), the counter is checked to see if the convergence criterion meets the stopping requirement. The driver launches the next job only if the stopping requirement is not met. Counters are automatically summed across all tasks, so this should be fine if your convergence criterion is something like a sum of residuals.
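[Editor's note: a rough, Hadoop-free sketch of the counter trick described above. The class name, method names, and the scale factor are illustrative assumptions, not code from this thread; `toCounter` stands in for what each task would report via `reporter.incrCounter(...)`, and `converged` for the driver reading the summed counter back.]

```java
// Sketch of Chuck's counter trick: each task reports a scaled integer
// residual, the framework sums the counters across all tasks, and the
// driver unscales the total and compares it to a threshold.
public class ConvergenceCheck {
    static final double SCALE = 1e6; // illustrative scale factor

    // What each task would do via reporter.incrCounter(...):
    // turn a floating-point partial residual into an integer counter value.
    static long toCounter(double partialResidual) {
        return Math.round(partialResidual * SCALE);
    }

    // What the driver does with the summed counter after a job finishes.
    static boolean converged(long summedCounter, double epsilon) {
        return (summedCounter / SCALE) < epsilon;
    }

    public static void main(String[] args) {
        // Counters are summed across tasks automatically; simulate three tasks.
        long total = toCounter(0.004) + toCounter(0.0031) + toCounter(0.0002);
        System.out.println(converged(total, 0.01));  // → true  (sum ~0.0073)
        System.out.println(converged(total, 0.001)); // → false
    }
}
```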
Counters are integers and most convergence criteria are floating point, so you'll have to scale your numbers and round them to integers to approximate things. (Like I said, it's a bit of a hack.)

If you have to persist a lot of data from one job to the next, Jimmy Lin et al. have tried adding a memcached system to Hadoop to do that: www.umiacs.umd.edu/~jimmylin/publications/Lin_etal_TR2009.pdf It's a pretty heavy setup, tho, but may be useful for some applications.

On Mon, Jun 1, 2009 at 11:31 PM, Jianmin Woo wrote:

> Thanks for your information, Todd.
>
> It's cool to address the job overhead with JVM reuse. You mentioned that
> the code (and variables?) are kept in the JIT cache, so can we refer to
> variables from the last iteration in some way? I am really curious how this
> works at the coding level. Could you please help point out some
> material/resources where I can start to learn the usage of this feature? Is
> this implemented in the current 0.20 version? I am pretty fresh to Hadoop.
> :-)
>
> Thanks,
> Jianmin
>
> ________________________________
> From: Todd Lipcon
> To: core-user@hadoop.apache.org
> Sent: Tuesday, June 2, 2009 1:36:17 PM
> Subject: Re: question about when shuffle/sort start working
>
> On Mon, Jun 1, 2009 at 7:49 PM, Jianmin Woo wrote:
>
> > In this way, if some job fails, we can re-run the whole job sequence
> > starting from the latest checkpoint instead of the beginning. It would be
> > nice if the whole sequence could be done in a single job (it's actually a
> > loop, which happens in many machine learning algorithms; each loop
> > contains a Map and a Reduce step, and PageRank calculation is one
> > example). Because if an algorithm takes 1000 steps to converge, we have
> > to start 1000 jobs in the job-sequence approach, which is costly because
> > of the start/stop of jobs.
>
> Yep - as you mentioned, this is very common in many algorithms on MapReduce,
> especially in machine learning and graph processing.
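[Editor's note: the job-chain driver this thread keeps circling around might look like the following Hadoop-free sketch. The names (`IterativeDriver`, `drive`, the toy residual) are illustrative; `runIteration` stands in for submitting one MR job (e.g. `JobClient.runJob`) and reading back the summed, scaled counter.]

```java
import java.util.function.LongUnaryOperator;

// Sketch of an iterative driver: run one MR job per iteration, read back a
// scaled integer residual from the job's counters, and stop on convergence
// or after a maximum number of iterations.
public class IterativeDriver {
    static final double SCALE = 1e6; // must match the scale used by the tasks

    static int drive(LongUnaryOperator runIteration, double epsilon, int maxIter) {
        for (int i = 1; i <= maxIter; i++) {
            long scaledResidual = runIteration.applyAsLong(i);
            if (scaledResidual / SCALE < epsilon) {
                return i;        // converged after i iterations
            }
        }
        return maxIter;          // hit the iteration cap without converging
    }

    public static void main(String[] args) {
        // Toy "job": the residual halves each iteration, starting at 0.5.
        LongUnaryOperator job = i -> Math.round(0.5 / Math.pow(2, i - 1) * SCALE);
        System.out.println(drive(job, 1e-3, 1000)); // → 10
    }
}
```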
> The only way to do this is, like you said, to run 1000 jobs.
>
> Job overhead is one thing that has been improved significantly in recent
> versions. JVM reuse is one such new-ish feature that helps here, since the
> overhead of spawning tasktracker children is eliminated, and your code
> stays in the JIT cache between iterations.
>
> -Todd
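[Editor's note: for reference, the JVM reuse Todd mentions is controlled by a per-job configuration property; a minimal sketch, assuming Hadoop 0.19 or later, where a value of -1 means the child JVM is reused for an unlimited number of tasks of the same job.]

```xml
<!-- In the job's configuration (the default of 1 spawns a fresh JVM per task) -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
```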