Date: Mon, 26 Jul 2010 09:51:58 +0530
Subject: Re: What's speculative tasks
From: Hemanth Yamijala
To: mapreduce-user@hadoop.apache.org

Pedro,

On Sun, Jul 25, 2010 at 7:33 PM, Pedro Costa wrote:
> Hi,
>
> - In hadoop MR it's used the term speculative tasks. What is speculative
> tasks?
>

When the MR framework detects that some tasks in a job are running slower
than the others, it can optionally launch duplicates of those tasks on
nodes other than the original ones, in the hope that the duplicates
complete sooner than the original slow tasks. The motivation for this
feature is that it has been found that jobs have 'stragglers': a small
percentage of tasks that are significantly slower than the rest and that
increase the overall execution time of the job. Typically these stragglers
are caused by bad hardware.

> - During the execution of a MR test, when we don't have splits to attribute
> to reduce tasks, those reduce tasks will run? For example, if I set that
> will run 6 reduce tasks and I don't have splits during the running of the
> example, the reduce tasks will run? If so, where is verified that a reduce
> task has a split assigned?

Splits are related to map tasks, not reduce tasks. Reduce tasks get their
input from the output of the map tasks, which is generated and stored as
intermediate data on the compute nodes. With this context, can you clarify
what you are looking for?

Thanks
Hemanth
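
P.S. To make the straggler idea above concrete, here is a toy sketch in Python. It is not Hadoop's actual scheduling code: the Task class, the function names pick_stragglers and launch_duplicates, and the 0.2 progress-lag threshold are all assumptions made up for illustration. It only shows the general shape of the technique: find running tasks that trail the job's average progress, then plan a duplicate of each on a node other than the one it is already running on.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str     # hypothetical identifier, not a real Hadoop task ID format
    node: str        # node the original attempt is running on
    progress: float  # fraction complete in [0, 1], sampled at the same instant


def pick_stragglers(tasks, lag=0.2):
    """Return unfinished tasks whose progress trails the job mean by more than `lag`.

    The threshold `lag` is an illustrative assumption, not a documented default.
    """
    running = [t for t in tasks if t.progress < 1.0]
    if not running:
        return []
    mean = sum(t.progress for t in tasks) / len(tasks)
    return [t for t in running if mean - t.progress > lag]


def launch_duplicates(stragglers, nodes):
    """Map each straggler's task_id to a node different from its original node."""
    plan = {}
    for t in stragglers:
        candidates = [n for n in nodes if n != t.node]
        if candidates:  # skip the task if no other node is available
            plan[t.task_id] = candidates[0]
    return plan


nodes = ["node1", "node2", "node3"]
tasks = [
    Task("m_0", "node1", 0.90),
    Task("m_1", "node2", 0.95),
    Task("m_2", "node3", 0.30),  # far behind the others: the straggler
    Task("m_3", "node1", 1.00),  # already finished, never duplicated
]

plan = launch_duplicates(pick_stragglers(tasks), nodes)
print(plan)
```

In this sketch only m_2 is selected, and its duplicate is planned on node1 rather than node3. Whichever attempt finishes first would supply the task's output, and the other attempt would be discarded.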