From: Harsh J
Date: Sat, 8 Sep 2012 08:53:16 +0530
Subject: Re: Job Controller for MapReduce task assignment
To: user@hadoop.apache.org

Hey John,

Here's how MR works, to speak simply:

- Job.submit() is called.
- The job's InputFormat#getSplits() is called. Its result is serialized and shipped, along with other job artifacts such as jars, to the configured FS, where the JobTracker or the MR2 ApplicationMaster picks it up.
- The split info contains locality hints that the scheduler then uses when assigning a host's slot or resources, subject also to availability and the requested resources (hence a 'hint', not a strict requirement).

The first two steps happen on the client end (so you can control them). The last depends on the scheduler you've put in use (Fifo/Capacity/Fair) or have implemented (Custom).

I'm unclear on exactly what you're asking, but I think you may want to start by reading the JobSubmitter class and go from there.

Does this help?

On Fri, Sep 7, 2012 at 1:24 PM, John Cuffney wrote:
> Hey,
>
> Which class handles the top level partitioning for MapReduce? It's possible
> I have a misunderstanding of how this is handled, but in my view, there is a
> top level controller which kicks off the whole process; it handles
> partitioning of the input and distribution of the input segments to the
> various machines/tasks. I have been searching through a lot of the Job
> classes, and they all seem to handle a single task, whereas it is important
> for me to perform some work at the highest level controller, if that exists.
> Any info on what I'm looking for/if I'm on the wrong track would be much
> appreciated.
>
> Thanks for the help,
> John

--
Harsh J
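The submission flow described in the reply (the client computes splits carrying host hints, then a scheduler treats those hints as preferences) can be sketched as a small self-contained Java model. This is a toy, not Hadoop's API. `Split`, `getSplits` and `assign` are hypothetical stand-ins. Split sizing is simplified to one split per block, and the scheduler is a naive "local if free, else anywhere" policy. In real Hadoop, the hints come from `InputSplit#getLocations()` on the splits returned by `InputFormat#getSplits(JobContext)`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of split computation and locality-aware assignment.
// Split, getSplits and assign are hypothetical stand-ins, not Hadoop classes.
public class SplitLocalityDemo {

    // A split: a byte range of the input plus the hosts that hold its data (the "hint").
    record Split(long start, long length, List<String> hosts) {}

    // Client side: one split per block (real FileInputFormat sizing is more involved).
    static List<Split> getSplits(long fileLength, long blockSize, List<List<String>> blockHosts) {
        List<Split> splits = new ArrayList<>();
        int block = 0;
        for (long off = 0; off < fileLength; off += blockSize, block++) {
            long len = Math.min(blockSize, fileLength - off);
            splits.add(new Split(off, len, blockHosts.get(block)));
        }
        return splits;
    }

    // Scheduler side: try a hinted host with a free slot first; if none is free,
    // fall back to any host with a free slot, so the hint is a preference, not a rule.
    static List<String> assign(List<Split> splits, Map<String, Integer> freeSlots) {
        List<String> placement = new ArrayList<>();
        for (Split s : splits) {
            String chosen = null;
            for (String h : s.hosts()) {
                if (freeSlots.getOrDefault(h, 0) > 0) {
                    chosen = h;
                    break;
                }
            }
            if (chosen == null) {
                for (Map.Entry<String, Integer> e : freeSlots.entrySet()) {
                    if (e.getValue() > 0) {
                        chosen = e.getKey();
                        break;
                    }
                }
            }
            if (chosen == null) {
                throw new IllegalStateException("no free slots for split at offset " + s.start());
            }
            freeSlots.merge(chosen, -1, Integer::sum); // occupy one slot on the chosen host
            placement.add(chosen);
        }
        return placement;
    }

    public static void main(String[] args) {
        // 300 bytes with 128-byte blocks gives splits of 128, 128 and 44 bytes.
        List<Split> splits = getSplits(300, 128, List.of(
                List.of("host1", "host2"), List.of("host1"), List.of("host3")));
        Map<String, Integer> slots = new LinkedHashMap<>();
        slots.put("host1", 1);
        slots.put("host2", 2);
        slots.put("host3", 0);
        System.out.println(assign(splits, slots)); // prints [host1, host2, host2]
    }
}
```

Only the first split lands on a hinted host. The other two run on host2 because their preferred hosts have no free slot, which is the sense in which the locations are a "hint". The real Fifo/Capacity/Fair schedulers use more elaborate policies. For example, they may wait briefly for a local slot before falling back.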