From: Vinod Kumar Vavilapalli
To: user@hadoop.apache.org
Subject: Re: MapReduce - FileInputFormat and Locality
Date: Wed, 8 May 2013 15:00:13 -0700

I think you misread it.

If a given split has only one block, it uses all the locations of that block.

If it so happens that a given split has multiple blocks, it uses all the locations of the first block.

HTH,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On May 8, 2013, at 7:21 AM, Brian C. Huffman wrote:

> All,
>
> I'm trying to understand how the current FileInputFormat implements locality. As far as I can tell, it calculates splits using getSplit and each split will contain the node that hosts the first block of data in that split. Is my understanding correct?
>
> Looking at the FileInputFormat for the old API (mapred), it appears that it does more to implement locality, using getSplitHosts to "return the hosts that contribute most for a given split"
>
> If I understand correctly, why was this changed?
>
> Thanks,
> Brian
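
To make the rule in the reply concrete, here is a minimal, self-contained sketch in plain Java. It is not Hadoop's code and does not use the Hadoop API: the names SplitLocalitySketch, Block, blockIndex and hostsForSplit are hypothetical, and block sizes are scaled down to 128 bytes for readability. It only models the behavior described above, where a split reports every replica host of the block that contains its first byte.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model only; all names here are hypothetical, not Hadoop classes.
final class SplitLocalitySketch {

    /** Stand-in for a block location: a byte range plus the hosts holding replicas. */
    static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        Block(long offset, long length, List<String> hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Returns the index of the block containing the byte offset, or -1 if none does. */
    static int blockIndex(List<Block> blocks, long offset) {
        for (int i = 0; i < blocks.size(); i++) {
            Block b = blocks.get(i);
            if (offset >= b.offset && offset < b.offset + b.length) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Hosts reported for a split starting at splitStart: all replica hosts of
     * the first block in the split. The split length is deliberately not a
     * parameter, because later blocks do not influence the result in this model.
     */
    static List<String> hostsForSplit(List<Block> blocks, long splitStart) {
        int idx = blockIndex(blocks, splitStart);
        if (idx < 0) {
            throw new IllegalArgumentException("offset outside file: " + splitStart);
        }
        return blocks.get(idx).hosts;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block(0, 128, Arrays.asList("n1", "n2", "n3")),
                new Block(128, 128, Arrays.asList("n2", "n4", "n5")));

        // A split covering bytes [0, 128) sits inside one block: all its hosts are used.
        System.out.println(hostsForSplit(blocks, 0));   // [n1, n2, n3]

        // A split covering bytes [0, 256) spans both blocks, yet starts at the
        // same offset, so it still reports only the first block's hosts.
        System.out.println(hostsForSplit(blocks, 0));   // [n1, n2, n3]

        // A split starting at byte 128 begins in the second block.
        System.out.println(hostsForSplit(blocks, 128)); // [n2, n4, n5]
    }
}
```

In this model the point of the reply shows up directly: a split is never limited to a single node. It carries all replica hosts of its first block, and only the blocks after the first are ignored when choosing locations.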