From: Sigurd Spieckermann
To: user@hadoop.apache.org
Date: Wed, 5 Dec 2012 18:53:53 +0100
Subject: Tell Hadoop to store pairs of files at the same location(s) on HDFS

Hi guys,

I have been wondering if there's a way (hack'ish would be okay too) to tell Hadoop that two files shall be stored together at the same location(s). It would benefit map-side join performance if it could be done somehow because all map tasks would be able to read data from a local copy. Does anyone know a way?

-Sigurd
