From: "M. C. Srivas"
To: user@hadoop.apache.org
Date: Wed, 5 Dec 2012 13:51:02 -0800
Subject: Re: Tell Hadoop to store pairs of files at the same location(s) on HDFS

MapR does this already, and well beyond just 2 files. One can arrange
things so that a boatload of files have all their replicas placed on the
same set of nodes, i.e., files A ... Z will have replica1 on node1, replica2
on node2, replica3 on node3, etc. (Nodes 1, 2 and 3 are picked by the system
based on utilization and node fullness.)

On Wed, Dec 5, 2012 at 11:26 AM, Sigurd Spieckermann
<sigurd.spieckermann@gmail.com> wrote:

> Awesome! That's exactly what I'm looking for. Hadn't seen the JIRA. I hope
> this is coming soon!
>
> On 05.12.2012 18:58, Harsh J wrote:
>
>> You are probably talking of
>> https://issues.apache.org/jira/browse/HDFS-2576 and similar JIRAs.
>> This feature isn't available in HDFS yet, but may arrive soon.
>>
>> On Wed, Dec 5, 2012 at 11:23 PM, Sigurd Spieckermann
>> <sigurd.spieckermann@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> I have been wondering if there's a way (hack'ish would be okay too) to
>>> tell Hadoop that two files shall be stored together at the same
>>> location(s). It would benefit map-side join performance if it could be
>>> done somehow because all map tasks would be able to read data from a
>>> local copy. Does anyone know a way?
>>>
>>> -Sigurd
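To make the placement idea in this thread concrete, here is a toy sketch. It is not MapR's or HDFS's implementation: the function names (`pick_nodes`, `colocate`, `local_join_nodes`), the utilization figures, and the simplification of treating each file as a single unit rather than a sequence of blocks are all assumptions made for illustration. It only shows why giving every file in a group the same ordered replica list lets a map-side join read both inputs locally.

```python
from typing import Dict, List


def pick_nodes(utilization: Dict[str, float], replicas: int = 3) -> List[str]:
    """Return the `replicas` least-utilized nodes (ties broken by node name)."""
    if replicas > len(utilization):
        raise ValueError("not enough nodes for the requested replica count")
    return sorted(utilization, key=lambda n: (utilization[n], n))[:replicas]


def colocate(files: List[str], utilization: Dict[str, float],
             replicas: int = 3) -> Dict[str, List[str]]:
    """Assign every file in the group the same ordered replica list."""
    nodes = pick_nodes(utilization, replicas)
    return {name: list(nodes) for name in files}


def local_join_nodes(placement: Dict[str, List[str]],
                     left: str, right: str) -> List[str]:
    """Nodes that hold a replica of both files, so a join there needs no remote read."""
    return sorted(set(placement[left]) & set(placement[right]))


# Hypothetical cluster state: fraction of disk used per node.
utilization = {"node1": 0.20, "node2": 0.35, "node3": 0.40, "node4": 0.90}
placement = colocate(["A", "B", "Z"], utilization)
join_nodes = local_join_nodes(placement, "A", "B")
```

With these made-up numbers, every file in the group ends up on the three least-utilized nodes, so a map task scheduled on any of them can read both `A` and `B` from local disk. A real system would have to make the same decision per block and keep it stable across re-replication, which this sketch does not attempt.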