Subject: Re: Multi-Cluster Setup
From: Rahul Chaudhari <rahulchaudhari0405@gmail.com>
To: user@hadoop.apache.org
Date: Thu, 3 Jul 2014 22:03:20 +0530

Fabian,
I see this as the classic case of federation of Hadoop clusters. The MR job can refer to a specific hdfs://<file location> as its input while running on another cluster.
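For instance, a stock job submitted on one cluster can read its input straight from the other cluster's NameNode. A minimal sketch, assuming the remote NameNode is reachable, the two clusters run RPC-compatible Hadoop versions, and using placeholder hostnames, ports, and the stock 2.x tarball layout:

    # Run wordcount on the big cluster while reading input from the small
    # cluster's HDFS; "small-cluster-nn" and "big-cluster-nn" are placeholders.
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
      wordcount \
      hdfs://small-cluster-nn:8020/data/input \
      hdfs://big-cluster-nn:8020/data/output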
You can refer to the following link for further details on federation:
http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/Federation.html
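The federation docs pair this with a client-side ViewFs mount table when you want a single namespace spanning both NameNodes. A minimal core-site.xml sketch; the mount-table name and hostnames are placeholders:

    <!-- Inside <configuration> of core-site.xml: stitch two NameNodes
         into one viewfs:// namespace. Names/hosts below are placeholders. -->
    <property>
      <name>fs.defaultFS</name>
      <value>viewfs://bigview</value>
    </property>
    <property>
      <!-- /local resolves to the big cluster's NameNode -->
      <name>fs.viewfs.mounttable.bigview.link./local</name>
      <value>hdfs://big-cluster-nn:8020/data</value>
    </property>
    <property>
      <!-- /remote resolves to the small cluster's NameNode -->
      <name>fs.viewfs.mounttable.bigview.link./remote</name>
      <value>hdfs://small-cluster-nn:8020/data</value>
    </property>

With that in place, "hadoop fs -ls /remote" on a client lists the other cluster's files without jobs having to know two clusters exist.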

Regards,
Rahul Chaudhari
On Thu, Jul 3, 2014 at 9:06 PM, fab wol <darkwolli32@gmail.com> wrote:
Hey Nitin,

I'm not talking about the concept; I'm talking about how to actually do it technically and how to set it up. Imagine this: I have two clusters, both running fine, and they are (setup-wise) the same, except that one has far more TaskTrackers/NodeManagers than the other. Now I want to incorporate some data from the small cluster into the analysis on the big cluster. How could I access the data natively (just giving the job another HDFS folder as input)? In MapR I configure the specified file and then I have another folder in MapRFS with all the content from the other cluster... Could I somehow tell one NameNode to look up another NameNode and incorporate all the files it doesn't itself have?
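A fully qualified URI already gives this kind of native access with stock Hadoop tools; a sketch, with placeholder hostname, port, paths, and file name:

    # List and read files on the other cluster directly; "small-cluster-nn"
    # and the paths/file names are placeholders.
    hadoop fs -ls hdfs://small-cluster-nn:8020/data/input
    hadoop fs -cat hdfs://small-cluster-nn:8020/data/input/part-00000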

Cheers
Fabian


2014-07-03 17:09 GMT+02:00 Nitin Pawar <nitinpawar432@gmail.com>:

Nothing is stopping you from implementing the cluster the way you want.
You can have storage-only nodes for your HDFS and not run TaskTrackers on them.

Start a bunch of machines with high RAM and high CPU but no storage.

The only thing to worry about then would be the network bandwidth to carry data from HDFS to the tasks and back to HDFS.
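A sketch of that daemon split, assuming the stock Hadoop 2.x scripts (script locations vary by distribution; MRv1 would start a TaskTracker instead of a NodeManager):

    # On the storage-only nodes: run just the HDFS DataNode.
    $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
    # On the high-RAM/high-CPU compute nodes: run just the YARN NodeManager.
    $HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager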


On Thu, Jul 3, 2014 at 8:29 PM, fab wol <darkwolli32@gmail.com> wrote:
Hey everyone,

MapR offers the possibility to access another cluster's HDFS/MapRFS from one cluster (e.g. a compute-only cluster without much storage capacity); see http://doc.mapr.com/display/MapR/mapr-clusters.conf. In times of Hadoop-as-a-Service this becomes very interesting. Is this somehow possible with the "normal" Hadoop distributions (CDH and HDP, I'm looking at you ;-)), or even without help from those distributors? Any hacks and tricks or even specific functions are welcome. If this is not possible, has anyone filed this as a ticket somewhere? Ticket number forwarding is also appreciated...
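For reference, the MapR mechanism mentioned here amounts to one line per reachable cluster in mapr-clusters.conf; a sketch, with hypothetical cluster name and CLDB hosts:

    # /opt/mapr/conf/mapr-clusters.conf -- cluster name and CLDB host:port
    # entries below are hypothetical.
    remote.cluster cldb1.example.com:7222 cldb2.example.com:7222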

Cheers
Wolli



--
Nitin Pawar




--
Regards,
Rahul Chaudhari