hadoop-common-user mailing list archives

From Rahul Chaudhari <rahulchaudhari0...@gmail.com>
Subject Re: Multi-Cluster Setup
Date Thu, 03 Jul 2014 16:33:20 GMT
Fabian,
   I see this as the classic case of federating Hadoop clusters. An MR job
can refer to a specific hdfs://<file location> as its input while it runs on
another cluster.
You can refer to the following link for further details on federation.

http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/Federation.html
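
As a rough sketch of what that looks like in practice (the host names, ports
and paths below are placeholders I made up, not anything from your setup): a
job submitted to the big cluster can take a fully qualified URI pointing at
the small cluster's namenode as its input path, for example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CrossClusterInputExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "cross-cluster-input");
        job.setJarByClass(CrossClusterInputExample.class);
        // Mapper/Reducer setup omitted; this only illustrates the input path.
        // The input lives on the *other* (small) cluster; the fully qualified
        // hdfs:// authority overrides fs.defaultFS for this path only.
        FileInputFormat.addInputPath(job,
            new Path("hdfs://small-cluster-nn:8020/data/from-small-cluster"));
        // The output goes to the cluster the job runs on (its fs.defaultFS).
        FileOutputFormat.setOutputPath(job, new Path("/user/fabian/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

If you would rather have the remote namespace show up as just another folder,
a client-side ViewFS mount table (fs.viewfs.mounttable.<name>.link.<mountpoint>
in core-site.xml, with fs.defaultFS set to viewfs://<name>) can map something
like /small to hdfs://small-cluster-nn:8020/data; the ViewFs guide in the
Hadoop docs describes that setup in detail.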

Regards,
Rahul Chaudhari


On Thu, Jul 3, 2014 at 9:06 PM, fab wol <darkwolli32@gmail.com> wrote:

> Hey Nitin,
>
> I'm not talking about the concept. I'm talking about how to actually do it
> technically and how to set it up. Imagine this: I have two clusters, both
> running fine, and they are both (setup-wise) the same, except that one has
> way more TaskTrackers/NodeManagers than the other one. Now I want to
> incorporate some data from the small cluster in the analysis on the big
> cluster. How could I access the data natively (just giving the job another
> HDFS folder as input)? In MapR I configure the specified file and then I
> have another folder in MapRFS with all the content from the other cluster
> ... Could I somehow tell one NameNode to look up another NameNode and
> incorporate all the uncommon files?
>
> Cheers
> Fabian
>
>
> 2014-07-03 17:09 GMT+02:00 Nitin Pawar <nitinpawar432@gmail.com>:
>
>> Nothing is stopping you from implementing the cluster the way you want.
>> You can have storage-only nodes for your HDFS and not run TaskTrackers
>> on them.
>>
>> Start a bunch of machines with high RAM and high CPU but no storage.
>>
>> The only thing to worry about then would be the network bandwidth needed
>> to carry data from HDFS to the tasks and back to HDFS.
>>
>>
>> On Thu, Jul 3, 2014 at 8:29 PM, fab wol <darkwolli32@gmail.com> wrote:
>>
>>> hey everyone,
>>>
>>> MapR offers the possibility for one cluster (e.g. a compute-only cluster
>>> without much storage capability) to access another cluster's HDFS/MapRFS
>>> (see http://doc.mapr.com/display/MapR/mapr-clusters.conf).
>>> In times of Hadoop-as-a-Service this becomes very interesting. Is this
>>> somehow possible with the "normal" Hadoop distributions (CDH and HDP, I'm
>>> looking at you ;-) ) or even without this help from those distributors?
>>> Any hacks and tricks or even specific functions are welcome.
>>> If this is not possible, has anyone filed this as a ticket or something?
>>> Ticket number forwarding is also appreciated ...
>>>
>>> Cheers
>>> Wolli
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>


-- 
Regards,
Rahul Chaudhari
