hadoop-user mailing list archives

From fab wol <darkwoll...@gmail.com>
Subject Re: Multi-Cluster Setup
Date Fri, 04 Jul 2014 12:10:52 GMT
hey Rahul,

thanks for pointing me to that page. It's definitely worth a read. Do
both clusters need to be at least version 2.3 for that?

I also dug a little further. There is the property fs.defaultFS, which
might be exactly the setting I was looking for. Unfortunately, MapR
restricts access to the CLDB and does not expose the Namenode directly,
which makes this setting useless for us right now (we have a lot of data
in a MapR cluster, but want to access it in another way).
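A rough sketch of why fs.defaultFS matters here: a path without a scheme is resolved against fs.defaultFS, while a fully qualified hdfs://host:port/path URI names one specific Namenode, which is how a job running on one cluster can take input from another (assuming the remote Namenode and datanodes are reachable over the network). The Python below is a simplified model of that resolution rule, not Hadoop's actual code, and the host names are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical fs.defaultFS of the big (compute) cluster; an assumption, not a real host.
DEFAULT_FS = "hdfs://big-nn.example.com:8020"


def qualify(path: str, default_fs: str = DEFAULT_FS) -> str:
    """Simplified model of how a Hadoop client turns a job input path into a URI.

    A path that already carries a scheme and an authority (host:port) is used
    as-is, so it can point at a Namenode in another cluster. A bare absolute
    path is resolved against fs.defaultFS. Other forms are out of scope here.
    """
    parsed = urlparse(path)
    if parsed.scheme and parsed.netloc:
        return path  # already names a specific filesystem
    if parsed.scheme or not path.startswith("/"):
        raise ValueError(f"not modelled in this sketch: {path!r}")
    return default_fs.rstrip("/") + path


# Hypothetical job inputs: one on the big cluster, one on the small cluster.
inputs = [
    "/data/logs/2014-07-03",
    "hdfs://small-nn.example.com:8020/data/customers",
]
for p in inputs:
    print(qualify(p))
```

Running it prints the first path prefixed with the big cluster's default filesystem and leaves the small-cluster URI unchanged.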

Thanks to everyone who helped here.

Cheers
Wolli


2014-07-03 18:33 GMT+02:00 Rahul Chaudhari <rahulchaudhari0405@gmail.com>:

> Fabian,
>    I see this as the classic case of federation of hadoop clusters. The MR
> job can refer to a specific hdfs://<file location> as input while
> running on another cluster.
> You can refer to the following link for further details on federation.
>
>
> http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/Federation.html
>
> Regards,
> Rahul Chaudhari
>
>
> On Thu, Jul 3, 2014 at 9:06 PM, fab wol <darkwolli32@gmail.com> wrote:
>
>> Hey Nitin,
>>
>> I'm not talking about the concept. I'm talking about how to actually do
>> it technically and how to set it up. Imagine this: I have two clusters,
>> both running fine and both set up the same way, except that one
>> has far more tasktrackers/Nodemanagers than the other one. Now I want to
>> incorporate some data from the small cluster in the analysis on the big
>> cluster. How could I access the data natively (just giving the job
>> another HDFS folder as input)? In MapR I configure the specified file and
>> then I have another folder in the MapRFS with all the content from the other
>> cluster ... Could I somehow tell one Namenode to look up another Namenode
>> and incorporate all the uncommon files?
>>
>> Cheers
>> Fabian
>>
>>
>> 2014-07-03 17:09 GMT+02:00 Nitin Pawar <nitinpawar432@gmail.com>:
>>
>>> Nothing is stopping you from implementing the cluster the way you want.
>>> You can have storage-only nodes for your HDFS and not run
>>> tasktrackers on them.
>>>
>>> Start a bunch of machines with lots of RAM and CPU but no storage.
>>>
>>> The only thing to worry about then would be the network bandwidth needed
>>> to carry data from HDFS to the tasks and back to HDFS.
>>>
>>>
>>> On Thu, Jul 3, 2014 at 8:29 PM, fab wol <darkwolli32@gmail.com> wrote:
>>>
>>>> hey everyone,
>>>>
>>>> MapR offers the possibility to access another cluster's HDFS/MapRFS
>>>> from one cluster (e.g. a compute-only cluster without much storage
>>>> capacity) (see http://doc.mapr.com/display/MapR/mapr-clusters.conf).
>>>> In times of Hadoop-as-a-Service this becomes very interesting. Is this
>>>> somehow possible with the "normal" Hadoop distributions (CDH and
>>>> HDP, I'm looking at you ;- ) ) or even without help from those
>>>> distributors? Any hacks, tricks, or even specific functions are welcome.
>>>> If this is not possible, has anyone filed a ticket for this? Forwarding
>>>> the ticket number would also be appreciated ...
>>>>
>>>> Cheers
>>>> Wolli
>>>>
>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
>
> --
> Regards,
> Rahul Chaudhari
>
