Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: error (nike.apache.org: local policy)
MIME-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8;
 format=flowed
Content-Transfer-Encoding: 7bit
Date: Mon, 15 Jul 2013 23:56:06 +0200
From: jb@nanthrax.net
To: <user@hadoop.apache.org>
Subject: Re: Running a single cluster in multiple datacenters
In-Reply-To: 
 <CADoiZqo7Wtis6TSzJzdwPgTo7A5d47SVQpNmWMJNiJWqa+KWDA@mail.gmail.com>
References: 
 <CADoiZqo7Wtis6TSzJzdwPgTo7A5d47SVQpNmWMJNiJWqa+KWDA@mail.gmail.com>
Message-ID: <a8260c8f3d608042d4ada762e73a0050@nanthrax.net>
User-Agent: Roundcube Webmail/0.7.2

Hi Niels,

It depends on the number of replicas and on the Hadoop rack 
configuration (topology level).
It is possible to have replicas in both datacenters.

What rack configuration do you plan to use? You can implement your 
own mapping and set it with the topology.node.switch.mapping.impl 
property.
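
To illustrate the idea, here is a minimal sketch of the mapping logic. 
Note that it uses the script-based mapping rather than a custom Java 
class: as far as I know, Hadoop 1.x can call an external script named 
by topology.script.file.name (net.topology.script.file.name in 2.x), 
passing hosts as arguments and reading one rack path per line. The 
subnets and the rack names /dc1 and /dc2 are assumptions for the 
example, not anything from your setup. Each datacenter is mapped to a 
single rack, so the default placement policy should put replicas on 
both sides whenever the replication factor is at least 2.

```python
#!/usr/bin/env python3
"""Topology script sketch: one rack per datacenter."""
import ipaddress
import sys

# Hypothetical addressing plan; replace with the real datacenter subnets.
DATACENTER_SUBNETS = {
    "/dc1": ipaddress.ip_network("10.1.0.0/16"),
    "/dc2": ipaddress.ip_network("10.2.0.0/16"),
}

# Rack name Hadoop uses for nodes without a known location.
DEFAULT_RACK = "/default-rack"


def rack_for(host):
    """Return the rack path for one host given as an IP address string."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP address (for example a host name): no mapping known.
        return DEFAULT_RACK
    for rack, subnet in DATACENTER_SUBNETS.items():
        if address in subnet:
            return rack
    return DEFAULT_RACK


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [rack_for(host) for host in hosts]


if __name__ == "__main__":
    # Print one rack path per argument, one per line.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

A custom Java class set through topology.node.switch.mapping.impl 
would carry the same logic; the script is only the shorter way to 
show it.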

Regards
JB

On 2013-07-15 23:49, Niels Basjes wrote:
> Hi,
>
> Last week we had a discussion at work regarding setting up our new
> Hadoop cluster(s).
> One of the things that has changed is that the importance of the
> Hadoop stack is growing so we want to be "more available".
>
> One of the points we talked about was setting up the cluster in such a
> way that the nodes are physically located in two separate datacenters
> (on opposite sides of the same city) with a big network connection in
> between.
>
> We're currently talking about a cluster in the 50-node range, but that
> will grow over time.
>
> The advantages I see:
> - More CPU power available for jobs.
> - The data is automatically copied between the datacenters as long as
> we configure them to be different racks.
>
> The disadvantages I see:
> - If the network goes out then one half is dead and the other half
> will most likely go to safemode because re-replicating the missing
> replicas will fill up the disks fast.
>
> What else should we consider?
> Has anyone any experience with such a setup?
> Is it a good idea to do this?
>
> What are better options for us to consider?
>
> Thanks for any input.