hadoop-mapreduce-user mailing list archives

From Niels Basjes <Ni...@basjes.nl>
Subject Running a single cluster in multiple datacenters
Date Mon, 15 Jul 2013 21:49:16 GMT

Last week we had a discussion at work about setting up our new Hadoop
cluster. One thing that has changed is that the Hadoop stack is becoming
more important to us, so we want it to be "more available".

One of the points we talked about was setting up the cluster in such a way
that the nodes are physically located in two separate datacenters (on
opposite sides of the same city) with a big network connection in between.
We're currently talking about a cluster in the 50-node range, but that
will grow over time.

The advantages I see:
- More CPU power available for jobs.
- The data is automatically copied between the datacenters as long as we
configure them to be different 'racks'.
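
To make the 'racks' idea concrete: Hadoop can call a topology script that
is handed one or more node addresses as arguments and prints one rack path
per address. Below is a minimal sketch of such a script, assuming each
datacenter has its own subnet. The subnets and the rack names /dc1 and /dc2
are made up for illustration; the script would be registered through
net.topology.script.file.name in core-site.xml (topology.script.file.name
on Hadoop 1.x).

```python
#!/usr/bin/env python3
"""Sketch of a rack topology script mapping node IPs to a datacenter 'rack'.

The subnets and rack names are assumptions for illustration only.
"""
import ipaddress
import sys

# Hypothetical layout: one subnet per datacenter.
DATACENTER_SUBNETS = {
    "/dc1": ipaddress.ip_network("10.1.0.0/16"),
    "/dc2": ipaddress.ip_network("10.2.0.0/16"),
}

# Fallback for anything that is not a known IP address.
DEFAULT_RACK = "/default-rack"


def rack_for(host):
    """Return the rack path for one address, or the default rack."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames are not resolved in this sketch.
        return DEFAULT_RACK
    for rack, subnet in DATACENTER_SUBNETS.items():
        if address in subnet:
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # One rack path per argument, in the same order, separated by spaces.
    print(" ".join(rack_for(arg) for arg in sys.argv[1:]))
```

With only two 'racks' defined this way, the default placement policy would
put replicas of every block in both datacenters. A finer layout such as
/dc1/rack3 is also possible, but then the default policy only guarantees
two different racks, not two different datacenters.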

The disadvantages I see:
- If the network link goes down, one half is dead and the other half will
most likely go into safe mode, because re-creating the missing replicas
will fill up the disks fast.
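
A rough back-of-the-envelope sketch of that disk concern, under stated
assumptions: replication factor 3, the default placement policy (one
replica on the writer's rack, two on the other rack), writers spread evenly
over both datacenters, and equal disk capacity on both sides. Under those
assumptions each datacenter holds on average 1.5 replicas per block, so the
half that still has the NameNode would try to grow from 1.5 to 3 replicas
per block. The function names are made up for illustration.

```python
def usage_after_re_replication(used_fraction, replication=3,
                               local_replicas_per_block=1.5):
    """Estimate the disk fraction the surviving half would need.

    used_fraction: fraction of that half's disks in use before the split.
    local_replicas_per_block: average replicas per block already stored in
        that half (1.5 is an assumption, see the text above).
    """
    return used_fraction * replication / local_replicas_per_block


def can_finish_re_replication(used_fraction, **kwargs):
    """True if the surviving half has room to restore full replication."""
    return usage_after_re_replication(used_fraction, **kwargs) <= 1.0


if __name__ == "__main__":
    for used in (0.25, 0.5, 0.75):
        needed = usage_after_re_replication(used)
        print(f"used {used:.0%} -> needs {needed:.0%}, "
              f"fits: {can_finish_re_replication(used)}")
```

In this simplified model the surviving half only has room to re-replicate
everything if its disks were at most half full before the link went down.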

What else should we consider?
Does anyone have experience with such a setup?
Is it a good idea to do this?
What better options are there for us to consider?

Thanks for any input.
Best regards,

Niels Basjes
