Mailing-List: contact general-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: general@hadoop.apache.org
Message-ID: <4BFBA078.5000505@apache.org>
Date: Tue, 25 May 2010 11:03:36 +0100
From: Steve Loughran <stevel@apache.org>
User-Agent: Thunderbird 2.0.0.24 (X11/20100228)
MIME-Version: 1.0
To: general@hadoop.apache.org
Subject: Re: Active-Active Performance
References: <590CDE7A083C4142AF05E96F2FB543BC9733D2@r-exchange.cardlink.local>
 <AANLkTil1XC1QF5XeGhGK_ZsQzXHaG5tNcRrNt9S2okSW@mail.gmail.com>
 <590CDE7A083C4142AF05E96F2FB543BC9733F9@r-exchange.cardlink.local>
In-Reply-To: 
 <590CDE7A083C4142AF05E96F2FB543BC9733F9@r-exchange.cardlink.local>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Anthony Ikeda wrote:
> Thanks Hemanth,
> 
> In regards to different locations of the HADOOP home this is low
> priority more for testing not production. I was trying to install HADOOP
> for testing over 2 machines with only a Windows XP machine running
> Cygwin and a Mac running Darwin. Not a priority.

Things are much easier if
  -all your machines have the same OS and disk layout
  -you are running on Linux
  -you use some configuration management (CM) tool to automate setup and 
deployment, including pushing out config files

Start now with VMware or VirtualBox images, so you learn about cluster 
management sooner rather than later.

> In regards to my last question about operating in a detached fashion, we
> are trying to factor in what happens when the link between both sites is
> cut. Will both sites operate independently until the connection is
> re-established? Is there any particular setup required to ensure we can
> cover this scenario or is it an out-of-the-box feature?

HDFS and the MapReduce engine are designed to run in a single datacentre 
with high-bandwidth, high-reliability links; current releases assume the 
facility is secure and all users are trusted. The key SPOF, the 
Namenode, doesn't do failover, so when it goes down or the network 
partitions, all machines that cannot see the NN poll and spin until it 
comes back -which can take a while, unless you have a secondary namenode 
to keep the persistent files up to date. The workers all assume that 
the hostname and IP address of the namenode don't change, and they never 
reread their config. You could use DNS to do failover, but you have to 
tune the JVMs to not cache IP addresses for very long.
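
As a rough illustration of the JVM tuning mentioned above, here is a 
minimal sketch using the standard Java security properties 
networkaddress.cache.ttl and networkaddress.cache.negative.ttl. The class 
name, method name and the 30s/5s values are assumptions made up for this 
example, not anything Hadoop ships, and the properties have to be set 
before the JVM does its first hostname lookup to take effect.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

// Hypothetical helper, not part of Hadoop: shortens the JVM's DNS cache
// lifetime so that a changed DNS record is noticed reasonably quickly.
public class DnsCacheTuning {

    /**
     * Sets how long the JVM caches successful and failed name lookups.
     * Call this before the first hostname lookup in the process; the JVM
     * reads these properties when its address cache is initialised.
     */
    static void configureDnsCacheTtl(int positiveTtlSeconds, int negativeTtlSeconds) {
        // Standard security properties documented for java.net.InetAddress.
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) throws UnknownHostException {
        // Assumed values: cache good answers for 30s, failures for 5s.
        configureDnsCacheTtl(30, 5);

        // The hostname is a placeholder; pass the real namenode host as an argument.
        String host = args.length > 0 ? args[0] : "localhost";
        InetAddress addr = InetAddress.getByName(host);
        System.out.println(host + " -> " + addr.getHostAddress());
    }
}
```

The same settings can also be placed in the JRE's java.security file so 
that they apply to every JVM on the machine. Either way this only helps 
processes that look the hostname up again; it does not make a running 
daemon reread its config.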

To do cross-site work you'd need a separate HDFS filesystem per site; 
synchronisation of data then becomes a task for the higher-level apps. I 
don't know what HBase, Cassandra or other column DB tools do here.


-steve