hadoop-common-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Multiple data centre in Hadoop
Date Mon, 16 Apr 2012 14:08:59 GMT
Hi Abhishek,

Manu is correct about High Availability within a single colo. I realize that in some cases
you have to fail over between colos. I am not aware of any turnkey solution for that, but
generally what you want to do is run two clusters, one in each colo, either hot/hot or
hot/warm. I have seen both; which one fits depends on how quickly you need to fail over.

In hot/hot the input data is replicated to both clusters and the same software is run on
both. In this case, though, you have to be fairly sure that your processing is deterministic
(e.g. no generating of random IDs), or the results on the two clusters could be slightly
different.

In hot/warm the data is replicated from one colo to the other at defined checkpoints. The
data is only processed on one of the grids, but if that colo goes down the other one can
take up the processing from wherever the last checkpoint was.
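
To illustrate the determinism point for hot/hot, here is a minimal Python sketch of deriving record IDs from the input itself instead of from a random generator. The function name, the choice of SHA-256, and the sample file name and payload are all assumptions made for illustration; they are not from any particular Hadoop job.

```python
import hashlib


def record_id(source: str, offset: int, payload: bytes) -> str:
    """Derive an ID from the record's own coordinates and content.

    Hypothetical helper: because nothing here depends on randomness,
    the clock, or the host, two clusters fed the same replicated input
    compute the same ID.
    """
    h = hashlib.sha256()
    h.update(source.encode("utf-8"))
    h.update(b"\0")  # separator so adjacent fields cannot run together
    h.update(str(offset).encode("ascii"))
    h.update(b"\0")
    h.update(payload)
    return h.hexdigest()[:16]


# The same record processed independently in each colo (sample values are made up).
id_colo_a = record_id("events/2012-04-16.log", 1024, b"user=42 action=login")
id_colo_b = record_id("events/2012-04-16.log", 1024, b"user=42 action=login")
print(id_colo_a == id_colo_b)
```

Something like uuid.uuid4() in the same place would give each cluster a different ID for the same record, which is the kind of divergence to avoid.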

I hope that helps.

--Bobby

On 4/12/12 5:07 AM, "Manu S" <manupkd87@gmail.com> wrote:

Hi Abhishek,

1. Use multiple directories for *dfs.name.dir*, *dfs.data.dir*, etc.
   * Recommendation: write to *two local directories on different
     physical volumes*, and to an *NFS-mounted* directory
     - Data will be preserved even in the event of a total failure of the
       NameNode machines
   * Recommendation: *soft-mount the NFS* directory
     - If the NFS mount goes offline, this will not cause the NameNode
       to fail

2. *Rack awareness*
https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf
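
For item 1, dfs.name.dir takes a comma-separated list of directories in hdfs-site.xml, and the NameNode keeps a copy of its metadata in each one. A minimal Python sketch that renders such a property is below; the three paths and the helper name are hypothetical and only stand in for two local volumes plus one NFS mount.

```python
import xml.etree.ElementTree as ET


def name_dir_property(dirs):
    """Render a <property> element for dfs.name.dir (hypothetical helper)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.name.dir"
    # Multiple directories are given as one comma-separated value.
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return ET.tostring(prop, encoding="unicode")


# Assumed layout: two local disks and one NFS-mounted directory.
name_dirs = ["/data/1/dfs/nn", "/data/2/dfs/nn", "/mnt/nfs/dfs/nn"]
snippet = name_dir_property(name_dirs)
print(snippet)
```

The resulting element would go inside the <configuration> section of hdfs-site.xml on the NameNode.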

On Thu, Apr 12, 2012 at 2:18 AM, Abhishek Pratap Singh
<manu.infy@gmail.com> wrote:

> Thanks Robert.
> Is there a best practice or design that can address High Availability
> to a certain extent?
>
> ~Abhishek
>
> On Wed, Apr 11, 2012 at 12:32 PM, Robert Evans <evans@yahoo-inc.com>
> wrote:
>
> > No, it does not. Sorry.
> >
> >
> > On 4/11/12 1:44 PM, "Abhishek Pratap Singh" <manu.infy@gmail.com> wrote:
> >
> > Hi All,
> >
> > Just wanted to know if Hadoop supports more than one data centre. This is
> > basically for DR purposes and High Availability, where if one centre goes
> > down the other can be brought up.
> >
> >
> > Regards,
> > Abhishek
> >
> >
>



--
Thanks & Regards
----
*Manu S*
SI Engineer - OpenSource & HPC
Wipro Infotech
Mob: +91 8861302855                Skype: manuspkd
www.opensourcetalk.co.in

