hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Multiple data centre in Hadoop
Date Thu, 19 Apr 2012 21:31:00 GMT
I don't know of any open source solution for doing this...
And yeah, it's something one can't talk about....  ;-)


On Apr 19, 2012, at 4:28 PM, Robert Evans wrote:

> Where I work we have done some things like this, but none of them are open source, and
> I have not really been directly involved with the details of it.  I can guess about what it
> would take, but that is all it would be at this point.
> 
> --Bobby
> 
> 
> On 4/17/12 5:46 PM, "Abhishek Pratap Singh" <manu.infy@gmail.com> wrote:
> 
> Thanks Bobby, I'm looking for something like this. Now the question is
> what the best strategy is for doing Hot/Hot or Hot/Warm.
> I need to consider the CPU and network bandwidth, and I also need to decide
> from which layer this replication should start.
> 
> Regards,
> Abhishek
> 
> On Mon, Apr 16, 2012 at 7:08 AM, Robert Evans <evans@yahoo-inc.com> wrote:
> 
>> Hi Abhishek,
>> 
>> Manu is correct about High Availability within a single colo.  I realize
>> that in some cases you have to have fail over between colos.  I am not
>> aware of any turn key solution for things like that, but generally what you
>> want to do is to run two clusters, one in each colo, either hot/hot or
>> hot/warm, and I have seen both depending on how quickly you need to fail
>> over.  In hot/hot the input data is replicated to both clusters and the
>> same software is run on both.  In this case though you have to be fairly
>> sure that your processing is deterministic, or the results could be
>> slightly different (i.e. no generating of random IDs).  In hot/warm the
>> data is replicated from one colo to the other at defined checkpoints.  The
>> data is only processed on one of the grids, but if that colo goes down the
>> other one can take up the processing from wherever the last checkpoint
>> was.
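
A minimal sketch of the determinism point for hot/hot, assuming record IDs are the only source of randomness in the job. The function names, the sample file name, and the choice of SHA-256 are illustrative assumptions, not something described in this thread. The idea is to derive each ID from the replicated input itself, so both clusters compute the same value independently.

```python
import hashlib
import uuid


def random_record_id() -> str:
    # Non-deterministic: two clusters processing the same record
    # would each produce a different ID.
    return uuid.uuid4().hex


def deterministic_record_id(source: str, offset: int, payload: bytes) -> str:
    # Deterministic: the ID depends only on the replicated input
    # (hypothetical fields: source file name, byte offset, record bytes).
    digest = hashlib.sha256()
    digest.update(source.encode("utf-8"))
    digest.update(b"\x00")  # separator so adjacent fields cannot run together
    digest.update(str(offset).encode("ascii"))
    digest.update(b"\x00")
    digest.update(payload)
    return digest.hexdigest()[:32]


if __name__ == "__main__":
    record = ("clicks-2012-04-16.log", 1024, b"user=42 action=view")
    # Same input gives the same ID, regardless of which cluster runs it.
    print(deterministic_record_id(*record) == deterministic_record_id(*record))
    # Random IDs differ from call to call.
    print(random_record_id() == random_record_id())
```

Anything else that varies between runs (wall-clock timestamps, iteration order over unordered collections, and so on) would need the same treatment before the outputs of the two clusters could be expected to match.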
>> 
>> I hope that helps.
>> 
>> --Bobby
>> 
>> On 4/12/12 5:07 AM, "Manu S" <manupkd87@gmail.com> wrote:
>> 
>> Hi Abhishek,
>> 
>> 1. Use multiple directories for *dfs.name.dir* & *dfs.data.dir* etc
>> * Recommendation: write to *two local directories on different
>> physical volumes*, and to an *NFS-mounted* directory
>> - Data will be preserved even in the event of a total failure of the
>> NameNode machines
>> * Recommendation: *soft-mount the NFS* directory
>> - If the NFS mount goes offline, this will not cause the NameNode
>> to fail
>> 
>> 2. *Rack awareness*
>> 
>> https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf
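
As a rough illustration of item 1 above: in Hadoop 1.x, `dfs.name.dir` and `dfs.data.dir` each accept a comma-separated list of directories. The NameNode writes its metadata to every directory listed in `dfs.name.dir`, while a DataNode spreads blocks across the directories in `dfs.data.dir`. The sketch below only builds the corresponding `hdfs-site.xml` fragment. All paths are hypothetical placeholders, and the helper names are made up for this example. Soft-mounting is an NFS mount option set on the host, not a Hadoop property, so it does not appear here.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def dir_list_property(name: str, directories: Sequence[str]) -> ET.Element:
    # Build one <property> element whose value is a comma-separated list.
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = ",".join(directories)
    return prop


def build_hdfs_site(name_dirs: Sequence[str], data_dirs: Sequence[str]) -> str:
    # Assemble a <configuration> fragment with both directory lists.
    root = ET.Element("configuration")
    root.append(dir_list_property("dfs.name.dir", name_dirs))
    root.append(dir_list_property("dfs.data.dir", data_dirs))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical layout: two local volumes plus one NFS mount for the
    # NameNode metadata, and two local volumes for DataNode blocks.
    print(build_hdfs_site(
        name_dirs=["/disk1/hdfs/name", "/disk2/hdfs/name", "/mnt/nfs/hdfs/name"],
        data_dirs=["/disk1/hdfs/data", "/disk2/hdfs/data"],
    ))
```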
>> 
>> On Thu, Apr 12, 2012 at 2:18 AM, Abhishek Pratap Singh
>> <manu.infy@gmail.com> wrote:
>> 
>>> Thanks Robert.
>>> Is there a best practice or design that can address High Availability
>>> to a certain extent?
>>> 
>>> ~Abhishek
>>> 
>>> On Wed, Apr 11, 2012 at 12:32 PM, Robert Evans <evans@yahoo-inc.com>
>>> wrote:
>>> 
>>>> No it does not. Sorry
>>>> 
>>>> 
>>>> On 4/11/12 1:44 PM, "Abhishek Pratap Singh" <manu.infy@gmail.com> wrote:
>>>> 
>>>> Hi All,
>>>> 
>>>> Just wanted to know if Hadoop supports more than one data centre. This is
>>>> basically for DR purposes and High Availability, where if one centre goes
>>>> down the other can be brought up.
>>>> 
>>>> 
>>>> Regards,
>>>> Abhishek
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Thanks & Regards
>> ----
>> *Manu S*
>> SI Engineer - OpenSource & HPC
>> Wipro Infotech
>> Mob: +91 8861302855                Skype: manuspkd
>> www.opensourcetalk.co.in
>> 
>> 
> 

