hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: roadmap: data integrity
Date Fri, 07 Aug 2009 16:03:59 GMT
Good to see there's direct edit replication support; that can make
things easier. 

I've seen people use DRBD or NFS to replicate edits currently.
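For the NFS flavor of that, one common approach on 0.19/0.20 is to list an NFS-mounted directory alongside the local one in dfs.name.dir, so the namenode writes its image and edits to both. A sketch (the paths are just placeholders):

    <property>
      <name>dfs.name.dir</name>
      <!-- local disk plus an NFS mount; the NN keeps both copies current -->
      <value>/hadoop/dfs/name,/mnt/nfs/dfs/name</value>
    </property>

A standby node that mounts the same NFS export can then start an NN from the replicated directory.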

Namenode failover is a "solvable" issue with traditional HA: OS-level
heartbeats, fencing, failover -- e.g. the HA infrastructure daemon starts
an NN instance on node B if the heartbeat from node A is lost and takes a
power control operation on A to make sure it is dead. On both nodes the
infrastructure daemons trigger the OS watchdog if the NN process dies.
Combine this with automatic IP address reassignment. Then, page the
operators. Add another node C for additional redundancy, make sure
all of the alternates are on separate racks and power rails, and make
sure the L2 and L3 topology is also HA (e.g. bonded ethernet to
redundant switches at L2, mesh routing at L3, etc.). If the cluster is
not super huge it can all be spanned at L2 over redundant switches. L3
redundancy is trickier. A typical configuration could have a lot of OSPF
stub networks -- depending on how L2 is partitioned -- which can make the
routing table difficult for operators to sort out.

I've seen this type of thing work myself: ~15 seconds from a
(simulated) fault on NN node A to the new NN up on node B and
responding to DN reconnections, on 0.19.

You can add further assurance of fast failover by running redundant
processes alongside a few datanodes that repeatedly ping the NN via the
namenode protocol and trigger fencing and failover if it stops
responding.
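A minimal sketch of such a pinger, assuming a 0.20-era client API; it probes through the ordinary FileSystem interface rather than the raw namenode protocol, and the fence-and-failover script name and timings are hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    /** Repeatedly probes the NN; after N consecutive failures, runs a fencing/failover script. */
    public class NameNodeWatchdog {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up fs.default.name from hadoop-site.xml
        int failures = 0;
        while (true) {
          try {
            FileSystem fs = FileSystem.get(conf);
            fs.getFileStatus(new Path("/"));        // cheap NN round trip
            failures = 0;
          } catch (IOException e) {
            failures++;
            if (failures >= 3) {
              // Hypothetical script: power-fences node A, reassigns the IP, starts the NN on node B.
              Runtime.getRuntime().exec("/usr/local/bin/fence-and-failover-nn.sh").waitFor();
              failures = 0;
            }
          }
          Thread.sleep(5000);                       // probe every 5 seconds
        }
      }
    }

The fencing script would carry out the steps described above: the power control operation on node A, IP reassignment, and starting the NN on node B.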

One wrinkle is that the new namenode starts up in safe mode. As long as
HBase can handle temporary periods where the cluster goes into
safe mode after NN failover, it can ride it out.
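A rough sketch of waiting that out, assuming the 0.20-era DistributedFileSystem API (the polling interval is arbitrary):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.FSConstants.SafeModeAction;

    /** Blocks until the (possibly freshly failed-over) namenode has left safe mode. */
    public class SafeModeWait {
      public static void waitForSafeModeExit(Configuration conf) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        if (!(fs instanceof DistributedFileSystem)) {
          return;                                      // local FS etc. has no safe mode
        }
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        // SAFEMODE_GET only queries the current state; it does not change it.
        while (dfs.setSafeMode(SafeModeAction.SAFEMODE_GET)) {
          Thread.sleep(10000);                         // re-check every 10 seconds
        }
      }
    }

The command-line equivalent is "hadoop dfsadmin -safemode wait".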

This is ugly, but it is, I believe, an accepted and valid systems
engineering solution to the NN SPOF issue for the folks I mentioned
in my previous email -- something they would be familiar with. Edit
replication support in HDFS 0.21 makes it a little less work to
achieve and maybe a little faster to execute, so that's an
improvement.

It may be overstating things a little to say that the NN SPOF is not a
concern for HBase, but, in my opinion, we need to address the WAL and
lack-of-FSCK issues first before worrying about it. HBase can
lose data all on its own.

   - Andy





________________________________
From: Jean-Daniel Cryans <jdcryans@apache.org>
To: hbase-dev@hadoop.apache.org
Sent: Friday, August 7, 2009 3:25:19 AM
Subject: Re: roadmap: data integrity

https://issues.apache.org/jira/browse/HADOOP-4539

This issue was closed long ago. But Steve Loughran just said on the
hadoop mailing list that the new NN has to come up with the same
IP/hostname as the failed one.

J-D

On Fri, Aug 7, 2009 at 2:37 AM, Ryan Rawson<ryanobjc@gmail.com> wrote:
> WAL is a major issue, but another one that is coming up fast is the
> SPOF that is the namenode.
>
> Right now, namenode aside, I can rolling restart my entire cluster,
> including rebooting the machines if I needed to. But not so with the
> namenode, because if it goes AWOL, all sorts of bad things can happen.
>
> I hope that HDFS 0.21 addresses both these issues.  Can we get
> positive confirmation that this is being worked on?
>
> -ryan
>
> On Thu, Aug 6, 2009 at 10:25 AM, Andrew Purtell<apurtell@apache.org> wrote:
>> I updated the roadmap up on the wiki:
>>
>>
>> * Data integrity
>>    * Ensure that proper append() support in HDFS actually closes the
>>      WAL last block write hole
>>    * HBase-FSCK (HBASE-7) -- Suggest making this a blocker for 0.21
>>
>> I have had several recent conversations on my travels with people in
>> Fortune 100 companies (based on this list:
>> http://www.wageproject.org/content/fortune/index.php).
>>
>> You and I know we can set up well engineered HBase 0.20 clusters that
>> will be operationally solid for a wide range of use cases, but given
>> those aforementioned discussions there are certain sectors which would
>> say HBASE-7 is #1 before HBase is "bank ready". Not until we can say:
>>
>>  - Yes, when the client sees data has been committed, it actually has
>> been written and replicated on spinning or solid state media in all
>> cases.
>>
>>  - Yes, we go to great lengths to recover data if ${deity} forbid you
>> crush some underprovisioned cluster with load or some bizarre bug or
>> system fault happens.
>>
>> HBASE-1295 is also required for business continuity reasons, but this
>> is already a priority item for some HBase committers.
>>
>> The question, I think, is whether the above aligns with project goals.
>> Making HBase-FSCK a blocker will probably knock something someone
>> wants for the 0.21 timeframe off the list.
>>
>>   - Andy
>>
>>
>>
>



      