hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2064) Warm HA NameNode going Hot
Date Tue, 28 Jun 2011 09:04:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056402#comment-13056402
] 

Konstantin Shvachko commented on HDFS-2064:
-------------------------------------------

Rob, thanks for the design review. Some clarifications and comments.
Yes, I consider this design a specification of Sanjay's and Suresh's design, in the sense
that it is a minimalistic (in terms of changes to the existing code) approach dedicated to
one direction: building HA based on StandbyNode.

> 1. K7.8
Accelerating block reports after failover is indeed an optimization. Good point: during normal
operations both BlockMaps should be in sync. Acceleration is targeted at the case when the SBN
has missed a lot of block reports, which could be monitored on the SBN webUI or via metrics.

> 3. What is the scope of VIP solutions?
Talking to different people, I came to the conclusion that failover within one rack is sufficient.

- First of all, VIP is a good abstraction, and if the current implementation does not satisfy
certain needs, the networking industry will find a way to innovate.
- Second, the rack that runs NN and SBN can be designed more reliably than regular (DataNode)
racks: with 2 TOR switches, and with bonded interfaces (forgive me if I get the terminology
wrong) inside the rack and outside for fault tolerance.
- Third, there are disasters that require a 9.0 magnitude earthquake followed by a tsunami
to happen. Should Hadoop be designed for that? Probably not. I just need to hit the 99.94%
availability mark.
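
For scale, the downtime budget implied by such a target can be worked out directly. The sketch
below is only illustrative arithmetic: it assumes the figure is a percentage measured over a
365-day year, and the class and method names are made up for the example. Under those
assumptions the budget comes to a little over 5 hours per year.

```java
public class AvailabilityBudget {
    /**
     * Allowed downtime, in hours, for a given availability percentage
     * over a period of the given length (also in hours).
     */
    public static double allowedDowntimeHours(double availabilityPercent, double periodHours) {
        return (1.0 - availabilityPercent / 100.0) * periodHours;
    }

    public static void main(String[] args) {
        double hoursPerYear = 365 * 24; // 8760 hours, ignoring leap years (assumption)
        double budget = allowedDowntimeHours(99.94, hoursPerYear);
        System.out.println("Allowed downtime per year (hours): " + budget);
    }
}
```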

> 4. the stale deletion request problem
I hoped I had covered it in 7.9. But I see now that this section needs more details and that I
missed the third important case, when setReplication() explicitly decreases the replication.
I hope we can solve it by adding replica locations to logSetReplication(). I'll update this
section.

> 6. "leader election". Is the world really symmetric?
NN and SBN are asymmetric. And this simplifies things a lot: I am the active NN if I have the
nn.vip. Some other node can think it is active, but since it doesn't have the vip, its ambitions
don't matter, as nobody can talk to it. Asymmetric design eliminates leader election and
client fencing. It's a good thing.
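
The "I am active if I have the vip" rule can be sketched as a local check of whether any
network interface on this host is bound to the VIP address. This is a minimal illustration,
not code from the design document: the class name and the address 192.0.2.10 are hypothetical
placeholders, and treating a lookup failure as "not active" is an assumption made here to stay
on the conservative side.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

public class VipOwnership {
    /**
     * Returns true if some local network interface is bound to the given
     * address. Any lookup failure is treated as "not held" (assumption).
     */
    public static boolean holdsAddress(String address) {
        try {
            InetAddress addr = InetAddress.getByName(address);
            // getByInetAddress returns null when no local interface has this address
            return NetworkInterface.getByInetAddress(addr) != null;
        } catch (UnknownHostException | SocketException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical VIP; pass the real one as the first argument.
        String vip = args.length > 0 ? args[0] : "192.0.2.10";
        System.out.println(holdsAddress(vip)
                ? "active: this node holds the VIP"
                : "not active: the VIP is held elsewhere");
    }
}
```

A node that believes it is active but fails this check is unreachable through the VIP, which
is the point made above: its opinion has no effect on clients.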

> 8. spooling edits on secondary storage.
By secondary storage you mean a filer or a Bookkeeper, I think. A filer is enterprise storage.
We are building a distributed storage system based on commodity components, and adding a dependency
on enterprise storage seems counterintuitive to me. Any shared storage solution will require
solving synchronization problems, see my note about addBlock() in 7.5. If blockReceived()
arrives at the SBN before addBlock(), this replica is lost for another hour. addBlock() must be
synchronous in order to avoid such a race condition.
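
The ordering problem can be shown with a toy model. This is not HDFS code: the class below is
a hypothetical stand-in for the standby's block map, and it assumes that a blockReceived()
notification for a block the standby does not yet know about is simply dropped.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StandbyRaceSketch {
    // blockId -> names of DataNodes known to hold a replica
    private final Map<Long, Set<String>> blockMap = new HashMap<>();

    /** Namespace update: the standby learns that the block exists. */
    public void addBlock(long blockId) {
        blockMap.putIfAbsent(blockId, new HashSet<>());
    }

    /** DataNode notification; dropped if the block is not yet known. */
    public boolean blockReceived(long blockId, String dataNode) {
        Set<String> replicas = blockMap.get(blockId);
        if (replicas == null) {
            return false; // replica location is lost until it is reported again
        }
        replicas.add(dataNode);
        return true;
    }

    public int replicaCount(long blockId) {
        Set<String> replicas = blockMap.get(blockId);
        return replicas == null ? 0 : replicas.size();
    }

    public static void main(String[] args) {
        StandbyRaceSketch racy = new StandbyRaceSketch();
        racy.blockReceived(1L, "dn1"); // arrives first and is dropped
        racy.addBlock(1L);
        System.out.println("out of order: " + racy.replicaCount(1L) + " replica(s)");

        StandbyRaceSketch ordered = new StandbyRaceSketch();
        ordered.addBlock(1L);          // applied synchronously first
        ordered.blockReceived(1L, "dn1");
        System.out.println("in order: " + ordered.replicaCount(1L) + " replica(s)");
    }
}
```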

> in a practical BN deployment, is there a remaining need for some shared storage?
I don't see any.

> Warm HA NameNode going Hot
> --------------------------
>
>                 Key: HDFS-2064
>                 URL: https://issues.apache.org/jira/browse/HDFS-2064
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: WarmHA-GoingHot.pdf
>
>
> This is the design for automatic hot HA for HDFS NameNode. It involves use of HA software
> and LoadReplicator - external to Hadoop components, which substantially simplify the architecture
> by separating HA- from Hadoop-specific problems. Without the external components it provides
> warm standby with manual failover.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
