hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2681) Add ZK client for leader election
Date Mon, 16 Jan 2012 20:41:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187168#comment-13187168 ]

Todd Lipcon commented on HDFS-2681:
-----------------------------------

bq. So if your TCP disconnect timeouts are not set insanely high (> session timeout) then
enterSafeMode will be called before session timeout expires and someone else becomes a master.

This still isn't "safe". For example, imagine the NN goes into a multi-minute GC pause just
before writing an edit to its edit log. Since the GC pause is longer than the session timeout,
some other NN will take over. Without active fencing, when the first NN wakes up, it will
make that mutation to the edit log before it finds out about the ZK timeout.

It sounds contrived, but in the past we've had many data loss bugs in HBase caused by scenarios
like this. Multi-minute GC pauses are rare, but they do happen.
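To make the fencing point concrete, here is a minimal sketch of one possible approach, storage-side epoch fencing, assuming the shared edit log can check a strictly increasing epoch on every write. `FencedEditLog`, `FencingDemo`, and the epoch numbers are hypothetical; this is not what the patch or HDFS implements. The idea it shows is that the stale writer is rejected by the storage itself, whether or not it has noticed that its ZK session expired.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared edit log that fences stale writers by epoch.
// Not an HDFS or ZooKeeper API; a self-contained illustration only.
class FencedEditLog {
    private long currentEpoch = 0;
    private final List<String> edits = new ArrayList<>();

    // Called by a newly elected active before it writes anything.
    // Epochs must strictly increase, so each takeover invalidates the previous writer.
    synchronized void fence(long newEpoch) {
        if (newEpoch <= currentEpoch) {
            throw new IllegalStateException(
                "epoch " + newEpoch + " is not newer than " + currentEpoch);
        }
        currentEpoch = newEpoch;
    }

    // Only a writer holding the current epoch may append; anyone else has been fenced.
    synchronized boolean append(long writerEpoch, String edit) {
        if (writerEpoch != currentEpoch) {
            return false;
        }
        edits.add(edit);
        return true;
    }

    synchronized List<String> edits() {
        return new ArrayList<>(edits);
    }
}

class FencingDemo {
    public static void main(String[] args) {
        FencedEditLog log = new FencedEditLog();
        log.fence(1);                                    // NN1 becomes active
        System.out.println(log.append(1, "mkdir /a"));   // true
        // NN1 enters a long GC pause; its ZK session expires and NN2 takes over.
        log.fence(2);
        System.out.println(log.append(2, "mkdir /b"));   // true
        // NN1 wakes up, still believing it is active, and applies its pending edit.
        System.out.println(log.append(1, "delete /a"));  // false: NN1 was fenced
        System.out.println(log.edits());                 // [mkdir /a, mkdir /b]
    }
}
```

The check lives on the write path rather than in the election client, which is why it still holds when the old active is paused at exactly the wrong moment.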

bq. It's public because it's a well-defined property of the class.
But it implies that external consumers of this class may want to manipulate the znode directly,
which exposes an implementation detail unnecessarily.

bq. Is the ALLCAPS on static strings a convention? You mean the member name should be all
caps or the value?

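Yes, the convention applies to the member name: constants (static final fields) should have
all-caps names, with words separated by underscores; the value is unaffected. See the Sun Java
coding conventions, which we more or less follow: http://www.oracle.com/technetwork/java/codeconventions-135099.html#367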
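A small illustration of the naming convention, which also keeps the constant private in line with the visibility point above. The class name, the `"ActiveLock"` value, and the `/hadoop-ha` parent path are hypothetical and not taken from the patch.

```java
// Hypothetical names for illustration only; not taken from the patch.
class ElectorNamingExample {
    // Constant: static final, so the *name* is ALL_CAPS with underscores.
    // Kept private, since the znode layout is an implementation detail.
    private static final String LOCK_ZNODE_NAME = "ActiveLock";

    // Instance field: camelCase, even though it is final (it is not a class constant).
    private final String parentZnode;

    ElectorNamingExample(String parentZnode) {
        this.parentZnode = parentZnode;
    }

    // The constant's value can be any string; only the identifier follows the convention.
    String lockPath() {
        return parentZnode + "/" + LOCK_ZNODE_NAME;
    }

    public static void main(String[] args) {
        System.out.println(new ElectorNamingExample("/hadoop-ha").lockPath()); // /hadoop-ha/ActiveLock
    }
}
```

`lockPath()` is package-private, so a test in the same package can check it without making it part of the class's public surface.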

bq. So I need to have mock initialized before constructing the tester object. So I made mock
a static member. But then java complained that inner classes cannot have static members.
I'm not quite following: you already initialize the non-static {{mockZk}} in {{TestActiveStandbyElector.init()}},
right? If the tester is a non-static inner class, it can simply refer to the already-initialized
member of its outer class.
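A minimal sketch of that pattern, assuming the tester is declared inside the test class. To keep it runnable without JUnit or Mockito, `ZkClient` is a hypothetical stand-in for the ZooKeeper client type, the lambda stands in for the Mockito mock, and `init()` plays the role of the `@Before` method.

```java
// Hypothetical stand-in for the ZooKeeper client type (the real API differs).
interface ZkClient {
    String create(String path);
}

class InnerClassExample {
    // Non-static field of the outer (test) class, initialized in init().
    private ZkClient mockZk;

    void init() {
        mockZk = path -> "created " + path;   // stand-in for a Mockito mock
    }

    // Non-static inner class: every instance is tied to an enclosing
    // InnerClassExample, so it can read mockZk directly; no static field needed.
    class Tester {
        String createLock() {
            return mockZk.create("/lock");    // resolves to InnerClassExample.this.mockZk
        }
    }

    public static void main(String[] args) {
        InnerClassExample test = new InnerClassExample();
        test.init();                              // initialize the outer field
        Tester tester = test.new Tester();        // construct the inner instance
        System.out.println(tester.createLock());  // created /lock
    }
}
```

Because the inner class reads the field through its enclosing instance each time a method runs, it sees whatever `init()` assigned, so no static member is required.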

bq. Could you please point me to some place which explains what to log at different log levels?
I don't think we have any formal guidelines here; the basic assumptions I make are:
- ERROR: unrecoverable errors (e.g., a block is apparently lost, or a failover failed)
- WARN: recoverable errors (e.g., failures that will be retried, or blocks that have become
under-replicated but can be repaired)
- INFO: normal operations proceeding as expected, but interesting enough that operators will
want to see them.
- DEBUG: information that will be useful to developers debugging unit tests or running small
test clusters (unit tests generally enable this level, but users generally don't). It's also
handy when you have a reproducible bug on the client: you can ask the user to enable DEBUG
and re-run, for example.
- TRACE: super-detailed trace information that will only be enabled in rare circumstances.
We don't use this much.
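The levels above can be sketched as follows. To stay dependency-free this uses the JDK's `java.util.logging`, whose level names differ, so the mapping in the comments (SEVERE for ERROR, WARNING for WARN, INFO, FINE for DEBUG, FINEST for TRACE) is an approximation. The log messages and the `onEvents`/`runAt` helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class LogLevelExample {
    static final Logger LOG = Logger.getLogger("LogLevelExample");

    // Hypothetical elector events, each logged at the level the guidelines suggest.
    static void onEvents() {
        LOG.severe("Failover failed");                 // ~ERROR: unrecoverable
        LOG.warning("Lost ZK connection; will retry"); // ~WARN: recoverable, retried
        LOG.info("Became active");                     // INFO: normal but interesting
        if (LOG.isLoggable(Level.FINE)) {              // ~DEBUG: guard costly messages
            LOG.fine("Watcher event details: " + describeState());
        }
        LOG.finest("Raw znode contents");              // ~TRACE: rarely enabled
    }

    static String describeState() {
        return "state=ACTIVE";
    }

    // Collects the levels of published records so we can see what got through.
    static class CollectingHandler extends Handler {
        final List<Level> levels = new ArrayList<>();
        @Override public void publish(LogRecord r) {
            if (isLoggable(r)) levels.add(r.getLevel());
        }
        @Override public void flush() {}
        @Override public void close() {}
    }

    // Runs onEvents() with the logger set to the given threshold.
    static List<Level> runAt(Level level) {
        CollectingHandler handler = new CollectingHandler();
        LOG.setUseParentHandlers(false);   // keep the console quiet
        LOG.addHandler(handler);
        LOG.setLevel(level);
        try {
            onEvents();
        } finally {
            LOG.removeHandler(handler);
        }
        return handler.levels;
    }

    public static void main(String[] args) {
        System.out.println(runAt(Level.INFO)); // [SEVERE, WARNING, INFO]
        System.out.println(runAt(Level.FINE)); // [SEVERE, WARNING, INFO, FINE]
    }
}
```

At a typical production threshold only the first three levels are emitted; lowering it to FINE is the "ask the user to enable DEBUG and re-run" case.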

> Add ZK client for leader election
> ---------------------------------
>
>                 Key: HDFS-2681
>                 URL: https://issues.apache.org/jira/browse/HDFS-2681
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Suresh Srinivas
>            Assignee: Bikas Saha
>             Fix For: HA branch (HDFS-1623)
>
>         Attachments: HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch,
>                      HDFS-2681.HDFS-1623.patch, Zookeeper based Leader Election and Monitoring Library.pdf
>
>
> ZKClient needs to support the following capabilities:
> # Ability to create a znode for co-ordinating leader election.
> # Ability to monitor and receive callbacks when active znode status changes.
> # Ability to get information about the active node.
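The three capabilities can be pictured with a small in-memory sketch. It does not use ZooKeeper: `InMemoryElection` stands in for the lock znode (first candidate to join holds it; the others wait in order, roughly like standbys watching the znode), and callbacks fire synchronously rather than from watch events. `ElectionCallback`, its methods, and the `nn1:8020`-style node info are hypothetical names, not the patch's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical callback interface; names are illustrative, not the patch's API.
interface ElectionCallback {
    void becomeActive();
    void becomeStandby();
}

// In-memory stand-in for a ZooKeeper lock znode (no real ZK involved).
class InMemoryElection {
    private static final class Candidate {
        final String info;
        final ElectionCallback callback;
        Candidate(String info, ElectionCallback callback) {
            this.info = info;
            this.callback = callback;
        }
    }

    private final Deque<Candidate> queue = new ArrayDeque<>();

    // (1) Join: the first candidate "creates the lock znode" and becomes active.
    synchronized void join(String info, ElectionCallback callback) {
        queue.addLast(new Candidate(info, callback));
        if (queue.size() == 1) {
            callback.becomeActive();
        } else {
            callback.becomeStandby();
        }
    }

    // (2) The active's session ends, so its "ephemeral znode" disappears and
    // the next waiting candidate receives a callback that it is now active.
    synchronized void activeSessionExpired() {
        queue.pollFirst();
        Candidate next = queue.peekFirst();
        if (next != null) {
            next.callback.becomeActive();
        }
    }

    // (3) Information about the current active node, or null if there is none.
    synchronized String getActiveInfo() {
        Candidate head = queue.peekFirst();
        return head == null ? null : head.info;
    }
}

class ElectionDemo {
    static ElectionCallback callbackFor(String name, List<String> events) {
        return new ElectionCallback() {
            public void becomeActive() { events.add(name + " active"); }
            public void becomeStandby() { events.add(name + " standby"); }
        };
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        InMemoryElection election = new InMemoryElection();
        election.join("nn1:8020", callbackFor("nn1", events));
        election.join("nn2:8020", callbackFor("nn2", events));
        election.activeSessionExpired();
        System.out.println(events);                   // [nn1 active, nn2 standby, nn2 active]
        System.out.println(election.getActiveInfo()); // nn2:8020
    }
}
```

Note that the old active gets no callback when its session expires, which mirrors the fencing discussion above: it may not find out in time.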
