hadoop-hdfs-issues mailing list archives

From "James Clampffer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10441) libhdfs++: HA namenode support
Date Fri, 01 Jul 2016 21:39:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359685#comment-15359685 ]

James Clampffer commented on HDFS-10441:

Thanks for the very detailed feedback, Bob.  You caught a lot of issues that slipped my mind,
plus a few I hadn't really thought about.

As far as the bigger issues go (the other points are also important to me; I just want to give
comments on these before the next patch):
bq. Do we pass information in the data pointer for the NN failover event? If so, document
it in events.h
I think I accidentally left this out.  It needs to be fixed, or I'll take it out of the header
and file another JIRA for it.

bq. rpc_connection.cc: HandleRpcResponse should push req back to the head of the queue; alternately,
don't dequeue it if we got a standby exception.
I need to take a closer look at the best way to do this.  Will fix.

bq. If HandleRpcResponse gets a kStandbyException, will CommsError be called twice (once in
HandleRpcResponse and again in OnRecvComplete)?
Yes, and I haven't figured out why yet; I should be able to track it down quickly now that
I'm more familiar with the RPC retry code.  It doesn't seem to break anything, but it sure makes
reading the logs more confusing than it needs to be.

bq. rpc_engine.cc: let's use both namenodes if servers.size() >= 2 rather than just bailing
I'm torn on this one.  As far as I know, you'd need to do a bit of hacking to get more than
two nodes working in HA, so it seems best to cover the common case here.  How about
allowing more than two nodes but issuing warnings that the cluster probably isn't going to
act as expected based on the config file and that we discarded the extra nodes?

bq. rpc_engine.h: IsCurrentActive/IsCurrentStandby are dangerous as designed: they're asking
for race conditions as we acquire the lock, check, release the lock, then take action. Just
before we take action, someone else could change the value
You're totally right here.  I didn't want to implement a mechanism that lets other things
lock the state of that structure for an arbitrary amount of time (it seemed like it would be
asking for a deadlock).  I think the best fix here is to make them private to the HANamenodeTracker.

bq. rpc_engine.cc: Remove RpcEngine::Start instead of deprecating it
Yep, I left that in to see if someone could think of a use for a hook that gets called once
the IoService is known to be running.  If you can't think of one, I'll be happy to get rid
of some dead code.

bq. Don't forget to file bugs to handle more than two namenodes.
Sure.  I was thinking of holding off until that becomes widely supported, but it won't hurt
to plan ahead.

As a general warning, I don't plan to have much internet access for the next week.  Once I
do, I'll address the rest of the comments and put up a new patch that includes the rest of
your feedback.

> libhdfs++: HA namenode support
> ------------------------------
>                 Key: HDFS-10441
>                 URL: https://issues.apache.org/jira/browse/HDFS-10441
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>         Attachments: HDFS-10441.HDFS-8707.000.patch, HDFS-10441.HDFS-8707.002.patch,
HDFS-10441.HDFS-8707.003.patch, HDFS-10441.HDFS-8707.004.patch, HDFS-10441.HDFS-8707.005.patch,
HDFS-10441.HDFS-8707.006.patch, HDFS-10441.HDFS-8707.007.patch, HDFS-10441.HDFS-8707.008.patch,
HDFS-10441.HDFS-8707.009.patch, HDFS-8707.HDFS-10441.001.patch
> If a cluster is HA enabled then do proper failover.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
