hadoop-hdfs-dev mailing list archives

From "Todd Lipcon" <t...@apache.org>
Subject Re: Review Request: HDFS-2301 Start/stop appropriate namenode internal services during transition to active and standby
Date Thu, 06 Oct 2011 22:55:43 GMT


> On 2011-10-03 18:58:13, Todd Lipcon wrote:
> > just a few nits, mostly looks good. A few questions I have that aren't directly related to this patch:
> > - is SafeMode now a replicated thing, or does each NN separately enter safemode? I think the latter, right?
> > - when transitioning between states, what happens if the "enterState" fails for the new state? The state variable will then indicate it's in that state, when in fact it's in no state at all. How do we recover from that? Do we need some kind of rollback? (e.g. if you're in standby and try to transition to active, but find that you can't take a lock in ZK)
> 
> Suresh Srinivas wrote:
>     > is SafeMode now a replicated thing, or does each NN separately enter safemode? I think the latter, right?
>     Safemode is a state of the namespace (FSNamesystem), unlike active and standby, which are states of the namenode. Each NN enters safemode separately.
>     
>     > when transitioning between states, what happens if the "enterState" fails for the new state? The state variable will then indicate it's in that state, when in fact it's in no state at all. How do we recover from that? Do we need some kind of rollback? (e.g. if you're in standby and try to transition to active, but find that you can't take a lock in ZK)
>     This is tricky. Say enterState fails to start services because of some issue with the namenode process. Then rolling back to the previous state and starting the services relevant to it will most likely fail as well. As for the ZK example you bring up, I think the failover controller, not the namenode, is the one that deals with ZK.
>     
>     I can think of two solutions: the namenode shuts down when this happens (as is done during startup), or it moves to a failed state.

Let's just add a TODO for now that we need to consider these situations in a test plan. I
imagine the most likely real scenario is that you try to do a failover, but for some reason
the standby has an I/O problem trying to read the latest logs from the primary (e.g. maybe the
primary barfed some bad data into the edit logs as it crashed, or maybe the primary crashed
because the shared storage caught on fire).
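As a rough illustration of Suresh's second option (moving to a failed state rather than rolling back), the transition logic could record the new state only after that state's services have started, and park in a terminal failed state otherwise. This is a hypothetical sketch: `HAStateSketch`, `NameNodeSketch`, `NamesystemSketch`, `ServiceStartException`, the `FAILED` state and the `failActiveStartup` hook are illustrative assumptions, not classes or states from the HDFS-2301 patch. It also keeps safe mode as a separate flag on the namesystem, matching the point that each NN enters safemode independently of its HA state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical HA states; FAILED is an assumption (the "failed state" option),
// not a state defined by the HDFS-2301 patch.
enum HAStateSketch { STANDBY, ACTIVE, FAILED }

// Hypothetical exception thrown when a state's services cannot be started.
class ServiceStartException extends Exception {
  ServiceStartException(String msg) { super(msg); }
}

// Hypothetical stand-in for FSNamesystem: safe mode lives here, independent of HA state.
class NamesystemSketch {
  boolean inSafeMode = true; // each NN tracks its own safe mode
}

class NameNodeSketch {
  final NamesystemSketch namesystem = new NamesystemSketch();
  final List<String> log = new ArrayList<>();
  // Hypothetical hook simulating e.g. an I/O error reading the latest edits.
  boolean failActiveStartup = false;
  private HAStateSketch state = HAStateSketch.STANDBY;

  HAStateSketch getState() { return state; }

  // Start the services for the target state; throws if they cannot start.
  private void enterState(HAStateSketch target) throws ServiceStartException {
    if (target == HAStateSketch.ACTIVE && failActiveStartup) {
      throw new ServiceStartException("could not read latest edits");
    }
    log.add("entered " + target);
  }

  private void exitState(HAStateSketch current) {
    log.add("exited " + current);
  }

  // The state field is updated only after enterState succeeds, so it never
  // claims a state whose services are not actually running.
  synchronized void transitionTo(HAStateSketch target) {
    if (state == HAStateSketch.FAILED) {
      throw new IllegalStateException("namenode is in FAILED state");
    }
    if (state == target) {
      return;
    }
    exitState(state);
    try {
      enterState(target);
      state = target;
    } catch (ServiceStartException e) {
      // No rollback: restarting the previous state's services would likely
      // fail for the same reason, so move to a terminal FAILED state instead.
      state = HAStateSketch.FAILED;
      log.add("failed: " + e.getMessage());
    }
  }
}
```

In this sketch a node in `FAILED` refuses further transitions, so something outside it (an operator, or the failover controller) would have to restart it; the other option, shutting the namenode down as happens on a startup failure, would replace the `FAILED` assignment with a process exit.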


> On 2011-10-03 18:58:13, Todd Lipcon wrote:
> > branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java, line 464
> > <https://reviews.apache.org/r/2150/diff/1/?file=47529#file47529line464>
> >
> >     any reason that you switched the order of startHttpServer to the end of this function? I don't think it's a big deal, but there's some possibility the service plugins may want to do something with the HTTP server, which wouldn't be started yet.
> 
> Suresh Srinivas wrote:
>     No particular reason. I'm not sure who uses ServicePlugins; the description says they are RPC related. But I will move it back up.

Hue currently uses service plugins to expose a Thrift interface. But with Sanjay's recent
work on protocol adapters, this may be largely unnecessary in the future. Nonetheless, we
should leave it around :)
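The ordering concern can be shown with a minimal sketch: start the HTTP server first, then hand the node to each plugin, so a plugin that looks up the HTTP server finds it already running. `PluginSketch`, `HttpServerSketch`, `NodeSketch` and `HttpDependentPlugin` are hypothetical names, loosely modeled on Hadoop's `ServicePlugin` hook (whose `start` method receives the service instance); this is not the actual NameNode startup code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plugin hook, loosely modeled on Hadoop's ServicePlugin.
interface PluginSketch {
  void start(NodeSketch node);
}

// Hypothetical stand-in for the NameNode's embedded HTTP server.
class HttpServerSketch {
  private boolean running = false;
  void start() { running = true; }
  boolean isRunning() { return running; }
}

class NodeSketch {
  final HttpServerSketch httpServer = new HttpServerSketch();
  final List<PluginSketch> plugins = new ArrayList<>();

  // The HTTP server is started before any plugin, so plugins can rely on it.
  void startCommonServices() {
    httpServer.start();
    for (PluginSketch p : plugins) {
      p.start(this);
    }
  }
}

// Example plugin that depends on the HTTP server being up (e.g. to register a servlet).
class HttpDependentPlugin implements PluginSketch {
  boolean sawRunningHttpServer = false;

  public void start(NodeSketch node) {
    sawRunningHttpServer = node.httpServer.isRunning();
  }
}
```

Moving `httpServer.start()` after the plugin loop would make `sawRunningHttpServer` false, which is the kind of breakage the review comment anticipates for plugins such as Hue's Thrift interface.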


- Todd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2150/#review2277
-----------------------------------------------------------


On 2011-10-03 18:36:41, Todd Lipcon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2150/
> -----------------------------------------------------------
> 
> (Updated 2011-10-03 18:36:41)
> 
> 
> Review request for hadoop-hdfs and Todd Lipcon.
> 
> 
> Summary
> -------
> 
> Uploading Suresh's patch to reviewboard (https://issues.apache.org/jira/secure/attachment/12496953/HDFS-2301.txt from 29/Sep/11 00:56)
> 
> 
> This addresses bug HDFS-2301.
>     https://issues.apache.org/jira/browse/HDFS-2301
> 
> 
> Diffs
> -----
> 
>   branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupNode.java 1177130
>   branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1177130
>   branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java 1177130
>   branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ActiveState.java 1177128
>   branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/HAContext.java PRE-CREATION
>   branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/HAState.java 1177128
>   branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java 1177128
> 
> Diff: https://reviews.apache.org/r/2150/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Todd
> 
>

