hadoop-common-dev mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3628) Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.
Date Thu, 22 Jan 2009 17:05:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666211#action_12666211 ]

Steve Loughran commented on HADOOP-3628:
----------------------------------------

Suresh, I'm finally sitting down to look at your comments; it's going to take me a while to
work through them. Thank you (and Tom!) for the comments.

Some quick answers. 

"Should we have a state that captures out of service for maintenance?"

It's kind of tricky to have that on something that is network visible, since a lot of maintenance
makes the program unreachable. At the same time, I could imagine a cluster being offline to
new job submissions, or a filesystem in read-only mode.

On a related note, I've long wondered if we should have an XHTML format for web sites to say
that they are offline for maintenance in some machine-readable way; something you
could aggregate and which would include forward maintenance notices. Something would hit the
status pages of the various machines in your infrastructure and build up a calendar of planned
outages, work out which were the SPOFs and aggregate them differently from the redundant bits.
I don't think this is the right time for this, but it's still an idea I'm fond of.


"-I am not clear on how Failed state transitions to Terminated. If failed state transitions
to terminated, the fact that the service failed will no longer available?"

Good question. When terminated, a service should shut down its threads and do any cleanup. But
any underlying exception that triggered a failure should still be in the failureCause field.
So provided that a throwable was passed in to enterFailedState(), the service remembers what
happened.
What I'm not doing (currently) is retaining the entire history. We could do that if you felt
it was useful: build up a list of state transitions and timestamps, the history.
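
A minimal sketch of the behaviour described above, not the code in the attached patch. Only failureCause and enterFailedState() are names from this message; the ServiceState enum, the Transition and SketchService classes, and the history list are hypothetical and exist only to illustrate the idea. It shows a failure cause surviving the move from a failed state to a terminated one, with an optional timestamped transition history.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical lifecycle states; the names are illustrative only. */
enum ServiceState { CREATED, STARTED, FAILED, TERMINATED }

/** One entry in the optional transition history (an assumption, not in the patch). */
final class Transition {
    final ServiceState from;
    final ServiceState to;
    final long timestampMillis;

    Transition(ServiceState from, ServiceState to, long timestampMillis) {
        this.from = from;
        this.to = to;
        this.timestampMillis = timestampMillis;
    }
}

class SketchService {
    private ServiceState state = ServiceState.CREATED;
    private Throwable failureCause;
    private final List<Transition> history = new ArrayList<>();

    /** Record the transition with a timestamp, then switch state. */
    private synchronized void enterState(ServiceState next) {
        history.add(new Transition(state, next, System.currentTimeMillis()));
        state = next;
    }

    public synchronized void start() {
        enterState(ServiceState.STARTED);
    }

    /** Remember the first throwable passed in; later failures do not overwrite it. */
    public synchronized void enterFailedState(Throwable cause) {
        if (cause != null && failureCause == null) {
            failureCause = cause;
        }
        enterState(ServiceState.FAILED);
    }

    /** Thread shutdown and cleanup would go here; failureCause is deliberately not cleared. */
    public synchronized void terminate() {
        enterState(ServiceState.TERMINATED);
    }

    public synchronized ServiceState getState() {
        return state;
    }

    public synchronized Throwable getFailureCause() {
        return failureCause;
    }

    /** Read-only snapshot of the transitions recorded so far. */
    public synchronized List<Transition> getHistory() {
        return Collections.unmodifiableList(new ArrayList<>(history));
    }
}
```

After enterFailedState(ex) followed by terminate(), getState() reports TERMINATED while getFailureCause() still returns ex, and getHistory() holds one entry per transition.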


I'm going to look more at your state machine proposal. One thing that worries me about being
able to add new states is how well they aggregate.

> Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-3628
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3628
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs, mapred
>    Affects Versions: 0.20.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: AbstractHadoopComponent.java, hadoop-3628.patch, hadoop-3628.patch,
hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch,
hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch,
hadoop-3628.patch, hadoop-3628.patch, hadoop-lifecycle.pdf, hadoop-lifecycle.sxw
>
>
> I'd like to propose we have a standard interface for Hadoop components, the things that
get started or stopped when you bring up a namenode. Currently, some of these classes have
a stop() or shutdown() method, with no standard name/interface, and there is no way of seeing if they
are live, checking their health, or shutting them down reliably. Indeed, there is a tendency
for the spawned threads to not want to die; to require the entire process to be killed to
stop the workers. 
> Having a standard interface would make it easier for 
>  * management tools to manage the different things
>  * monitoring the state of things
>  * subclassing
> The last is interesting as right now TaskTracker and JobTracker start up threads in
their constructors; that's very dangerous as subclasses may have their methods called before
they are fully initialised. Adding this interface would be the right time to clean up the startup
process so that subclassing is less risky.
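
The constructor hazard described above can be sketched as follows. This is an illustration under stated assumptions, not code from TaskTracker, JobTracker, or the attached patch: SketchTracker, SketchSubTracker, runWorker(), and awaitWorker() are hypothetical names. The base-class constructor starts no threads; the worker is spawned only by an explicit start(), which a caller can invoke only after the subclass constructor has finished, so the overridden method never sees a half-built object.

```java
/** Hypothetical base class: construction and thread startup are separate steps. */
abstract class SketchTracker {
    private Thread worker;

    protected SketchTracker() {
        // Deliberately no thread started here: the subclass fields are not set yet.
    }

    /** Spawn the worker once; safe to call only after construction has completed. */
    public synchronized void start() {
        if (worker != null) {
            return;
        }
        worker = new Thread(this::runWorker, "sketch-worker");
        worker.setDaemon(true);
        worker.start();
    }

    public synchronized boolean isStarted() {
        return worker != null;
    }

    /** Wait up to timeoutMillis for the worker to finish. */
    public void awaitWorker(long timeoutMillis) {
        Thread w;
        synchronized (this) {
            w = worker;
        }
        if (w == null) {
            return;
        }
        try {
            w.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Overridden by subclasses; runs on the worker thread. */
    protected abstract void runWorker();
}

/** Hypothetical subclass whose worker depends on a field set in its own constructor. */
class SketchSubTracker extends SketchTracker {
    private final String name;
    volatile String seenByWorker;

    SketchSubTracker(String name) {
        super();
        // Had super() started the thread, runWorker() could have observed name == null.
        this.name = name;
    }

    @Override
    protected void runWorker() {
        seenByWorker = name;
    }
}
```

A caller constructs the object, then calls start(); only at that point does runWorker() execute, with name already assigned.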

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

