hadoop-common-dev mailing list archives

From "Lohit Vijayarenu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2676) Maintaining cluster information across multiple job submissions
Date Fri, 22 Aug 2008 04:00:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624543#action_12624543 ]

Lohit Vijayarenu commented on HADOOP-2676:

+1 to Runping's comments.
Should we also think about supporting this for DataNodes? We have been thinking about blacklisting
faulty datanodes. The Namenode could treat a blacklisted datanode as equivalent to a node in the
'decommission in progress' state. There is also the question of un-blacklisting these nodes: does
rebooting a node make it clean and remove it from the blacklist?

> Maintaining cluster information across multiple job submissions
> ---------------------------------------------------------------
>                 Key: HADOOP-2676
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2676
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.15.2
>            Reporter: Lohit Vijayarenu
> Could we have a way to maintain cluster state across multiple job submissions?
> Consider a scenario where we run multiple jobs in iterations on a cluster, back to back.
> The nature of the jobs is the same, but the input/output might differ.
> Now, if a node is blacklisted in one iteration of a job run, it would be useful to keep
> this information and blacklist the node for the next iteration as well.
> Another situation we saw: if failures stay below mapred.map.max.attempts in each
> iteration, a few nodes are never marked for blacklisting. But if we consider two or
> three iterations together, these nodes fail in every job and should be taken out of the
> cluster. Leaving them in hampers the overall performance of the jobs.
> Could we have a config variable that matches a job type (provided by the user)
> and maintains the cluster status for that job type alone?
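
To make the request above concrete, here is a minimal sketch of per-job-type failure accounting that survives across job submissions. It is not Hadoop code: the class CrossJobFailureTracker, its methods, and the cross-job threshold are all hypothetical names chosen for illustration, and the threshold only plays a role analogous to the per-job limit mentioned in the description.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: accumulates per-node failures across jobs of the same job type. */
class CrossJobFailureTracker {
    // Assumed cross-job threshold (not an existing Hadoop setting).
    private final int maxFailuresAcrossJobs;
    // jobType -> (host -> failures summed over all recorded jobs of that type)
    private final Map<String, Map<String, Integer>> failures = new HashMap<>();

    CrossJobFailureTracker(int maxFailuresAcrossJobs) {
        this.maxFailuresAcrossJobs = maxFailuresAcrossJobs;
    }

    /** Folds one finished job's per-host failure counts into the running totals. */
    void recordJob(String jobType, Map<String, Integer> failuresByHost) {
        Map<String, Integer> totals =
            failures.computeIfAbsent(jobType, k -> new HashMap<>());
        failuresByHost.forEach((host, n) -> totals.merge(host, n, Integer::sum));
    }

    /** True once a host's accumulated failures for this job type reach the threshold. */
    boolean isBlacklisted(String jobType, String host) {
        Map<String, Integer> totals = failures.get(jobType);
        if (totals == null) {
            return false; // no job of this type has been recorded yet
        }
        return totals.getOrDefault(host, 0) >= maxFailuresAcrossJobs;
    }
}
```

With a threshold of 4, a node that fails 3 tasks in one iteration and 2 in the next would stay below the limit in each job taken alone but be blacklisted once both are recorded under the same job type. Other job types are unaffected, which matches the "for that job type alone" part of the request.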

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
