hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Cluster hard drive ratios
Date Fri, 06 May 2011 12:33:14 GMT
On 05/05/11 19:14, Matthew Foley wrote:
> "a node (or rack) is going down, don't replicate" == DataNode Decommissioning.
> This feature is available.  The current usage is to add the hosts to be
> decommissioned to the exclusion file named in dfs.hosts.exclude, then use
> DFSAdmin to invoke "-refreshNodes".  (Search for "decommission" in the DFSAdmin
> source code.)  The NN will stop using these servers as replication targets,
> and will re-replicate all their replicas to other hosts that are still in
> service.  The count of nodes that are in the process of being decommissioned
> is reported on the NN status web page.
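The quoted procedure can be sketched as a short shell session. The exclude-file path and hostname below are illustrative, not from the original message; the commands follow the 0.20-era Hadoop CLI:

```shell
# Decommissioning sketch -- paths and hostnames are illustrative.

# 1. Add the host(s) to the exclude file that dfs.hosts.exclude points at.
echo "datanode17.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Ask the NameNode to re-read its include/exclude lists.
hadoop dfsadmin -refreshNodes

# 3. Watch the "decommission in progress" count on the NN status web page,
#    or poll per-node status from the CLI report.
hadoop dfsadmin -report
```

Once a node's status flips from "Decommission in progress" to "Decommissioned", all of its replicas have been copied elsewhere and it is safe to shut down.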

I'm thinking more of "don't overreact to 50 machines going offline by 
re-replicating all copies whose replication count has just dropped by 1; 
wait until the rack has been offline for >30 minutes."
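HDFS of that era has no per-rack grace period, but the closest existing knob is the heartbeat recheck interval, which governs how long the NameNode waits before declaring a DataNode dead and scheduling re-replication of its blocks. A sketch for hdfs-site.xml follows; the property name shown is the one used in later releases (earlier versions spelled it heartbeat.recheck.interval), and the value is illustrative:

```xml
<!-- Sketch only: delay the dead-node verdict, and hence mass re-replication.
     The NameNode marks a DataNode dead after roughly
     2 * recheck-interval + 10 * heartbeat-interval. -->
<property>
  <name>dfs.namenode.heartbeat.recheck-interval</name>
  <!-- milliseconds; default 300000 (5 min) gives ~10.5 min to "dead".
       900000 pushes the verdict past the 30-minute mark discussed above. -->
  <value>900000</value>
</property>
```

The trade-off is symmetric: a longer recheck interval also delays re-replication after a genuine single-node failure, so it is a blunt substitute for rack-aware hysteresis.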
