hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steve Loughran <ste...@apache.org>
Subject Re: Does anybody have tried to setup a cluster with multiple namenodes?
Date Tue, 21 Oct 2008 10:39:42 GMT
Chris Douglas wrote:
> The secondary namenode is neither a backup service for the HDFS 
> namespace nor a failover for requests:
> http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Secondary+NameNode 
> The secondary namenode periodically merges an image (FSImage) of the 
> namesystem with recent changes (FSEdits), almost always on another 
> machine. The memory requirements for the NameNode are formidable, and 
> merging these on the same machine that's servicing requests would make 
> them infeasible for large clusters. -C

Currently that namenode is a SPOF; puts a limit on both the scale and 
availability of a large cluster. The ultimate way to fix both problems 
would be to (somehow) share the work out among multiple namenodes, 
though I hesitate to come up with an architecture for doing that 
reliably, let alone a commitment to do any of the work. Some possible 
ways to do this (none of which are in the codebase today)

-give different namenodes ownership of bits of the filesystem; they'd 
need to coordinate allocation of blocks across machines though.

-have peer namenodes sharing state using some kind of tuple-space or 
similar distributed datastructure. That's easier said than done though; 
a busy cluster will have a high change-rate on the t-space facts and you 
need to choreograph the directory operations quite carefully.

-Have the namenode storing state into a local database which is 
clustered in some failover design (shared disk array, or synchronization 
of changes over ethernet)

This would be a very interesting project for someone out there to take 
up. One of the fun problems is testing failover works in all situations, 
especially handing not the outage of one of the machines, but the 
partitioning of the network so the two namenodes can't see each other.


View raw message