hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: minor change in dataNode handling of multiple directories.
Date Thu, 30 Nov 2006 08:44:48 GMT
We want to be able to support fail-in-place too, i.e. a machine should
still be usable with one dead drive. It sounds like this proposal is a
step in the wrong direction.

Perhaps we should just allow a node to upgrade new directories that  
appear later?  Need to be sure snapshotting works as expected in this  
case too...

I think it is worth solving this more complicated problem.

Upgrades should not be possible unless enough of the FS is reachable  
to leave safemode IMO.  This means we'll need to be able to test for  
this before we upgrade.  Fun!
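
A minimal sketch of the kind of pre-upgrade test described above, assuming "enough of the FS is reachable" can be expressed as a fraction of known blocks that have at least one reachable replica. The class name, method name, and threshold parameter are all hypothetical and are not part of Hadoop's API.

```java
// Illustrative sketch only; names and threshold are assumptions.
class UpgradeGate {
    /**
     * Returns true if the reachable fraction of blocks meets the same
     * threshold that would let the filesystem leave safemode.
     * An empty filesystem is treated as upgradable (an assumption).
     */
    static boolean canUpgrade(long reachableBlocks, long totalBlocks, double threshold) {
        if (totalBlocks == 0) {
            return true;
        }
        return (double) reachableBlocks / totalBlocks >= threshold;
    }
}
```

The point of running this check first is that an upgrade would be refused on a cluster that could not have left safemode anyway, for example when too many drives or nodes are missing.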

On Nov 29, 2006, at 6:03 PM, Bryan A. P. Pendleton wrote:

> I would prefer this proposal not be implemented. The current way things
> work makes it possible to configure, centrally, a list of all directories
> that _could_ be used for storage. Since there's no easy way to do per-node
> configurations (nor would it be desirable, IMO, in this case), the
> directories config ends up being the list of all possibly usable
> directories. Many of my cluster nodes are configured using "rocksclusters":
> they will have a uniform set of mounts created, one for each physical
> drive, at boot/re-install. If I specify in my config the list of all
> directories up to the largest number of drives a machine will ever have,
> then I get easy drop-in use, regardless of variations in nodes in the
> cluster. I have been relying on the current behavior to keep me sane.
>
> OTOH, I wouldn't oppose making this the default behavior, with a
> configuration param that would set things back to the old behavior.
>
> On 11/29/06, Raghu Angadi <rangadi@yahoo-inc.com> wrote:
>> As part of the "Version upgrade" related changes, I am thinking of
>> strictly requiring that the datanode be able to lock _all_ the configured
>> directories instead of any one of them.
>>
>> Currently, if multiple data directories are specified for a datanode, it
>> tries to lock a file in each of the directories. If it fails to lock
>> some of the directories, it will use the directories that it could.
>> Looks like this flexibility was included mainly for convenience in the
>> config file.
>>
>> This might not affect anyone; let us know your opinions.
>>
>> Note that all directories have the same storage id. So each individual
>> directory is not complete by itself but a part of one storage.
>>
>> Raghu.
> -- 
> Bryan A. P. Pendleton
> Ph: (877) geek-1-bp
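
The two locking policies discussed in the quoted messages (use whichever directories can be locked, versus require all of them) can be contrasted in a small sketch. This is an illustration only, not the DataNode code: the class name, method names, and the lock file name `example.lock` are assumptions made for the example. It uses the standard `java.nio` `FileChannel.tryLock()` call.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the actual DataNode implementation.
class StorageDirLocker {
    // Hypothetical lock file name, chosen for this example.
    static final String LOCK_FILE = "example.lock";

    // Lenient policy: try every configured directory and keep the ones
    // that could be locked. Acquired locks are appended to 'held'.
    static List<File> lockUsable(List<File> configured, List<FileLock> held) {
        List<File> usable = new ArrayList<>();
        for (File dir : configured) {
            FileLock lock = tryLockDir(dir);
            if (lock != null) {
                held.add(lock);
                usable.add(dir);
            }
        }
        return usable;
    }

    // Strict policy: fail unless every configured directory can be locked.
    // On failure, any locks taken along the way are released again.
    static List<File> lockAll(List<File> configured, List<FileLock> held) throws IOException {
        List<FileLock> acquired = new ArrayList<>();
        List<File> usable = lockUsable(configured, acquired);
        if (usable.size() != configured.size()) {
            releaseAll(acquired);
            throw new IOException("locked only " + usable.size() + " of "
                    + configured.size() + " configured directories");
        }
        held.addAll(acquired);
        return usable;
    }

    // Returns a lock on the directory's lock file, or null if the directory
    // is missing, unwritable, or already locked.
    static FileLock tryLockDir(File dir) {
        if (!dir.isDirectory()) {
            return null;
        }
        RandomAccessFile raf = null;
        try {
            raf = new RandomAccessFile(new File(dir, LOCK_FILE), "rw");
            FileLock lock = raf.getChannel().tryLock();
            if (lock != null) {
                return lock;
            }
        } catch (IOException | OverlappingFileLockException e) {
            // Treat any failure as "directory not usable".
        }
        if (raf != null) {
            try {
                raf.close();
            } catch (IOException ignored) {
                // Nothing more to do for a file we are abandoning.
            }
        }
        return null;
    }

    // Closing the channel releases the lock and the underlying file.
    static void releaseAll(List<FileLock> locks) {
        for (FileLock lock : locks) {
            try {
                lock.channel().close();
            } catch (IOException ignored) {
                // Best effort release.
            }
        }
        locks.clear();
    }
}
```

With a centrally configured list that names more directories than a given node has, `lockUsable` quietly skips the missing ones, which is the drop-in behavior Bryan relies on, while `lockAll` would refuse to start on that node. A fail-in-place design would presumably need something between the two, which the sketch does not attempt.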
