hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: data loss using hadoop
Date Tue, 21 Mar 2006 05:44:10 GMT

One discussion point...

In previous systems we've built, we've tended to have a master node
configured with a list of all of its slaves.  Dead or inactive slaves
are then listed as such in the config as well.  This lets the master
know which nodes to expect.  It seems to me that since HDFS relies
on a roughly stable set of nodes to provide persistent storage,
moving a node list to the master would allow simpler startup code and
general error checking, without losing any real freedom.

I'd not suggest this for the task tracker at this point, since it can
actually use ad hoc nodes, but HDFS gains nothing but instability
from ad hoc nodes showing up and dropping out.  Much better to have a
configured list of servers, IMO.

Thoughts?  Should we file a bug to make this so?
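To make the proposal concrete, here is a minimal sketch of how a master might enforce a configured node list. This is purely illustrative and not Hadoop code: the file format (one hostname per line, with `#`-prefixed lines marking dead or inactive slaves) and the helper names `load_slaves` / `is_allowed` are assumptions, not anything in HDFS.

```python
def load_slaves(path):
    """Parse a hypothetical slaves file: one hostname per line.
    Lines starting with '#' mark dead/inactive slaves that are
    expected in the config but should not serve storage."""
    active, inactive = set(), set()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith("#"):
                inactive.add(line.lstrip("#").strip())
            else:
                active.add(line)
    return active, inactive

def is_allowed(host, active, inactive):
    """A registering datanode must appear in the configured list;
    ad hoc nodes are refused, inactive ones are recognized but
    rejected, so the master can report them instead of silently
    accepting an unexpected node."""
    return host in active and host not in inactive
```

With such a list the master can also do the error checking mentioned above, e.g. warn at startup about expected nodes that never report in.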

On Mar 20, 2006, at 1:33 PM, Yoram Arnon wrote:

> While playing around with a hadoop dfs cluster, we've observed data loss.
> This may be related to our having stopped and restarted the DFS a couple
> of times, possibly with nodes not all going online and offline at just
> the right timing, but many of our files, ranging in size from less than
> 1GB to multi GB each, have at least one block missing. Blocks are missing
> from relatively new files, generated within the last two weeks, the file
> system was never more than 25% full, and there's no outstanding reason
> why this loss should have happened.
> In the upcoming days/weeks we'll be looking into the reasons, and for
> ways of making the DFS more robust against this kind of loss.
> Yoram
