hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: HDFS and long-running processes
Date Tue, 21 Jul 2009 10:26:51 GMT
Todd Lipcon wrote:

> On Sat, Jul 4, 2009 at 9:08 AM, David B. Ritch <david.ritch@gmail.com> wrote:
> 
>> Thanks, Todd.  Perhaps I was misinformed, or misunderstood.  I'll make
>> sure I close files occasionally, but it's good to know that the only
>> real issue is with data recovery after losing a node.
>>
> 
> Just to be clear, there aren't issues with data recovery of already-written
> files. The issue is that, when you open a new file for writing, Hadoop sets
> up a pipeline that looks something like:
> 
> Writer -> DN A -> DN B -> DN C
> 
> Where each of DN A, B, and C is a datanode in your HDFS cluster. If Writer is
> also a node in your HDFS cluster, HDFS will attempt to make DN A the same
> machine as Writer.
> 
> If DN B fails, the write pipeline will reorganize itself to:
> 
> Writer -> DN A -> DN C
> 
> In theory I *believe* it's supposed to pick up a new datanode at this point
> and tack it onto the end, but I'm not certain this is implemented quite yet.
> Maybe Dhruba or someone else with more knowledge here can chime in.
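
For context, a rough client-side sketch of the code that sets up such a 
pipeline. The path is made up, and out.sync() has since been renamed 
hflush() in newer APIs, so treat this as illustrative rather than 
version-exact:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class LongRunningWriter {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);       // picks up the cluster from the config
      Path file = new Path("/logs/events.log");   // made-up path, for illustration only

      // Opening the stream is what establishes Writer -> DN A -> DN B -> DN C.
      FSDataOutputStream out = fs.create(file, true);
      for (int i = 0; i < 1000; i++) {
        out.write(("event " + i + "\n").getBytes("UTF-8"));
        out.sync();   // flush data down the pipeline (hflush() in newer APIs)
      }
      // Closing (or periodically rolling) the file finalizes the last block,
      // which is what avoids the recovery corner cases discussed above.
      out.close();
    }
  }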

Sounds like a good opportunity for a fun little test: start a write on 
a 4-DN (local) cluster, kill the DN in use, and check that all is well. 
Something along the lines of the sketch below, perhaps.
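
A rough sketch against MiniDFSCluster; the constructor and stopDataNode() 
signatures vary across Hadoop versions, so take the exact names as 
assumptions rather than a ready-to-run test:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hdfs.MiniDFSCluster;

  public class TestPipelineSurvivesDatanodeDeath {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Local 4-datanode cluster, freshly formatted, default rack layout.
      MiniDFSCluster cluster = new MiniDFSCluster(conf, 4, true, null);
      cluster.waitActive();
      FileSystem fs = cluster.getFileSystem();

      Path file = new Path("/test/pipeline-recovery");
      FSDataOutputStream out = fs.create(file);

      byte[] chunk = new byte[4096];
      out.write(chunk);
      out.sync();              // make sure a write pipeline is actually in flight

      cluster.stopDataNode(1); // kill one of the datanodes mid-write

      out.write(chunk);        // keep writing; the pipeline should reorganize
      out.sync();
      out.close();

      // Read the file back in full to check that nothing was lost.
      FSDataInputStream in = fs.open(file);
      byte[] buf = new byte[2 * chunk.length];
      in.readFully(0, buf);
      in.close();
      System.out.println("read back " + buf.length + " bytes; all is well");

      cluster.shutdown();
    }
  }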
