hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: HDFS and long-running processes
Date Tue, 21 Jul 2009 18:25:25 GMT
On Tue, Jul 21, 2009 at 3:26 AM, Steve Loughran <stevel@apache.org> wrote:

> Todd Lipcon wrote:
>
>> On Sat, Jul 4, 2009 at 9:08 AM, David B. Ritch <david.ritch@gmail.com> wrote:
>>
>>> Thanks, Todd.  Perhaps I was misinformed, or misunderstood.  I'll make
>>> sure I close files occasionally, but it's good to know that the only
>>> real issue is with data recovery after losing a node.
>>>
>>>
>> Just to be clear, there aren't issues with data recovery of already-written
>> files. The issue is that when you open a new file for writing, Hadoop sets
>> up a pipeline that looks something like:
>>
>> Writer -> DN A -> DN B -> DN C
>>
>> where each of DN [ABC] is a datanode in your HDFS cluster. If Writer is also
>> a node in your HDFS cluster, it will attempt to make DN A the same machine
>> as Writer.
>>
>> If DN B fails, the write pipeline will reorganize itself to:
>>
>> Writer -> DN A -> DN C
>>
>> In theory I *believe* it's supposed to pick up a new datanode at this point
>> and tack it onto the end, but I'm not certain this is implemented quite yet.
>> Maybe Dhruba or someone else with more knowledge here can chime in.
>>
>
> Sounds like a good opportunity for a fun little test: start the write on a
> 4-DN (local) cluster, kill the DN in use, and check that all is well.
>

I have an internal ticket to write just such a test, but haven't had time to
finish it yet ;-) Volunteers welcome!

-Todd
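
A rough sketch of what such a test might look like, assuming a Hadoop version
that ships the MiniDFSCluster.Builder test harness (the 0.20-era constructor
takes different arguments); the class name, path, and sizes here are invented
for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class TestPipelineDatanodeFailure {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // 4 local DNs so a replacement is available after one is killed
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(4).build();
    try {
      cluster.waitActive();
      FileSystem fs = cluster.getFileSystem();
      Path p = new Path("/pipeline-test");
      FSDataOutputStream out = fs.create(p); // sets up a pipeline of 3 DNs (default replication)
      byte[] chunk = new byte[64 * 1024];
      out.write(chunk);
      out.hflush();                          // make sure data is actually flowing through the pipeline
      cluster.stopDataNode(0);               // kill a DN; the client should reorganize the pipeline
      out.write(chunk);
      out.hflush();
      out.close();
      // "check that all is well": the file should be readable at the expected length
      if (fs.getFileStatus(p).getLen() != 2L * chunk.length) {
        throw new AssertionError("unexpected file length after pipeline failure");
      }
    } finally {
      cluster.shutdown();
    }
  }
}

Killing datanode 0 only exercises pipeline recovery if that node happens to be
in the block's pipeline, so a real test would look up the pipeline's targets
first. In later Hadoop releases, the "pick up a new datanode" behaviour Todd
mentions is governed by the dfs.client.block.write.replace-datanode-on-failure.*
settings.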

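For context on the pipeline described above: from the writer's side it is just
an output stream, so a long-running writer can look like the minimal sketch
below. This assumes a client recent enough to have hflush() (0.20-era clients
call sync() instead); the path and loop are invented for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LongRunningWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);      // HDFS if the default filesystem points at one
    FSDataOutputStream out = fs.create(new Path("/logs/app.log")); // the DN pipeline is set up here
    byte[] record = "one log record\n".getBytes("UTF-8");
    for (int i = 0; i < 1000; i++) {
      out.write(record);
      out.hflush();  // push buffered data out to the datanodes in the pipeline
    }
    out.close();     // finalizes the block; this is the "close files occasionally" part
    fs.close();
  }
}
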