hadoop-hdfs-issues mailing list archives

From "Li Bo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
Date Thu, 30 Jul 2015 06:14:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647239#comment-14647239 ]

Li Bo commented on HDFS-8704:
-----------------------------

I have just uploaded a second patch for this problem. The changes in this patch include:
1.	{{DFSStripedOutputStream}} and the failed status of {{StripedDataStreamer}}: it is not correct to take different actions based on a streamer's current status. A streamer may have failed while the packet to be queued belongs to the next block; the streamer can still write that block successfully because the replacement datanode may be healthy, so the packet should be handled as usual. Conversely, {{DFSStripedOutputStream}} may find a streamer working well and queue the packet to it, yet the streamer may still fail before sending it. I therefore removed the logic that checks and sets a streamer's failed status from {{DFSStripedOutputStream}}; when a streamer fails, it knows how to handle the failure itself (see the sketch after this list).
2.	Extend the functionality of {{StripedDataStreamer}}: if an error occurs, {{StripedDataStreamer}} first handles the remaining trivial packets of the current block and then goes back to waiting for a new block to be allocated to it (also covered in the sketch below).
3.	Add a test to {{TestDFSStripedOutputStreamWithFailure}} that writes a file with two block groups (a sketch of such a test appears below, after the next paragraph).
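
Below is a minimal, self-contained sketch of the behavior described in items 1 and 2. Every name here ({{StreamerSketch}}, {{enqueue}}, {{onCurrentBlockFailed}}) is a hypothetical stand-in, not code from the actual patch:

{code:java}
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Illustrative model of items 1 and 2 above; all names are
 * hypothetical stand-ins, not the actual HDFS-8704 patch code.
 */
class StreamerSketch {

  static final class Packet {
    final boolean lastOfBlock;
    Packet(boolean lastOfBlock) { this.lastOfBlock = lastOfBlock; }
  }

  private final BlockingQueue<Packet> dataQueue = new LinkedBlockingQueue<>();

  /**
   * Item 1: the coordinator (DFSStripedOutputStream in the real code) no
   * longer checks or sets a failed flag before queueing. The packet may
   * target the next block, which a replacement datanode can serve, and a
   * streamer that passes the check may still fail before sending; the
   * check was both too strict and too weak. So queue unconditionally.
   */
  void enqueue(Packet p) throws InterruptedException {
    dataQueue.put(p);
  }

  /**
   * Item 2: called from the streamer's own send loop when writing the
   * current block fails. Dispose of the remaining trivial packets of that
   * block; afterwards the loop goes back to waiting for a new block to be
   * allocated to this streamer.
   */
  void onCurrentBlockFailed(IOException cause) {
    Packet p;
    while ((p = dataQueue.poll()) != null) {
      if (p.lastOfBlock) {
        break; // current block fully drained; ready for a new block
      }
      // discarded: the erasure-coded block group tolerates the loss, so
      // this streamer does not re-send the data
    }
  }
}
{code}

The point of the model is that knowledge of a failure lives in exactly one place: the streamer's own loop.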

The unit test occasionally fails because the namenode returns only 8 block locations for the second block group. HDFS-8839 has been created to track this problem.
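
For reference, here is a hedged sketch of the kind of two-block-group test added in item 3. The constants are toy values, and the call that makes {{/ec}} an RS(3,2) erasure-coded directory (the HDFS-7285 branch API) is assumed and omitted; the real test lives in {{TestDFSStripedOutputStreamWithFailure}}:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

/**
 * Sketch: write a file spanning two block groups while one datanode
 * fails mid-write. Toy constants; EC-zone setup for "/ec" is assumed.
 */
public class TwoBlockGroupWriteSketch {
  public static void main(String[] args) throws Exception {
    long blockSize = 1024 * 1024;
    Configuration conf = new Configuration();
    conf.setLong("dfs.blocksize", blockSize);

    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(5).build();
    try {
      FileSystem fs = cluster.getFileSystem();
      // Assumption: "/ec" has already been made an RS(3,2) erasure coding
      // zone, so each block group holds 3 data blocks + 2 parity blocks.
      Path file = new Path("/ec/twoBlockGroups");

      // 6 data blocks = two RS(3,2) block groups.
      byte[] data = new byte[(int) (6 * blockSize)];

      try (FSDataOutputStream out = fs.create(file)) {
        out.write(data, 0, data.length / 2);  // first block group
        cluster.stopDataNode(0);              // one datanode fails mid-write
        out.write(data, data.length / 2, data.length / 2);
      }
      // The write should still succeed: RS(3,2) tolerates up to 2 failures.
    } finally {
      cluster.shutdown();
    }
  }
}
{code}

Note that after a datanode is stopped, the namenode may hand out fewer locations for the second block group; that is exactly the problem tracked in HDFS-8839.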


> Erasure Coding: client fails to write large file when one datanode fails
> ------------------------------------------------------------------------
>
>                 Key: HDFS-8704
>                 URL: https://issues.apache.org/jira/browse/HDFS-8704
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch
>
>
> I tested the current code on a 5-node cluster using RS(3,2).  When a datanode is corrupt, the client
succeeds in writing a file smaller than a block group but fails to write a larger one. {{TestDFSStripedOutputStreamWithFailure}}
only tests files smaller than a block group; this jira will add more test situations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
