hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
Date Fri, 11 Sep 2015 07:13:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740321#comment-14740321 ]

Zhe Zhang commented on HDFS-8704:
---------------------------------

Thanks for updating the patch, Bo. My main concern is still the nested {{run()}} structure
in {{StripedDataStreamer}}:
{code}
   @Override
+  public void run() {
+
+    while (!toTerminate && !streamerClosed &&
+        dfsClient.clientRunning && !errorState.hasError()) {
+      super.run();
{code}

[~walter.k.su] is exploring the idea of a group streamer in HDFS-9040, and [~jingzhao] is
trying to move {{locateFollowBlock}} to the {{DFSOutputStream}} level. If either of the two directions
works, the role of a streamer will be limited to transferring a single internal block, which
will solve this problem. So I suggest we keep this JIRA open and wait for a conclusion on
these two efforts.

> Erasure Coding: client fails to write large file when one datanode fails
> ------------------------------------------------------------------------
>
>                 Key: HDFS-8704
>                 URL: https://issues.apache.org/jira/browse/HDFS-8704
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, HDFS-8704-HDFS-7285-003.patch,
HDFS-8704-HDFS-7285-004.patch, HDFS-8704-HDFS-7285-005.patch, HDFS-8704-HDFS-7285-006.patch,
HDFS-8704-HDFS-7285-007.patch, HDFS-8704-HDFS-7285-008.patch
>
>
> I tested the current code on a 5-node cluster using RS(3,2). When a datanode is corrupt, the client
> succeeds in writing a file smaller than a block group but fails to write a larger one. {{TestDFSStripeOutputStreamWithFailure}}
> only tests files smaller than a block group; this JIRA will add more test cases.
> A streamer may encounter bad datanodes when writing the blocks allocated to it. When
> it fails to connect to a datanode or to send a packet, the streamer needs to prepare for the next
> block. First, it removes the packets of the current block from its data queue. If the first packet
> of the next block is already in the data queue, the streamer resets its state and starts
> waiting for the next block to be allocated to it; otherwise it just waits for the first packet
> of the next block. While waiting, the streamer periodically checks whether it has been asked
> to terminate.
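
The recovery steps in the description above can be sketched roughly as follows. This is only an illustrative model, not code from any of the attached patches: the class {{StreamerRecoverySketch}}, its {{Packet}} type, the {{NextStep}} enum, and all method names are hypothetical, and the real {{StripedDataStreamer}} tracks far more state. It assumes packets are tagged with a block index and that sequence number 0 marks the first packet of a block.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of the recovery steps; NOT the actual StripedDataStreamer code. */
class StreamerRecoverySketch {

  /** Hypothetical packet: tagged with the block it belongs to and its position in that block. */
  static final class Packet {
    final int blockIndex;
    final int seqno; // assumption: seqno 0 is the first packet of a block

    Packet(int blockIndex, int seqno) {
      this.blockIndex = blockIndex;
      this.seqno = seqno;
    }
  }

  /** What the streamer waits for after dropping the failed block (hypothetical names). */
  enum NextStep {
    WAIT_FOR_NEXT_BLOCK_ALLOCATION,
    WAIT_FOR_FIRST_PACKET_OF_NEXT_BLOCK
  }

  private final Deque<Packet> dataQueue = new ArrayDeque<>();

  void enqueue(int blockIndex, int seqno) {
    dataQueue.addLast(new Packet(blockIndex, seqno));
  }

  int queuedPackets() {
    return dataQueue.size();
  }

  /** Called when connecting to a datanode or sending a packet fails for the given block. */
  NextStep onBlockFailure(int failedBlockIndex) {
    // Step 1: remove the packets of the current (failed) block from the data queue.
    dataQueue.removeIf(p -> p.blockIndex == failedBlockIndex);

    // Step 2: check whether the first packet of the next block is already queued.
    Packet head = dataQueue.peekFirst();
    boolean nextBlockStarted = head != null
        && head.blockIndex == failedBlockIndex + 1
        && head.seqno == 0;

    // Step 3: either reset and wait for the next block to be allocated,
    // or simply wait for the writer to enqueue the next block's first packet.
    return nextBlockStarted
        ? NextStep.WAIT_FOR_NEXT_BLOCK_ALLOCATION
        : NextStep.WAIT_FOR_FIRST_PACKET_OF_NEXT_BLOCK;
  }
}
```

In this model a caller would invoke {{onBlockFailure}} once per failed block and then wait according to the returned value, checking a termination flag periodically during the wait as the description requires. The waiting itself is left out of the sketch.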



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
