hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
Date Tue, 25 Aug 2015 21:27:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712004#comment-14712004 ]

Zhe Zhang commented on HDFS-8704:
---------------------------------

Thanks for the work, Bo. Please find my comments below:
# The JIRA description / summary only describes the symptom. Could you briefly describe
the solution?
# IIUC you are getting rid of the {{setFailed}} logic and instead rewriting the main logic
of {{StripedDataStreamer}}. {{setFailed}} is quite fundamental in the current fault tolerance
logic; it is actually used in the new code from HDFS-8202. [~walter.k.su] Could you comment
on whether you rely on it (or plan to) in HDFS-8383?
# {{run}} is itself a loop, and calling it inside the while loop of another {{run}} doesn't
look right (see the sketch after this list). The added code in {{StripedDataStreamer#run}} is
also a little hard to follow. Could you provide a design, either in the JIRA summary or as a
comment? What are "trivial packets"?
{code}
+    while (!toTerminate && !streamerClosed &&
+        dfsClient.clientRunning && !errorState.hasError()) {
+      super.run();
{code}
# The patch needs a rebase.
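
As a side note on item 3: below is a minimal sketch, with hypothetical classes that are not
the HDFS code, of why re-entering a {{run}}-style loop from an outer loop is fragile. The
inner loop only returns once a terminal condition holds, so unless something resets that
condition between iterations, the outer loop either exits immediately or spins without
making progress.
{code}
class InnerLoopWorker implements Runnable {
  volatile boolean closed = false;  // stands in for streamerClosed
  volatile boolean error = false;   // stands in for errorState.hasError()

  @Override
  public void run() {
    // Long-running loop: returns only once a terminal condition holds.
    while (!closed && !error) {
      doOneUnitOfWork();
    }
  }

  void doOneUnitOfWork() { /* e.g. send one packet */ }
}

class OuterLoopWorker extends InnerLoopWorker {
  volatile boolean toTerminate = false;

  @Override
  public void run() {
    while (!toTerminate && !closed && !error) {
      super.run();
      // When super.run() returns, `closed` or `error` is true; unless it
      // is reset here, the next loop test fails and we exit regardless.
    }
  }
}
{code}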

Smaller issues:
# For the change below, did you see a negative {{numBytes}} in tests? That would be surprising
(see the sketch after this code block).
{code}
+      //the streamer may fail to send packets
+      if (numBytes < 0) {
+        numBytes = s0.bytesSent;
+      }
       for (int i = 1; i < numDataBlocks; i++) {
         final StripedDataStreamer si = getStripedDataStreamer(i);
         final ExtendedBlock bi = si.getBlock();
         if (bi != null && bi.getGenerationStamp() > block.getGenerationStamp())
{
           block.setGenerationStamp(bi.getGenerationStamp());
         }
-        numBytes += atBlockGroupBoundary? bi.getNumBytes(): si.getBytesCurBlock();
+        long streamerBytes = atBlockGroupBoundary ? bi.getNumBytes() : si.getBytesCurBlock();
+        if (streamerBytes < 0) {
+          streamerBytes = si.bytesSent;
+        }
+        numBytes += streamerBytes;
       }
{code}
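
On the fallback itself, here is a minimal sketch of making the substitution explicit and
logging the surprising case, so a negative count surfaces in test output instead of being
silently patched over. The helper name {{nonNegativeBytes}} and the slf4j logger are my own,
not part of the patch.
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class ByteCountFallback {
  private static final Logger LOG =
      LoggerFactory.getLogger(ByteCountFallback.class);

  // Hypothetical helper, not in the patch: substitute bytesSent only when
  // the computed count is negative, and leave a trace when that happens.
  static long nonNegativeBytes(long computed, long bytesSent) {
    if (computed >= 0) {
      return computed;
    }
    // A negative count means the streamer failed before updating its
    // counter; fall back to the bytes it actually managed to send.
    LOG.warn("negative byte count {}; falling back to bytesSent={}",
        computed, bytesSent);
    return bytesSent;
  }
}
{code}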

> Erasure Coding: client fails to write large file when one datanode fails
> ------------------------------------------------------------------------
>
>                 Key: HDFS-8704
>                 URL: https://issues.apache.org/jira/browse/HDFS-8704
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, HDFS-8704-HDFS-7285-003.patch,
>                      HDFS-8704-HDFS-7285-004.patch, HDFS-8704-HDFS-7285-005.patch
>
>
> I tested the current code on a 5-node cluster using RS(3,2). When a datanode is corrupt, the
> client succeeds in writing a file smaller than a block group but fails to write a larger one.
> {{TestDFSStripedOutputStreamWithFailure}} only tests files smaller than a block group; this
> jira will add more test cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
