hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-8602) Erasure Coding: Client can't read(decode) the EC files which have corrupt blocks.
Date Wed, 17 Jun 2015 21:13:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-8602:
----------------------------
    Attachment: HDFS-8602.000.patch

Thanks very much for reporting the issue and working on this, [~kaisasak]!

I also did some debugging on the issue. The cause looks like a deadlock: after hitting
the exception while reading the corrupted block, {{readToBuffer}} tries to print a
warning message, and in doing so calls {{getCurrentBlock}}. {{getCurrentBlock}} needs to
acquire the input stream's lock, which is currently held by the main thread, while the main
thread is waiting for the response from the reading threads.
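
To make the lock cycle concrete, here is a minimal standalone sketch of the same shape. It is not the HDFS code: the class {{LockedStreamSketch}}, the methods {{readBuggy}} and {{readFixed}}, and the block id are all hypothetical names, and a 500 ms timeout stands in for the indefinite hang so the program terminates. The "fixed" variant shows only one possible way to break the cycle (capturing the value before handing work to the reader thread); it is not necessarily what the attached patch does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LockedStreamSketch {
    // Hypothetical block id, for illustration only.
    private String currentBlock = "blk_0001";

    // Needs the stream's monitor, like the getter described above.
    public synchronized String getCurrentBlock() {
        return currentBlock;
    }

    // Buggy shape: the caller holds the monitor while waiting, and the
    // reader thread needs the same monitor to build its warning message.
    public synchronized boolean readBuggy(ExecutorService pool) throws Exception {
        Callable<String> task = () -> "corrupt block: " + getCurrentBlock();
        return finishedInTime(pool.submit(task));
    }

    // One possible fix: read the value on the thread that already holds
    // the lock, so the reader thread never touches the monitor.
    public synchronized boolean readFixed(ExecutorService pool) throws Exception {
        final String block = currentBlock;
        Callable<String> task = () -> "corrupt block: " + block;
        return finishedInTime(pool.submit(task));
    }

    // Waits briefly for the reader; a timeout here models the hang.
    private static boolean finishedInTime(Future<String> f) throws Exception {
        try {
            f.get(500, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        LockedStreamSketch stream = new LockedStreamSketch();
        try {
            System.out.println("buggy completed: " + stream.readBuggy(pool));
            System.out.println("fixed completed: " + stream.readFixed(pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

In the buggy variant the reader thread can only proceed once the caller gives up and leaves the synchronized method; without the timeout, neither side would ever make progress, which matches a client that hangs with no error message.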

The patch includes a simple fix and also a unit test that can reproduce the issue ({{testReadCorruptedData2}}).

> Erasure Coding: Client can't read(decode) the EC files which have corrupt blocks.
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-8602
>                 URL: https://issues.apache.org/jira/browse/HDFS-8602
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Takanobu Asanuma
>            Assignee: Kai Sasaki
>             Fix For: HDFS-7285
>
>         Attachments: HDFS-8602.000.patch
>
>
> Before the DataNode(s) report the bad block(s), when the Client reads an EC file which has bad blocks, the Client hangs, and there are no error messages.
> (When the Client reads a replicated file which has bad blocks, the bad blocks are reconstructed at the same time, and the Client can read it.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
