hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4065) TestDFSShell.testGet sporadically fails attempting to corrupt block files due to race condition
Date Wed, 17 Oct 2012 21:08:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478378#comment-13478378 ]

Chris Nauroth commented on HDFS-4065:

Hello, Daryn.

My measurements show that stopping and starting the mini-cluster like this takes 1-2.5 seconds.
Before I coded this, I had looked for a lighter-weight method, such as the sync or flush you
suggested.  Unfortunately, the current data node code provides no such method.  DataNode.shutdown
is the only method that synchronously waits for all DataXceiver threads to finish.  I figured
it wasn't worthwhile to refactor DataNode.shutdown (and possibly weaken encapsulation) just
to support one test case that needs to do some highly unusual operations on the raw block
files.
If speeding up this whole test suite (which currently runs in ~50 seconds) is a goal, then
I'd actually suggest refactoring so that all 14 tests in the suite reuse the same MiniDFSCluster
instead of starting and stopping a cluster 14 times.  That would improve speed, but at the cost of
reduced test isolation.  (This part would be tracked as a separate Jira.)
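
The trade-off in this suggestion (fewer cluster start-ups versus less isolation) can be sketched in plain Java. `ExpensiveCluster` is a hypothetical stand-in for MiniDFSCluster that only counts how many times it is started; in a JUnit 4 suite the shared variant would typically be created in a `@BeforeClass` method and shut down in an `@AfterClass` method.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedFixtureSketch {

    /** Hypothetical stand-in for MiniDFSCluster: it only counts start-ups. */
    static final class ExpensiveCluster {
        static final AtomicInteger STARTS = new AtomicInteger();

        ExpensiveCluster() {
            STARTS.incrementAndGet(); // a real mini-cluster restart costs roughly 1-2.5 s
        }

        void shutdown() {
            // a real cluster would stop its daemons here
        }
    }

    /** Per-test isolation: every test gets a fresh cluster. */
    static int runPerTest(int tests) {
        ExpensiveCluster.STARTS.set(0);
        for (int i = 0; i < tests; i++) {
            ExpensiveCluster c = new ExpensiveCluster();
            // ... test body runs against its own cluster ...
            c.shutdown();
        }
        return ExpensiveCluster.STARTS.get();
    }

    /** Shared fixture: one cluster for the whole suite. */
    static int runShared(int tests) {
        ExpensiveCluster.STARTS.set(0);
        ExpensiveCluster shared = new ExpensiveCluster();
        for (int i = 0; i < tests; i++) {
            // ... test body; any state it leaves behind is visible to later tests ...
        }
        shared.shutdown();
        return ExpensiveCluster.STARTS.get();
    }

    public static void main(String[] args) {
        System.out.println(runPerTest(14)); // prints 14
        System.out.println(runShared(14));  // prints 1
    }
}
```

With 14 tests, the per-test variant starts 14 clusters and the shared variant starts one; the price is that each test must clean up after itself, because leftover files or configuration in the shared cluster leak into every later test.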


> TestDFSShell.testGet sporadically fails attempting to corrupt block files due to race condition
> -----------------------------------------------------------------------------------------------
>                 Key: HDFS-4065
>                 URL: https://issues.apache.org/jira/browse/HDFS-4065
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1-win
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HDFS-4065-branch-1-win.patch
> TestDFSShell.testGet attempts to simulate corruption of block files in order to test
hadoop fs -get with the -ignoreCrc option.  It is possible that the data node's DataXceiver
thread has not yet closed the block file.  This causes a locking violation on Windows, so
the test fails.
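
The corruption this test simulates is detectable because HDFS verifies block data against checksums recorded when the data was written, and `-ignoreCrc` skips that verification on read. A minimal sketch of the idea, using `java.util.zip.CRC32` as a stand-in for the data node's checksum handling; `crc`, `verify`, and `read` are hypothetical helpers, not Hadoop APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class CorruptionCheckSketch {

    /** Computes a CRC32 over the whole buffer. */
    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    /** True if the data still matches the checksum recorded at write time. */
    static boolean verify(byte[] data, long expectedCrc) {
        return crc(data) == expectedCrc;
    }

    /** Returns the data, skipping verification when ignoreCrc is set (the -ignoreCrc analogue). */
    static byte[] read(byte[] data, long expectedCrc, boolean ignoreCrc) {
        if (!ignoreCrc && !verify(data, expectedCrc)) {
            throw new IllegalStateException("checksum mismatch");
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] block = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        long stored = crc(block);  // checksum recorded when the block was written
        block[0] ^= 0xFF;          // simulate on-disk corruption of the block file
        System.out.println(verify(block, stored));            // prints false
        System.out.println(read(block, stored, true).length); // prints 10
    }
}
```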
