hadoop-hdfs-issues mailing list archives

From "Lin Yiqun (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
Date Mon, 11 Apr 2016 02:17:25 GMT
Lin Yiqun created HDFS-10275:

             Summary: TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly
                 Key: HDFS-10275
                 URL: https://issues.apache.org/jira/browse/HDFS-10275
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: test
            Reporter: Lin Yiqun
            Assignee: Lin Yiqun

The unit test {{TestDataNodeMetrics}} fails intermittently. The failure output is as follows:
Results :

Failed tests: 
expected:<false> but was:<true>

Tests in error: 
  TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for Min...
  TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting for ...
  TestHFlush.testHFlushInterrupted ? IO The stream is closed
At line 279 of {{TestDataNodeMetrics}}, the test times out. Looking into the code, I found
that the real cause is that the {{TotalWriteTime}} metric frequently stays at 0 after each
file-creation iteration, which makes the test keep retrying until it times out.
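To illustrate the retry-until-timeout behavior, here is a minimal sketch of a polling wait. {{PollingSketch}} and its {{waitFor}} method are hypothetical names made up for this example, not the helper the test actually calls; the point is only that a condition on a metric that never rises above 0 can end in nothing but a timeout.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper, written only for illustration. */
class PollingSketch {
  /**
   * Re-checks the condition every intervalMillis until it holds or
   * timeoutMillis has elapsed. Returns true if the condition held,
   * false if the wait timed out or the thread was interrupted.
   */
  static boolean waitFor(BooleanSupplier check, long intervalMillis, long timeoutMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (System.nanoTime() - deadline < 0) {
      if (check.getAsBoolean()) {
        return true;
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
    }
    // One last check after the deadline has passed.
    return check.getAsBoolean();
  }

  public static void main(String[] args) {
    long totalWriteTime = 0; // stands in for a metric that is stuck at 0
    boolean ok = waitFor(() -> totalWriteTime > 0, 10, 200);
    System.out.println(ok ? "metric became positive" : "timed out waiting for metric");
  }
}
```

In this sketch each retry costs only a sleep; in the real test every retry also creates another file, so a metric that stays at 0 turns directly into a long-running, eventually failing test.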
I debugged the test locally. The most likely reason the {{TotalWriteTime}} metric stays at 0
is that the time-spent test uses {{SimulatedFSDataset}}. {{SimulatedFSDataset}} uses its inner
class's method {{SimulatedOutputStream#write}} when the write time is counted, and this method
just updates {{length}} and throws the data away:
    public void write(byte[] b,
              int off,
              int len) throws IOException  {
      length += len;  // only the length is tracked; the data is discarded
    }
So the write operation takes almost no time. We should therefore create the file through a
real dataset instead of the simulated one. In my local testing, the test passed on the first
attempt once I removed the simulated dataset, whereas with the old approach it retried many
times before a nonzero write time was counted.
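A rough sketch of why a discarding stream tends to report a write time of 0 follows. It assumes, without confirming it against the DataNode code, that the metric sums per-write elapsed time at millisecond granularity. {{DiscardingOutputStream}} and {{WriteTimeSketch}} are hypothetical stand-ins written for this example, not Hadoop classes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a simulated stream: counts bytes, stores nothing. */
class DiscardingOutputStream extends OutputStream {
  private long length = 0;

  @Override
  public void write(int b) {
    length++;
  }

  @Override
  public void write(byte[] b, int off, int len) {
    length += len; // the data is thrown away, so the call returns almost at once
  }

  long getLength() {
    return length;
  }
}

class WriteTimeSketch {
  /**
   * Writes `packets` buffers of `packetSize` bytes and returns the sum of the
   * per-write elapsed times, each truncated to whole milliseconds (an assumed
   * granularity for this sketch).
   */
  static long totalWriteTimeMillis(OutputStream out, int packets, int packetSize) {
    byte[] buf = new byte[packetSize];
    long totalMillis = 0;
    for (int i = 0; i < packets; i++) {
      long begin = System.nanoTime();
      try {
        out.write(buf, 0, buf.length);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      // Sub-millisecond writes contribute nothing after truncation.
      totalMillis += (System.nanoTime() - begin) / 1_000_000L;
    }
    return totalMillis;
  }

  public static void main(String[] args) {
    DiscardingOutputStream out = new DiscardingOutputStream();
    long total = totalWriteTimeMillis(out, 1000, 64 * 1024);
    System.out.println("bytes=" + out.getLength() + " totalWriteTimeMillis=" + total);
  }
}
```

The exact total depends on the machine and the JIT, so treat this as an illustration only. Passing a stream backed by a real file to the same method would be the analogue of creating the file through a real dataset.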
