hadoop-hdfs-issues mailing list archives

From "Lin Yiqun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10197) TestFsDatasetCache failing intermittently due to timeout
Date Thu, 24 Mar 2016 03:27:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209669#comment-15209669 ]

Lin Yiqun commented on HDFS-10197:
----------------------------------

Thanks [~andrew.wang] for the comments.
{quote}
However, do you have any ideas about why the tests take so long to run? A better solution
is to optimize the tests to run faster.
{quote}
I analysed this. All three failures are timeouts while waiting for blocks to be cached and for the
expected cache usage to be reached, which indicates that caching blocks is sometimes slow. One suggestion:

* The config {{dfs.datanode.fsdatasetcache.max.threads.per.volume}} defaults to 4. When a unit test
caches more than 4 blocks at the same time (e.g. {{testPageRounder}}, which uses 5 blocks), some of
the blocks appear to wait for a free caching thread. We can set this config to a larger value.
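To illustrate the queuing effect, here is a simplified, hypothetical model, not the actual {{FsDatasetCache}} code. It assumes each volume has a fixed pool of {{threadsPerVolume}} caching threads and that each block occupies one thread for its whole caching time; the real executor setup in the DataNode may differ. Under those assumptions, 5 blocks on a 4-thread pool need two back-to-back rounds, so the batch takes about two per-block caching times instead of one. {{CachePoolSketch}}, {{waves}} and {{peakConcurrency}} are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class CachePoolSketch {
  // Number of back-to-back rounds needed if each thread caches one block at a time
  // (ceiling of numBlocks / threadsPerVolume). Hypothetical model, not HDFS code.
  static int waves(int numBlocks, int threadsPerVolume) {
    return (numBlocks + threadsPerVolume - 1) / threadsPerVolume;
  }

  // Submits numBlocks simulated caching tasks to a fixed pool of threadsPerVolume
  // threads and returns the peak number of tasks that ran at the same time.
  static int peakConcurrency(int numBlocks, int threadsPerVolume, long cacheMillis)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threadsPerVolume);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int i = 0; i < numBlocks; i++) {
        futures.add(pool.submit(() -> {
          int now = running.incrementAndGet();
          peak.accumulateAndGet(now, Math::max);
          try {
            Thread.sleep(cacheMillis); // stands in for the real work of caching one block
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          } finally {
            running.decrementAndGet();
          }
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // wait for every simulated block; extra blocks queue for a free thread
      }
    } finally {
      pool.shutdown();
    }
    return peak.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("rounds for 5 blocks, 4 threads: " + waves(5, 4));
    System.out.println("peak concurrent caching tasks: " + peakConcurrency(5, 4, 100));
  }
}
```

In this model, raising the per-volume thread count to at least the number of blocks a test caches at once brings the batch down to a single round.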

Updated the patch to address the comments; pending Jenkins.

> TestFsDatasetCache failing intermittently due to timeout
> --------------------------------------------------------
>
>                 Key: HDFS-10197
>                 URL: https://issues.apache.org/jira/browse/HDFS-10197
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Lin Yiqun
>            Assignee: Lin Yiqun
>         Attachments: HDFS-10197.001.patch
>
>
> In {{TestFsDatasetCache}}, the unit tests fail intermittently. I collected the failure reasons
> from recent Jenkins reports; they are all timeout errors.
> {code}
> Tests in error: 
>   TestFsDatasetCache.testFilesExceedMaxLockedMemory:378 » Timeout Timed out wait...
>   TestFsDatasetCache.tearDown:149 » Timeout Timed out waiting for condition. Thr...
> {code}
> {code}
> Tests in error: 
>   TestFsDatasetCache.testPageRounder:474 »  test timed out after 60000 milliseco...
>   TestBalancer.testUnknownDatanodeSimple:1040->testUnknownDatanode:1098 »  test ...
> {code}
> However, these failures differ slightly:
> * In the first, the time spent waiting for the blocks to be cached exceeded {{waitForMillis}} (60s here),
> so the wait loop used by {{DFSTestUtil#verifyExpectedCacheUsage}} threw a {{TimeoutException}} whose message includes the thread diagnostic string:
> {code}
>     long st = Time.now();
>     do {
>       boolean result = check.get();
>       if (result) {
>         return;
>       }
>       
>       Thread.sleep(checkEveryMillis);
>     } while (Time.now() - st < waitForMillis);
>     
>     throw new TimeoutException("Timed out waiting for condition. " +
>         "Thread diagnostics:\n" +
>         TimedOutTestsListener.buildThreadDiagnosticString());
> {code}
> * In the second, the test's total elapsed time exceeded the test's own timeout setting, as in {{TestFsDatasetCache#testPageRounder}}.
> We should adjust the timeouts of the unit tests that intermittently fail due to timeouts.
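The two failure modes above can be separated in a small stdlib-only sketch. {{waitFor}} is a self-contained re-creation of the quoted polling loop, with a {{BooleanSupplier}} in place of {{check.get()}}, a monotonic clock in place of {{Time.now()}}, and a placeholder message instead of the thread diagnostics. {{runWithTestTimeout}} is a hypothetical helper that approximates a per-test timeout (such as JUnit's {{@Test(timeout = ...)}}) from the outside; it is not JUnit's implementation. When the inner wait gives up first, the result mirrors the "Timed out waiting for condition" error; when the outer limit fires first, it mirrors "test timed out after ... milliseconds".

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class CacheWaitSketch {
  // Polls check every checkEveryMillis until it returns true; throws once
  // waitForMillis has elapsed (same shape as the loop quoted in the issue).
  static void waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis)
      throws TimeoutException, InterruptedException {
    long startNanos = System.nanoTime(); // monotonic clock, unaffected by wall-clock jumps
    do {
      if (check.getAsBoolean()) {
        return;
      }
      Thread.sleep(checkEveryMillis);
    } while (TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos) < waitForMillis);
    // The real helper appends thread diagnostics here; this sketch does not.
    throw new TimeoutException("Timed out waiting for condition.");
  }

  // Hypothetical stand-in for a per-test timeout: runs body on another thread
  // and reports a failure if it has not finished within testTimeoutMillis.
  static String runWithTestTimeout(Callable<Void> body, long testTimeoutMillis)
      throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<Void> result = executor.submit(body);
      try {
        result.get(testTimeoutMillis, TimeUnit.MILLISECONDS);
        return "passed";
      } catch (TimeoutException e) {
        result.cancel(true); // interrupt the still-running test body
        return "test timed out after " + testTimeoutMillis + " milliseconds";
      } catch (ExecutionException e) {
        return "error: " + e.getCause().getMessage();
      }
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Inner wait (50 ms) gives up before the test limit (2 s).
    System.out.println(runWithTestTimeout(() -> {
      waitFor(() -> false, 10, 50);
      return null;
    }, 2000));
    // Test limit (100 ms) fires before the inner wait (5 s) would.
    System.out.println(runWithTestTimeout(() -> {
      waitFor(() -> false, 10, 5000);
      return null;
    }, 100));
  }
}
```

Raising only one of the two limits does not help if the other one still fires first, so both the wait time and the test timeout need to cover the slowest expected caching run.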



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
