Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Mon, 28 Mar 2016 01:57:25 +0000 (UTC)
From: "Lin Yiqun (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12952675.1458701384000.61407.1459130245546@Atlassian.JIRA>
In-Reply-To: <JIRA.12952675.1458701384000@Atlassian.JIRA>
References: <JIRA.12952675.1458701384000@Atlassian.JIRA>
 <JIRA.12952675.1458701384049@arcas>
Subject: [jira] [Commented] (HDFS-10197) TestFsDatasetCache failing
 intermittently due to timeout
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213716#comment-15213716 ] 

Lin Yiqun commented on HDFS-10197:
----------------------------------

The minidfs cluster, with only one datanode, is reused across the tests in {{TestFsDatasetCache}}, and I suspect this makes block caching slow. Could we give the timeout-prone tests their own isolated minidfs cluster and also set {{dfs.datanode.fsdatasetcache.max.threads.per.volume}} to a larger value? What do you think of this idea, [~andrew.wang]?

> TestFsDatasetCache failing intermittently due to timeout
> --------------------------------------------------------
>
>                 Key: HDFS-10197
>                 URL: https://issues.apache.org/jira/browse/HDFS-10197
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Lin Yiqun
>            Assignee: Lin Yiqun
>         Attachments: HDFS-10197.001.patch, HDFS-10197.002.patch
>
>
> In {{TestFsDatasetCache}}, the unit tests fail intermittently. I collected the failure reasons from recent Jenkins reports; they are all timeout errors.
> {code}
> Tests in error: 
>   TestFsDatasetCache.testFilesExceedMaxLockedMemory:378 ? Timeout Timed out wait...
>   TestFsDatasetCache.tearDown:149 ? Timeout Timed out waiting for condition. Thr...
> {code}
> {code}
> Tests in error: 
>   TestFsDatasetCache.testPageRounder:474 ?  test timed out after 60000 milliseco...
>   TestBalancer.testUnknownDatanodeSimple:1040->testUnknownDatanode:1098 ?  test ...
> {code}
> But there is a small difference between these failures.
> * The first occurs because the total blocking time exceeded {{waitForMillis}} (60s here), so the timeout exception is thrown and the thread diagnostic string is printed in method {{DFSTestUtil#verifyExpectedCacheUsage}}.
> {code}
>     long st = Time.now();
>     do {
>       boolean result = check.get();
>       if (result) {
>         return;
>       }
>       
>       Thread.sleep(checkEveryMillis);
>     } while (Time.now() - st < waitForMillis);
>     
>     throw new TimeoutException("Timed out waiting for condition. " +
>         "Thread diagnostics:\n" +
>         TimedOutTestsListener.buildThreadDiagnosticString());
> {code}
> * The second occurs because the test's elapsed time exceeded its configured timeout, as in {{TestFsDatasetCache#testPageRounder}}.
> We should adjust the timeouts for the unit tests that intermittently fail due to timeout.
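
To make the first failure mode concrete, here is a minimal, self-contained sketch of the same poll-until-true pattern as the loop quoted above. It is not the Hadoop code: the class name {{WaitForSketch}} and the sample conditions are hypothetical, and it throws a plain {{TimeoutException}} without thread diagnostics. It only illustrates that the exception comes from the polling loop's own budget ({{waitForMillis}}), which is separate from any timeout the test framework applies to the whole test method.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical stand-in for the polling helper quoted in the description.
public class WaitForSketch {

  /**
   * Polls check every checkEveryMillis until it returns true.
   * Throws TimeoutException once waitForMillis has elapsed without success.
   */
  public static void waitFor(Supplier<Boolean> check, long checkEveryMillis,
      long waitForMillis) throws TimeoutException, InterruptedException {
    long start = System.nanoTime();
    do {
      if (check.get()) {
        return; // condition satisfied within the budget
      }
      Thread.sleep(checkEveryMillis);
    } while ((System.nanoTime() - start) / 1_000_000 < waitForMillis);

    throw new TimeoutException("Timed out waiting for condition.");
  }

  public static void main(String[] args) throws Exception {
    long begin = System.currentTimeMillis();

    // A condition that becomes true after roughly 50 ms: returns normally.
    waitFor(() -> System.currentTimeMillis() - begin >= 50, 10, 1000);
    System.out.println("condition met");

    // A condition that never becomes true: the loop's own budget expires.
    try {
      waitFor(() -> false, 10, 100);
    } catch (TimeoutException e) {
      System.out.println("timed out: " + e.getMessage());
    }
  }
}
```

If the polling budget is as long as the per-test timeout, whichever limit is reached first determines which of the two error messages appears in the report, so the two values need to be tuned together.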

