hadoop-common-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12189) CallQueueManager may drop elements from the queue sometimes when calling swapQueue
Date Thu, 09 Jul 2015 19:29:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621118#comment-14621118 ]

zhihai xu commented on HADOOP-12189:

[~arpitagarwal], thanks for the valuable information. It is good to know that clients can still
recover with timeout and retry even when requests are dropped by the server. In that case,
decreasing the chance of dropping queue elements without affecting the performance of normal
operations may be enough.
[~chrilisf], thanks for the review and suggestion. I uploaded a new patch, HADOOP-12189.none_guarantee.001.patch,
which addresses your comments. I increased the number of checkpoints to 20, which gives us more
margin; the extra 200 ms (20 * 10 ms) of waiting time is much less than 1 second. I also fixed a typo in
the new patch. Please review it.
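
For readers following the checkpoint arithmetic, here is a minimal sketch of what such a check could look like. It is an illustration only, not the code in the attached patch: the class name, method name, and constant names are hypothetical, and only the figures (20 checkpoints, 10 ms apart, about 200 ms in total) come from the comment above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; names are assumptions, not taken from the patch.
class CheckpointSketch {
    static final int CHECKPOINT_NUM = 20;           // number of emptiness checks
    static final long CHECKPOINT_INTERVAL_MS = 10;  // pause before each check

    // Returns true only if the queue is observed empty at every checkpoint,
    // i.e. for roughly CHECKPOINT_NUM * CHECKPOINT_INTERVAL_MS = 200 ms.
    static boolean staysEmpty(BlockingQueue<?> q) {
        for (int i = 0; i < CHECKPOINT_NUM; i++) {
            try {
                Thread.sleep(CHECKPOINT_INTERVAL_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
            if (!q.isEmpty()) {
                return false; // a late producer slipped an element in
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        System.out.println("empty queue stays empty: " + staysEmpty(q));
        q.offer("late-call");
        System.out.println("queue with late element: " + staysEmpty(q));
    }
}
```

A check like this only lowers the chance of missing a late element; a producer blocked for longer than the whole window would still go unnoticed, which is consistent with the "none_guarantee" naming of the patch.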

> CallQueueManager may drop elements from the queue sometimes when calling swapQueue
> ----------------------------------------------------------------------------------
>                 Key: HADOOP-12189
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12189
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc, test
>    Affects Versions: 2.7.1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: HADOOP-12189.000.patch, HADOOP-12189.001.patch, HADOOP-12189.none_guarantee.000.patch,
> CallQueueManager may drop elements from the queue sometimes when calling {{swapQueue}}.

> The following test failure from TestCallQueueManager shows that some elements in the queue
> were dropped.
> https://builds.apache.org/job/PreCommit-HADOOP-Build/7150/testReport/org.apache.hadoop.ipc/TestCallQueueManager/testSwapUnderContention/
> {code}
> java.lang.AssertionError: expected:<27241> but was:<27245>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.hadoop.ipc.TestCallQueueManager.testSwapUnderContention(TestCallQueueManager.java:220)
> {code}
> It looks like the elements in the queue were dropped by {{CallQueueManager#swapQueue}}.
> Looking at the implementation of {{CallQueueManager#swapQueue}}, there is a possibility
> that elements in the queue are dropped. If the queue is full, the thread calling {{CallQueueManager#put}}
> may be blocked for a long time. It may then put its element into the old queue after the queue in {{takeRef}}
> has been changed by {{swapQueue}}, and that element in the old queue will be dropped.
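
The interleaving described above can be replayed step by step in a single thread. The sketch below is a simplified stand-in, not the Hadoop implementation: the class `SwapRaceSketch` and its method are hypothetical, and the swap is assumed to consist of redirecting `putRef`, waiting for the old queue to drain, and then redirecting `takeRef`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the race; not CallQueueManager itself.
class SwapRaceSketch {
    // Replays the problematic interleaving and returns how many elements
    // end up stranded in the old queue.
    static int strandedAfterSwap() {
        BlockingQueue<String> oldQ = new LinkedBlockingQueue<>(1);
        BlockingQueue<String> newQ = new LinkedBlockingQueue<>(1);
        AtomicReference<BlockingQueue<String>> putRef = new AtomicReference<>(oldQ);
        AtomicReference<BlockingQueue<String>> takeRef = new AtomicReference<>(oldQ);

        oldQ.offer("call-1"); // the old queue is now full

        // Step 1: a producer resolves putRef and would block in put()
        // because the queue it obtained is full.
        BlockingQueue<String> captured = putRef.get();

        // Step 2: the swap begins; new producers are sent to the new queue.
        putRef.set(newQ);

        // Step 3: a consumer drains the old queue through takeRef.
        takeRef.get().poll();

        // Step 4: the swap sees an empty old queue and redirects consumers.
        if (oldQ.isEmpty()) {
            takeRef.set(newQ);
        }

        // Step 5: the blocked producer wakes up and writes to the queue it
        // captured in step 1, which no consumer reads any more.
        captured.offer("call-2");

        // Consumers now see nothing, while "call-2" sits in the old queue.
        if (takeRef.get().poll() != null) {
            throw new IllegalStateException("unexpected element in new queue");
        }
        return oldQ.size();
    }

    public static void main(String[] args) {
        System.out.println("stranded elements: " + strandedAfterSwap());
    }
}
```

In the real code the steps run on different threads, so the window between step 1 and step 5 depends on how long the producer stays blocked.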

This message was sent by Atlassian JIRA
