hbase-issues mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
Date Tue, 04 Sep 2012 22:17:08 GMT

     [ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HBASE-6649:

    Attachment: 6649-1.patch

After spending some time debugging this (using the failure at http://bit.ly/RDdmPg as the
test failure to debug), it seems to me that the problem is due to the way exceptions are
handled in ReplicationSource.java. Basically, replication would fail with an exception for
all entries involved in a particular call to ReplicationSource.readAllEntriesToReplicateOrNextFile,
even if the exception was thrown only for the trailing entry or entries. This is because there
are multiple calls to reader.next within readAllEntriesToReplicateOrNextFile. If the second call
(within the while loop) throws an exception (like EOFException), it discards the work done up
until then. Therefore, some rows would never be replicated.

The attached patch changes the exception handling so that if the second call to reader.next
throws an exception, the method just returns (thereby allowing the present call to
readAllEntriesToReplicateOrNextFile to proceed normally). The following call to
readAllEntriesToReplicateOrNextFile would then throw the exception.
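
To illustrate the idea, here is a minimal sketch. It is not the code in 6649-1.patch: EntryReader, failingAfter and readBatch are made-up names, and entries are plain strings rather than HLog entries. It only shows the pattern of letting a failure on the first read propagate while a failure on a later read ends the batch early without losing what was already read.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReplicationReadSketch {

    /** Hypothetical stand-in for the log reader; next() returns null at a clean end of file. */
    interface EntryReader {
        String next() throws IOException;
    }

    /** Hypothetical reader that serves the given entries, then throws EOFException on every call. */
    static EntryReader failingAfter(String... entries) {
        Iterator<String> it = Arrays.asList(entries).iterator();
        return () -> {
            if (it.hasNext()) {
                return it.next();
            }
            throw new EOFException("truncated entry");
        };
    }

    /**
     * Reads one batch of entries. An exception on the first read propagates to the caller;
     * an exception on a later read ends the batch early and keeps the entries read so far.
     */
    static List<String> readBatch(EntryReader reader, int maxEntries) throws IOException {
        List<String> batch = new ArrayList<>();
        String entry = reader.next(); // first call: exceptions propagate
        while (entry != null) {
            batch.add(entry);
            if (batch.size() >= maxEntries) {
                break;
            }
            try {
                entry = reader.next(); // later calls: a failure must not discard the batch
            } catch (IOException e) {
                break; // the next readBatch call hits the same error on its first read
            }
        }
        return batch;
    }

    public static void main(String[] args) throws IOException {
        EntryReader reader = failingAfter("row1", "row2");
        System.out.println(readBatch(reader, 10)); // prints [row1, row2]
        try {
            readBatch(reader, 10);
        } catch (EOFException e) {
            System.out.println("second call failed: " + e.getMessage());
        }
    }
}
```

In this sketch the first call returns both rows even though the reader fails right after them, and the error only surfaces on the following call, when there is no partially built batch to lose.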

With this patch, I no longer see failures similar to http://bit.ly/RDdmPg.

However, I do see some other failures that I am still debugging (which is why I renamed
this issue to Part-1!)
> [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
> ---------------------------------------------------------------------------
>                 Key: HBASE-6649
>                 URL: https://issues.apache.org/jira/browse/HBASE-6649
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>             Fix For: 0.92.3
>         Attachments: 6649-1.patch, 6649-2.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html,
>                      HBase-0.92 #502 test - queueFailover [Jenkins].html
> Have seen it twice in the recent past: http://bit.ly/MPCykB & http://bit.ly/O79Dq7
> Looking briefly at the logs hints at a pattern - in both the failed test instances, there
> was an RS crash while the test was running.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
