hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
Date Wed, 19 Sep 2012 21:47:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459137#comment-13459137 ]

Jean-Daniel Cryans commented on HBASE-6649:
-------------------------------------------

The server running the patch hit "Break on IOE" twice, and the patch seems to work:

{noformat}
2012-09-19 21:26:50,104 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication va1r6s44%2C10304%2C1348088378534.1348089931722 at 21992487
2012-09-19 21:26:50,110 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Break on IOE: hdfs://va1r5s41:10101/va1-backup/.logs/va1r6s44,10304,1348088378534/va1r6s44%2C10304%2C1348088378534.1348089931722, entryStart=21993911, pos=22058496, end=22058496, edit=5
2012-09-19 21:26:50,110 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:783007 and seenEntries:5 and size: 64585
2012-09-19 21:26:50,110 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 5
2012-09-19 21:26:50,119 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #va1r6s44%2C10304%2C1348088378534.1348089931722 for position 21993911 in hdfs://va1r5s41:10101/va1-backup/.logs/va1r6s44,10304,1348088378534/va1r6s44%2C10304%2C1348088378534.1348089931722
2012-09-19 21:26:50,129 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Removing 0 logs in the list: []
2012-09-19 21:26:50,129 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicated in total: 145502
2012-09-19 21:26:50,129 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication va1r6s44%2C10304%2C1348088378534.1348089931722 at 21993911
{noformat}

One thing I noticed this patch breaks is the size reported in "currentNbOperations:783007 and
seenEntries:5 and size: 64585", because that computation relies on this.position still being the
position at the beginning of the loop. I often see that number at 0 while there are edits to
replicate. It's minor, since in HBASE-6804 I'm removing that log message altogether, but within
the context of this jira we may want to either remove the size or keep track of what the
position is at the beginning of the loop.
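
To illustrate the second option, here is a minimal sketch, not the actual ReplicationSource code. The class SizeTrackingSketch, its position field (a stand-in for this.position) and the readPass method are all hypothetical names, and the sketch assumes the field is advanced before the size is computed. The two positions are taken from the "Opening log for replication ... at" lines in the log above.

```java
public class SizeTrackingSketch {
    // Hypothetical stand-in for this.position in the replication source.
    private long position;

    public SizeTrackingSketch(long position) {
        this.position = position;
    }

    /**
     * Simulates one read pass that advances the position field to newPosition.
     * Returns {size computed from the field, size computed from the snapshot}.
     */
    public long[] readPass(long newPosition) {
        // Snapshot taken at the beginning of the loop, before anything moves the field.
        long startPosition = this.position;

        // Assumption: the field is updated while reading, before the size is logged.
        this.position = newPosition;

        // Relying on the field after it has moved always yields 0.
        long sizeFromField = newPosition - this.position;
        // Relying on the snapshot yields the distance covered in this pass.
        long sizeFromSnapshot = newPosition - startPosition;

        return new long[] {sizeFromField, sizeFromSnapshot};
    }

    public static void main(String[] args) {
        // 21992487 and 21993911 are the two "at" positions from the log excerpt.
        SizeTrackingSketch sketch = new SizeTrackingSketch(21992487L);
        long[] sizes = sketch.readPass(21993911L);
        System.out.println("size from field:    " + sizes[0]);
        System.out.println("size from snapshot: " + sizes[1]);
    }
}
```

Under these assumptions the field-based size is 0 even though edits were read, which matches the symptom described above, while the snapshot-based size stays meaningful regardless of when the field is updated.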
                
> [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-6649
>                 URL: https://issues.apache.org/jira/browse/HBASE-6649
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.96.0, 0.92.3, 0.94.2
>
>         Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html
>
>
> Have seen it twice in the recent past: http://bit.ly/MPCykB & http://bit.ly/O79Dq7 ..
> Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
