hbase-issues mailing list archives

From "xufeng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5828) TestLogRolling fails in 0.90 branch killing the test suite up on jenkins
Date Wed, 09 May 2012 02:15:48 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271013#comment-13271013 ]

xufeng commented on HBASE-5828:
-------------------------------

My Conclusion:
The log writer pipeline of the meta regionserver has been destroyed by testLogRollOnDatanodeDeath(),
so operating on the HLog of the meta regionserver fails afterwards.

I think this is not a bug, just a test problem. Like the other versions, we can segregate the
test cluster with @Before and @After, as sketched below.
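
For illustration, a minimal sketch of what I mean by segregating the cluster, assuming the standard HBaseTestingUtility API; the class body and cluster size here are only an example, not the actual patch:
{noformat}
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.junit.After;
import org.junit.Before;

public class TestLogRolling {
  // One fresh mini cluster per test method, so a WAL pipeline killed by
  // testLogRollOnDatanodeDeath() cannot leak into the next test.
  private HBaseTestingUtility TEST_UTIL;

  @Before
  public void setUp() throws Exception {
    TEST_UTIL = new HBaseTestingUtility();
    TEST_UTIL.startMiniCluster(2);      // number of slaves is just an example
  }

  @After
  public void tearDown() throws Exception {
    TEST_UTIL.shutdownMiniCluster();    // tear everything down before the next test
  }
}
{noformat}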


My Analysis:
1. I checked the log from https://builds.apache.org/view/G-L/view/HBase/job/hbase-0.90/472/artifact/trunk/target/surefire-reports/org.apache.hadoop.hbase.regionserver.wal.TestLogRolling-output.txt
and found that the meta region and the user region were deployed on different RSs, which may cause the problem.
{noformat}
	Line 226: 2012-04-15 09:19:11,325 INFO  [MASTER_OPEN_REGION-hemera.apache.org:45266-0] handler.OpenedRegionHandler(137):
The master has opened the region -ROOT-,,0.70236052 that was online on serverName=hemera.apache.org,46368,1334481545408,
load=(requests=0, regions=0, usedHeap=71, maxHeap=1244)
	Line 288: 2012-04-15 09:19:11,379 INFO  [MASTER_OPEN_REGION-hemera.apache.org:45266-1] handler.OpenedRegionHandler(137):
The master has opened the region .META.,,1.1028785192 that was online on serverName=hemera.apache.org,46368,1334481545408,
load=(requests=0, regions=0, usedHeap=71, maxHeap=1244)
	Line 504: 2012-04-15 09:19:21,513 DEBUG [MASTER_OPEN_REGION-hemera.apache.org:45266-2] handler.OpenedRegionHandler(139):
The master has opened the region TestLogRolling,,1334481561377.369bae2d9411e5836a17df5e31b07b5e.
that was online on serverName=hemera.apache.org,46368,1334481545408, load=(requests=0, regions=2,
usedHeap=76, maxHeap=1244)
	Line 8652: 2012-04-15 09:20:09,099 DEBUG [MASTER_OPEN_REGION-hemera.apache.org:45266-3] handler.OpenedRegionHandler(139):
The master has opened the region TestLogRolling,,1334481609001.198321df4a9c90eebcf670a2218d29f0.
that was online on serverName=hemera.apache.org,34725,1334481545432, load=(requests=0, regions=0,
usedHeap=129, maxHeap=1244)
{noformat}

2. In this issue we should find out exactly where the problem happened.
I reproduced the issue again; this time I added debug code in testLogRollOnPipelineRestart():
{noformat}
    if (admin.tableExists(tableName)) {
      admin.disableTable(tableName);
      +LOG.info("xufeng7-------->before delete table!");
      admin.deleteTable(tableName);
      +LOG.info("xufeng7-------->after delete table!");
    }
{noformat}

3. The test was killed, and I found the "before delete table!" string but no "after delete table!"
string in the log.
This confirms that the problem happened inside admin.deleteTable(tableName).
This is the region assignment info:
{noformat}
	Line 226: 2012-05-08 16:45:17,603 INFO  [MASTER_OPEN_REGION-HADOOP-CI-AGENT-A:42383-0] handler.OpenedRegionHandler(137):
The master has opened the region -ROOT-,,0.70236052 that was online on serverName=HADOOP-CI-AGENT-A,49991,1336466711048,
load=(requests=0, regions=0, usedHeap=49, maxHeap=1348)
	Line 288: 2012-05-08 16:45:17,696 INFO  [MASTER_OPEN_REGION-HADOOP-CI-AGENT-A:42383-1] handler.OpenedRegionHandler(137):
The master has opened the region .META.,,1.1028785192 that was online on serverName=HADOOP-CI-AGENT-A,49991,1336466711048,
load=(requests=0, regions=0, usedHeap=49, maxHeap=1348)
	Line 505: 2012-05-08 16:45:28,099 DEBUG [MASTER_OPEN_REGION-HADOOP-CI-AGENT-A:42383-2] handler.OpenedRegionHandler(139):
The master has opened the region TestLogRolling,,1336466727648.b119d8cad80bc39cdd38b04421a67280.
that was online on serverName=HADOOP-CI-AGENT-A,57227,1336466711078, load=(requests=0, regions=0,
usedHeap=54, maxHeap=1348)
	Line 8495: 2012-05-08 16:46:18,169 DEBUG [MASTER_OPEN_REGION-HADOOP-CI-AGENT-A:42383-3] handler.OpenedRegionHandler(139):
The master has opened the region TestLogRolling,,1336466777847.0117c81b3e59fdad15c4a4390156a558.
that was online on serverName=HADOOP-CI-AGENT-A,57227,1336466711078, load=(requests=0, regions=0,
usedHeap=68, maxHeap=1348)
{noformat}
This is my debug info:
{noformat}
	Line 8526: 2012-05-08 16:46:28,138 INFO  [main] wal.TestLogRolling(339): xufeng1-------->cluster.getRegionServer(0)serverName=HADOOP-CI-AGENT-A,49991,1336466711048,
load=(requests=0, regions=2, usedHeap=72, maxHeap=1348)
	Line 8527: 2012-05-08 16:46:28,138 INFO  [main] wal.TestLogRolling(344): xufeng1-------->127.0.0.1:36932
	Line 8528: 2012-05-08 16:46:28,138 INFO  [main] wal.TestLogRolling(345): xufeng1-------->127.0.0.1:42887
	Line 8529: 2012-05-08 16:46:28,138 INFO  [main] wal.TestLogRolling(351): xufeng2-------->cluster.getRegionServer(1)serverName=HADOOP-CI-AGENT-A,57227,1336466711078,
load=(requests=0, regions=1, usedHeap=72, maxHeap=1348)
	Line 8530: 2012-05-08 16:46:28,138 INFO  [main] wal.TestLogRolling(356): xufeng2-------->127.0.0.1:42887
	Line 8531: 2012-05-08 16:46:28,138 INFO  [main] wal.TestLogRolling(357): xufeng2-------->127.0.0.1:36932
	Line 9795: 2012-05-08 16:47:06,982 INFO  [main] wal.TestLogRolling(420): xufeng3-------->cluster.getRegionServer(0)serverName=HADOOP-CI-AGENT-A,49991,1336466711048,
load=(requests=0, regions=2, usedHeap=73, maxHeap=1348)
	Line 9796: 2012-05-08 16:47:06,982 INFO  [main] wal.TestLogRolling(425): xufeng3-------->127.0.0.1:36932
	Line 9797: 2012-05-08 16:47:06,982 INFO  [main] wal.TestLogRolling(426): xufeng3-------->127.0.0.1:42887
	Line 9798: 2012-05-08 16:47:06,983 INFO  [main] wal.TestLogRolling(432): xufeng4-------->cluster.getRegionServer(1)serverName=HADOOP-CI-AGENT-A,57227,1336466711078,
load=(requests=0, regions=1, usedHeap=73, maxHeap=1348)
	Line 9799: 2012-05-08 16:47:06,983 INFO  [main] wal.TestLogRolling(437): xufeng4-------->127.0.0.1:45998
	Line 9800: 2012-05-08 16:47:06,983 INFO  [main] wal.TestLogRolling(438): xufeng4-------->127.0.0.1:54936
	Line 9803: 2012-05-08 16:47:06,985 INFO  [main] wal.TestLogRolling(468): xufeng5-------->cluster.getRegionServer(0)serverName=HADOOP-CI-AGENT-A,49991,1336466711048,
load=(requests=0, regions=2, usedHeap=73, maxHeap=1348)
	Line 9804: 2012-05-08 16:47:06,985 INFO  [main] wal.TestLogRolling(473): xufeng5-------->127.0.0.1:36932
	Line 9805: 2012-05-08 16:47:06,985 INFO  [main] wal.TestLogRolling(474): xufeng5-------->127.0.0.1:42887
	Line 9806: 2012-05-08 16:47:06,985 INFO  [main] wal.TestLogRolling(480): xufeng6-------->cluster.getRegionServer(1)serverName=HADOOP-CI-AGENT-A,57227,1336466711078,
load=(requests=0, regions=1, usedHeap=73, maxHeap=1348)
	Line 9807: 2012-05-08 16:47:06,985 INFO  [main] wal.TestLogRolling(485): xufeng6-------->127.0.0.1:45998
	Line 9808: 2012-05-08 16:47:06,985 INFO  [main] wal.TestLogRolling(486): xufeng6-------->127.0.0.1:54936
	Line 9956: 2012-05-08 16:47:17,032 INFO  [main] wal.TestLogRolling(494): xufeng7-------->before
delete table!
{noformat}

4. To see what happened inside admin.deleteTable(tableName), I checked the log and found:
{noformat}
2012-05-08 16:46:55,975 WARN  [DataStreamer for file /user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707): Failed recovery
attempt #0 from primary datanode 127.0.0.1:42887
2012-05-08 16:47:06,981 WARN  [DataStreamer for file /user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707): Failed recovery
attempt #1 from primary datanode 127.0.0.1:42887
2012-05-08 16:47:17,986 WARN  [DataStreamer for file /user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707): Failed recovery
attempt #2 from primary datanode 127.0.0.1:42887
2012-05-08 16:47:27,990 WARN  [DataStreamer for file /user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707): Failed recovery
attempt #3 from primary datanode 127.0.0.1:42887
2012-05-08 16:47:38,995 WARN  [DataStreamer for file /user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707): Failed recovery
attempt #4 from primary datanode 127.0.0.1:42887
2012-05-08 16:47:49,999 WARN  [DataStreamer for file /user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707): Failed recovery
attempt #5 from primary datanode 127.0.0.1:42887
2012-05-08 16:48:01,003 WARN  [DataStreamer for file /user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707): Failed recovery
attempt #0 from primary datanode 127.0.0.1:36932
2012-05-08 16:48:12,007 WARN  [DataStreamer for file /user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707): Failed recovery
attempt #1 from primary datanode 127.0.0.1:36932
2012-05-08 16:48:23,011 WARN  [DataStreamer for file /user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707): Failed recovery
attempt #2 from primary datanode 127.0.0.1:36932
2012-05-08 16:48:34,015 WARN  [DataStreamer for file /user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707): Failed recovery
attempt #3 from primary datanode 127.0.0.1:36932
2012-05-08 16:48:45,019 WARN  [DataStreamer for file /user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707): Failed recovery
attempt #4 from primary datanode 127.0.0.1:36932
2012-05-08 16:48:56,023 WARN  [DataStreamer for file /user/root/.logs/HADOOP-CI-AGENT-A,49991,1336466711048/HADOOP-CI-AGENT-A%3A49991.1336466712229
block blk_2409401836437026689_1008] hdfs.DFSClient$DFSOutputStream(2707): Failed recovery
attempt #5 from primary datanode 127.0.0.1:36932
{noformat}

5. This causes the HLog writer to roll, and when the delete against the table fails, HBaseAdmin#deleteTable
spends a long time looping until its retries are exhausted (see the sketch after the stack trace below).
{noformat}
2012-05-08 16:48:56,025 DEBUG [RegionServer:0;HADOOP-CI-AGENT-A,49991,1336466711048.logRoller]
regionserver.LogRoller(90): HLog roll manually triggered
2012-05-08 16:48:56,027 ERROR [PRI IPC Server handler 8 on 49991] regionserver.HRegionServer(970):

java.io.IOException: Reflection
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1088)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:1023)
	at org.apache.hadoop.hbase.regionserver.HRegion.delete(HRegion.java:1265)
	at org.apache.hadoop.hbase.regionserver.HRegion.delete(HRegion.java:1185)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.delete(HRegionServer.java:1950)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:602)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
	... 11 more
Caused by: java.io.IOException: DFSOutputStream is closed
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3238)
	at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
	at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
	... 15 more

{noformat}
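
To show why the test hangs for so long instead of failing fast, here is a rough sketch of the client-side retry pattern involved; this is only an illustration with made-up variable names, not the actual 0.90 HBaseAdmin#deleteTable source (hbase.client.retries.number and hbase.client.pause are real settings, the defaults shown are assumptions):
{noformat}
// Illustrative retry-until-exhausted loop (not the real HBaseAdmin#deleteTable code).
// Because the WAL pipeline is already dead, the delete against .META. never succeeds,
// so the loop burns through every retry before finally throwing.
int numRetries = conf.getInt("hbase.client.retries.number", 10);
long pause = conf.getLong("hbase.client.pause", 1000);
for (int tries = 0; tries < numRetries; tries++) {
  if (!admin.tableExists(tableName)) {
    return;                  // table really gone, we are done
  }
  Thread.sleep(pause);       // wait before checking again
}
throw new IOException("Retries exhausted while waiting for table " + tableName
    + " to be deleted");
{noformat}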
                
> TestLogRolling fails in 0.90 branch killing the test suite up on jenkins
> ------------------------------------------------------------------------
>
>                 Key: HBASE-5828
>                 URL: https://issues.apache.org/jira/browse/HBASE-5828
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>
> See recent 0.90 builds up on jenkins: https://builds.apache.org/view/G-L/view/HBase/job/hbase-0.90/471/console
