hbase-issues mailing list archives

From "Prakash Khemani (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-3814) force regionserver to halt
Date Fri, 22 Apr 2011 18:53:05 GMT
force regionserver to halt
--------------------------

                 Key: HBASE-3814
                 URL: https://issues.apache.org/jira/browse/HBASE-3814
             Project: HBase
          Issue Type: Bug
            Reporter: Prakash Khemani


Once abort() is called on a regionserver, we should have a timeout thread that calls Runtime.halt()
if the RS gets stuck somewhere during abort processing.
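
A minimal sketch of what such a timeout thread could look like, in plain Java. This is not HBase code: the class name AbortWatchdog, its methods, and the timeout value are hypothetical, and the action taken on timeout is injectable so the logic can be exercised without killing the JVM. The halting() factory wires in Runtime.getRuntime().halt(1), which terminates the JVM immediately without running shutdown hooks.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical watchdog for abort processing; not part of HBase. */
final class AbortWatchdog {
    private final long timeoutMillis;
    private final Runnable onTimeout;
    private final CountDownLatch abortFinished = new CountDownLatch(1);

    AbortWatchdog(long timeoutMillis, Runnable onTimeout) {
        this.timeoutMillis = timeoutMillis;
        this.onTimeout = onTimeout;
    }

    /** Variant that halts the JVM (no shutdown hooks) if the abort does not finish in time. */
    static AbortWatchdog halting(long timeoutMillis) {
        return new AbortWatchdog(timeoutMillis, () -> Runtime.getRuntime().halt(1));
    }

    /** Start the watchdog; call this at the beginning of abort processing. */
    Thread start() {
        Thread t = new Thread(() -> {
            try {
                // await() returns false if the timeout elapsed before abortCompleted() was called
                if (!abortFinished.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    onTimeout.run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "abort-watchdog");
        t.setDaemon(true); // the watchdog itself must never keep the JVM alive
        t.start();
        return t;
    }

    /** Call when abort processing has finished normally; disarms the watchdog. */
    void abortCompleted() {
        abortFinished.countDown();
    }
}
```

Under these assumptions, abort() would call AbortWatchdog.halting(timeout).start() as its first step and abortCompleted() as its last, so a hang anywhere in between (for example, in a log roll against an unresponsive filesystem) ends in a forced halt rather than a stuck process.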

===


Pumahbase132 has the following logs. The DFSClient is not able to set up a write pipeline
successfully; it tries to abort, but it gets stuck while aborting. I know there is a
check so that if we are aborting because the filesystem is closed, we do not try to flush
the logs during the abort. But in this case the fs is up and running; it is just not functioning.

2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
10.38.131.53:50010  for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException:
Bad connect ack with firstBadLink 10.38.133.33:50010
2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-8967376451767492285_6537229
for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
10.38.131.53:50010  for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException:
Bad connect ack with firstBadLink 10.38.134.59:50010
2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_7172251852699100447_6537229
for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280


2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
10.38.131.53:50010  for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException:
Bad connect ack with firstBadLink 10.38.134.53:50010
2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-9153204772467623625_6537229
for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
10.38.131.53:50010  for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException:
Bad connect ack with firstBadLink 10.38.134.49:50010
2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2513098940934276625_6537229
for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException:
Unable to create new block.
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3560)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2720)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2977)

2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-2513098940934276625_6537229
bad datanode[1] nodes == null
2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations.
Source file "/PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280"
- Aborting...
2011-04-21 23:48:07,216 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append.
Requesting close of hlog

And then the RS gets stuck trying to roll the logs ...



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
