hbase-issues mailing list archives

From "Sudharsan Sampath (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3331) Kill -STOP of RS hosting META does not recover
Date Thu, 25 Aug 2011 07:30:29 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090844#comment-13090844 ]

Sudharsan Sampath commented on HBASE-3331:
------------------------------------------

I am facing this issue in version 0.90.1. I have two servers in my test environment: one
hosts both the master and a regionserver, and the other hosts only a regionserver. HBase
manages ZooKeeper (ZK), and the quorum contains both servers. Both the ROOT and META regions
are on one of my regionservers. If that regionserver is stopped or killed, the master web page
does not come up and throws Connection Refused when attempting to contact the regionserver.
The master server logs, though, seem to relate more to the ROOT region. Should I open a new issue?



2011-08-25 12:50:23,531 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=-ROOT-, metaLocation=address: <<server>>:60020, regioninfo: -ROOT-,,0.70236052, attempt=8 of 10 failed; retrying after sleep of 16000 because: Connection refused
2011-08-25 12:50:23,531 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@62135133; hsa=<<server>>:60020
2011-08-25 12:50:39,531 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@62135133; hsa=<<server>>:60020
2011-08-25 12:50:39,532 WARN org.mortbay.log: /master.jsp: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 10 attempts.
Exceptions:
java.net.ConnectException: Connection refused
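The DEBUG entry reports attempt 8 of 10 with a 16000 ms sleep, and the next entry appears exactly 16 seconds later (12:50:23,531 to 12:50:39,531), which points to a multiplier-based retry backoff in the client. The Java sketch below reproduces such a schedule under stated assumptions: a 1000 ms base pause and the multiplier table {1, 1, 1, 2, 2, 4, 4, 8, 16, 32}, which we believe correspond to hbase.client.pause and HConstants.RETRY_BACKOFF in 0.90 but have not verified here. The class and method names (RetryBackoffSketch, pauseTime, totalSleep) are hypothetical and are not HBase APIs.

```java
// Hypothetical sketch of the retry pause schedule suggested by the log above.
// ASSUMPTIONS: BASE_PAUSE_MS mirrors hbase.client.pause (1000 ms) and BACKOFF
// mirrors the 0.90 retry multiplier table; neither is read from HBase itself.
public class RetryBackoffSketch {
    // Assumed base pause in milliseconds.
    static final long BASE_PAUSE_MS = 1000L;
    // Assumed per-attempt multipliers; attempts past the end reuse the last entry.
    static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

    // Pause applied after the given zero-based attempt fails.
    static long pauseTime(int attempt) {
        int idx = Math.min(attempt, BACKOFF.length - 1);
        return BASE_PAUSE_MS * BACKOFF[idx];
    }

    // Total sleep if all numRetries attempts fail, assuming no sleep
    // follows the final attempt (it fails immediately instead).
    static long totalSleep(int numRetries) {
        long total = 0;
        for (int attempt = 0; attempt < numRetries - 1; attempt++) {
            total += pauseTime(attempt);
        }
        return total;
    }

    public static void main(String[] args) {
        int numRetries = 10; // "attempt=8 of 10" in the log
        for (int attempt = 0; attempt < numRetries - 1; attempt++) {
            System.out.println("attempt=" + attempt + " of " + numRetries
                + " failed; retrying after sleep of " + pauseTime(attempt));
        }
        System.out.println("total sleep before giving up: "
            + totalSleep(numRetries) + " ms");
    }
}
```

Under these assumptions attempt 8 maps to 16 x 1000 = 16000 ms, matching the logged sleep, and a lookup that fails all 10 attempts spends 39000 ms sleeping before this level of retries gives up. Any outer retry loop (such as the one reported by master.jsp) would multiply that wait.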


> Kill -STOP of RS hosting META does not recover
> ----------------------------------------------
>
>                 Key: HBASE-3331
>                 URL: https://issues.apache.org/jira/browse/HBASE-3331
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: timeouts.log.txt
>
>
> If you find the server hosting META and kill -STOP its region server, it will eventually
> lose its ZK session and the master will split its logs and try to reassign. However, at some
> point along here it tries to access the old META, and gets SocketTimeoutExceptions, which
> cause it to keep retrying forever. Once I kill -9ed the stopped server, things came back to
> life.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
