hbase-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-10) HRegionServer hangs upon exit due to DFSClient Exception
Date Wed, 16 Apr 2008 22:45:22 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589758#action_12589758 ]

Jim Kellerman commented on HBASE-10:
------------------------------------

Proposal:

The only time any of the flusher, compactor, or log roller threads needs to be interrupted is when it is dormant, waiting for something to do. Interrupting one of these threads at any other time will probably produce undesirable results like the DFSClient issue above. If the thread is not dormant, it will finish what it is doing, return to the top of its loop, and exit because the stop flag is set.

So instead of using synchronized, use a ReentrantLock. The interrupting thread calls tryLock; if that fails, it knows the worker thread is busy and should not be interrupted, because when the worker finishes its current work it will check the stop flag and exit on its own.

If tryLock succeeds, the interrupting thread interrupts the worker to wake it from its sleep.

Nothing else needs to be done because the join at the end of HRegionServer.run will take care
of coordinating the region server exit.
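
A minimal sketch of the idea (the class name WorkerThread, the methods doWork() and requestStop(), and the sleep period are illustrative only, not the actual HRegionServer code):

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

class WorkerThread extends Thread {
  private final ReentrantLock workLock = new ReentrantLock();
  private final AtomicBoolean stopRequested;

  WorkerThread(String name, AtomicBoolean stopRequested) {
    super(name);
    this.stopRequested = stopRequested;
  }

  public void run() {
    while (!stopRequested.get()) {
      workLock.lock();              // held for the duration of one unit of work
      try {
        doWork();                   // flush, compact or roll the log
      } finally {
        workLock.unlock();
      }
      try {
        Thread.sleep(10 * 1000);    // dormant: the only state in which an interrupt is wanted
      } catch (InterruptedException e) {
        // woken by requestStop(); fall through and re-check the stop flag
      }
    }
  }

  // Called by the shutdown path after the stop flag has been set.
  void requestStop() {
    if (workLock.tryLock()) {       // lock is free => the thread is dormant, safe to interrupt
      try {
        interrupt();                // wake it from its sleep so it sees the stop flag
      } finally {
        workLock.unlock();
      }
    }
    // If tryLock fails the thread is mid-work (possibly inside DFSClient); leave it
    // alone -- it will see the stop flag at the top of its loop and exit on its own.
  }

  protected void doWork() { /* flush / compact / roll, omitted */ }
}

HRegionServer.run would then just set the stop flag, call requestStop() on each worker, and join each thread as it already does; the joins are what hold the exit until every worker has finished.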


> HRegionServer hangs upon exit due to DFSClient Exception
> --------------------------------------------------------
>
>                 Key: HBASE-10
>                 URL: https://issues.apache.org/jira/browse/HBASE-10
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>         Environment: CentOS 5
>            Reporter: Chris Kline
>            Assignee: Jim Kellerman
>            Priority: Minor
>             Fix For: 0.2.0
>
>
> Several HRegionServers hang around indefinitely well after the HMaster has exited. This was triggered by executing $HBASE_HOME/bin/stop-hbase.sh. The HMaster exits fine, but here is what happens on one of the HRegionServers:
> 2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.HRegionServer: Got regionserver stop message
> 2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.Leases: regionserver/0.0.0.0:60020 closing leases
> 2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.Leases$LeaseMonitor: regionserver/0.0.0.0:60020.leaseChecker exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.hbase.Leases: regionserver/0.0.0.0:60020 closed leases
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: Stopping server on 60020
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 60020: exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 60020: exiting
> 2008-01-02 18:54:01,909 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 60020
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 60020: exiting
> 2008-01-02 18:54:01,909 INFO org.apache.hadoop.hbase.HRegionServer: Stopping infoServer
> 2008-01-02 18:54:01,909 DEBUG org.mortbay.util.Container: Stopping org.mortbay.jetty.Server@62c09554
> 2008-01-02 18:54:01,909 DEBUG org.mortbay.util.ThreadedServer: closing ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
> 2008-01-02 18:54:01,909 DEBUG org.mortbay.util.ThreadedServer: IGNORED
> java.net.SocketException: Socket closed
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
>         at java.net.ServerSocket.implAccept(ServerSocket.java:453)
>         at java.net.ServerSocket.accept(ServerSocket.java:421)
>         at org.mortbay.util.ThreadedServer.acceptSocket(ThreadedServer.java:432)
>         at org.mortbay.util.ThreadedServer$Acceptor.run(ThreadedServer.java:631)
> 2008-01-02 18:54:01,910 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
> 2008-01-02 18:54:01,910 DEBUG org.mortbay.util.ThreadedServer: Self connect to close listener /127.0.0.1:60030
> 2008-01-02 18:54:01,911 DEBUG org.mortbay.util.ThreadedServer: problem stopping acceptor /127.0.0.1:
> 2008-01-02 18:54:01,911 DEBUG org.mortbay.util.ThreadedServer: problem stopping acceptor /127.0.0.1:
> java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>         at java.net.Socket.connect(Socket.java:519)
>         at java.net.Socket.connect(Socket.java:469)
>         at java.net.Socket.<init>(Socket.java:366)
>         at java.net.Socket.<init>(Socket.java:209)
>         at org.mortbay.util.ThreadedServer$Acceptor.forceStop(ThreadedServer.java:682)
>         at org.mortbay.util.ThreadedServer.stop(ThreadedServer.java:557)
>         at org.mortbay.http.SocketListener.stop(SocketListener.java:211)
>         at org.mortbay.http.HttpServer.doStop(HttpServer.java:781)
>         at org.mortbay.util.Container.stop(Container.java:154)
>         at org.apache.hadoop.hbase.util.InfoServer.stop(InfoServer.java:237)
>         at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:835)
>         at java.lang.Thread.run(Thread.java:619)
> 2008-01-02 18:54:01,911 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:60030
> 2008-01-02 18:54:01,912 DEBUG org.mortbay.util.Container: Stopping HttpContext[/static,/static]
> 2008-01-02 18:54:01,912 DEBUG org.mortbay.http.handler.AbstractHttpHandler: Stopped org.mortbay.http.handler.ResourceHandler in HttpContext[/static,/static]
> 2008-01-02 18:54:02,039 INFO org.mortbay.util.Container: Stopped HttpContext[/static,/static]
> 2008-01-02 18:54:02,039 DEBUG org.mortbay.util.Container: Stopping HttpContext[/logs,/logs]
> 2008-01-02 18:54:02,039 DEBUG org.mortbay.http.handler.AbstractHttpHandler: Stopped org.mortbay.http.handler.ResourceHandler in HttpContext[/logs,/logs]
> 2008-01-02 18:54:02,154 INFO org.mortbay.util.Container: Stopped HttpContext[/logs,/logs]
> 2008-01-02 18:54:02,154 DEBUG org.mortbay.util.Container: Stopping WebApplicationContext[/,/]
> 2008-01-02 18:54:02,154 DEBUG org.mortbay.util.Container: Stopping org.mortbay.jetty.servlet.WebApplicationHandler@7ec5495e
> 2008-01-02 18:54:02,155 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.WebApplicationHandler@7ec5495e
> 2008-01-02 18:54:02,277 DEBUG org.mortbay.jetty.servlet.AbstractSessionManager: Session scavenger exited
> 2008-01-02 18:54:02,278 DEBUG org.mortbay.util.Container: remove component: org.mortbay.jetty.servlet.WebApplicationHandler@7ec5495e
> 2008-01-02 18:54:02,278 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/,/]
> 2008-01-02 18:54:02,278 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.Server@62c09554
> 2008-01-02 18:54:02,278 DEBUG org.apache.hadoop.hbase.HRegionServer: closing region spider_pages,10_131455761,1198140179439
> 2008-01-02 18:54:02,278 INFO org.apache.hadoop.hbase.HRegionServer: regionserver/0.0.0.0:60020.cacheFlusher exiting
> 2008-01-02 18:54:02,278 INFO org.apache.hadoop.hbase.HRegionServer: regionserver/0.0.0.0:60020.compactor exiting
> 2008-01-02 18:54:02,278 INFO org.apache.hadoop.hbase.HRegionServer: regionserver/0.0.0.0:60020.splitter exiting
> 2008-01-02 18:54:02,279 DEBUG org.apache.hadoop.hbase.HStore: closed spider_pages,10_131455761,1198140179439/search (1501227429/search)
> 2008-01-02 18:54:02,279 DEBUG org.apache.hadoop.hbase.HStore: closed spider_pages,10_131455761,1198140179439/profile (1501227429/profile)
> 2008-01-02 18:54:02,279 DEBUG org.apache.hadoop.hbase.HStore: closed spider_pages,10_131455761,1198140179439/meta (1501227429/meta)
> 2008-01-02 18:54:02,279 INFO org.apache.hadoop.hbase.HRegion: closed spider_pages,10_131455761,1198140179439
> 2008-01-02 18:54:02,279 DEBUG org.apache.hadoop.hbase.HRegionServer: closing region spider_pages,10_486594261,1198319654267
> 2008-01-02 18:54:02,280 DEBUG org.apache.hadoop.hbase.HStore: closed spider_pages,10_486594261,1198319654267/search (364081590/search)
> 2008-01-02 18:54:02,280 DEBUG org.apache.hadoop.hbase.HStore: closed spider_pages,10_486594261,1198319654267/profile (364081590/profile)
> 2008-01-02 18:54:02,280 DEBUG org.apache.hadoop.hbase.HStore: closed spider_pages,10_486594261,1198319654267/meta (364081590/meta)
> 2008-01-02 18:54:02,280 INFO org.apache.hadoop.hbase.HRegion: closed spider_pages,10_486594261,1198319654267
> ...
> ... this closing of regions goes on for a while
> ...
> ... the following continues until a kill -9
> ...
> 2008-01-02 20:39:20,552 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_5124700261538503923 from any node:  java.io.IOException: No live nodes contain current block
> 2008-01-02 20:40:23,556 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_5124700261538503923 from any node:  java.io.IOException: No live nodes contain current block
> 2008-01-02 20:41:26,560 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_5124700261538503923 from any node:  java.io.IOException: No live nodes contain current block
> 2008-01-02 20:42:29,566 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_5124700261538503923 from any node:  java.io.IOException: No live nodes contain current block
> 2008-01-02 20:43:32,571 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_5124700261538503923 from any node:  java.io.IOException: No live nodes contain current block
> ...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

