hbase-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-10) HRegionServer hangs upon exit due to DFSClient Exception
Date Wed, 16 Apr 2008 20:35:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589725#action_12589725 ]

Jim Kellerman commented on HBASE-10:
------------------------------------

It would be *nice* if, when bugs are entered, the environment field were filled in with at
least the Hadoop version number.

For this issue, the Hadoop version was 0.16.1 or earlier. This can be determined from the
message:
{code}
2008-01-02 20:39:20,552 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_5124700261538503923
from any node: java.io.IOException: No live nodes contain current block
{code}
Note how DFSClient misidentifies itself as 'org.apache.hadoop.fs.DFSClient'. DFSClient has
lived in org.apache.hadoop.dfs in every release since at least Q4 2007. It misidentifies
itself because of the way it obtained its logger in 0.16.1 and earlier:
{code}
public static final Log LOG = LogFactory.getLog("org.apache.hadoop.fs.DFSClient");
{code}
In 0.16.2, this was changed to:
{code}
public static final Log LOG = LogFactory.getLog(DFSClient.class);
{code}

The message itself originates from DFSClient$DFSInputStream.chooseDataNode. This method
makes up to 3 attempts to get the block. It is called from both DFSClient$DFSInputStream.blockSeekTo
and DFSClient$DFSInputStream.fetchBlockByteRange.

blockSeekTo does no retrying of its own; however, it is called by DFSClient$DFSInputStream.read,
which makes up to 3 attempts at calling blockSeekTo, so a single call to read can generate
9 of these messages if the block cannot be found.
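As a rough sketch (hypothetical code, not the actual DFSClient source), the message
multiplication on the read path works like this:

```java
// Hypothetical model of the 0.16.x read path; method names mirror DFSClient,
// but the bodies below are illustrative only.
public class ReadPathMessages {

    // chooseDataNode: up to 3 attempts, one log message per failed attempt.
    static int chooseDataNode() {
        int messages = 0;
        for (int attempt = 0; attempt < 3; attempt++) {
            messages++; // INFO: Could not obtain block ... from any node
        }
        return messages;
    }

    // blockSeekTo calls chooseDataNode once and does no retrying of its own.
    static int blockSeekTo() {
        return chooseDataNode();
    }

    // read() makes up to 3 attempts at blockSeekTo, so one failed read
    // produces 3 * 3 = 9 log messages.
    static int read() {
        int messages = 0;
        for (int attempt = 0; attempt < 3; attempt++) {
            messages += blockSeekTo();
        }
        return messages;
    }

    public static void main(String[] args) {
        System.out.println(read()); // 9
    }
}
```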

fetchBlockByteRange retries chooseDataNode once for each known block location. With a
replication factor of 3, it would normally call chooseDataNode 3 times. But suppose the
namenode were in the middle of re-replicating the block: the list of possible block
locations could then contain more than three entries. So chooseDataNode will generate at
least 9 messages when called from fetchBlockByteRange, plus another 3 messages for each
additional entry in the block location list.
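That arithmetic can be sketched as hypothetical code (the blockLocations count stands in
for the real block-location data structures, which are not modeled here):

```java
// Hypothetical model of fetchBlockByteRange's message count, not real DFSClient code.
public class FetchPathMessages {

    // chooseDataNode: up to 3 attempts, one log message per failed attempt.
    static int chooseDataNode() {
        int messages = 0;
        for (int attempt = 0; attempt < 3; attempt++) {
            messages++; // INFO: Could not obtain block ... from any node
        }
        return messages;
    }

    // fetchBlockByteRange retries chooseDataNode once per known block location,
    // so the total message count is 3 * blockLocations.
    static int fetchBlockByteRange(int blockLocations) {
        int messages = 0;
        for (int location = 0; location < blockLocations; location++) {
            messages += chooseDataNode();
        }
        return messages;
    }

    public static void main(String[] args) {
        System.out.println(fetchBlockByteRange(3)); // 9  (replication factor 3)
        System.out.println(fetchBlockByteRange(5)); // 15 (re-replication in flight)
    }
}
```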

If multiple threads are doing reads, the stream of messages can appear endless.


> HRegionServer hangs upon exit due to DFSClient Exception
> --------------------------------------------------------
>
>                 Key: HBASE-10
>                 URL: https://issues.apache.org/jira/browse/HBASE-10
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>         Environment: CentOS 5
>            Reporter: Chris Kline
>            Assignee: Jim Kellerman
>            Priority: Minor
>             Fix For: 0.2.0
>
>
> Several HRegionServers hang around indefinitely well after the HMaster has exited.  This
was triggered by executing $HBASE_HOME/bin/stop-hbase.sh.  The HMaster exits fine, but here
is what happens on one of the HRegionServers:
> 2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.HRegionServer: Got regionserver
stop message
> 2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.Leases: regionserver/0.0.0.0:60020
closing leases
> 2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.Leases$LeaseMonitor: regionserver/0.0.0.0:60020.leaseChecker
exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.hbase.Leases: regionserver/0.0.0.0:60020
closed leases
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: Stopping server on 60020
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 60020:
exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60020:
exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 60020:
exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 60020:
exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 60020:
exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 60020:
exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 60020:
exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 60020:
exiting
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 60020:
exiting
> 2008-01-02 18:54:01,909 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener
on 60020
> 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 60020:
exiting
> 2008-01-02 18:54:01,909 INFO org.apache.hadoop.hbase.HRegionServer: Stopping infoServer
> 2008-01-02 18:54:01,909 DEBUG org.mortbay.util.Container: Stopping org.mortbay.jetty.Server@62c09554
> 2008-01-02 18:54:01,909 DEBUG org.mortbay.util.ThreadedServer: closing ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
> 2008-01-02 18:54:01,909 DEBUG org.mortbay.util.ThreadedServer: IGNORED
> java.net.SocketException: Socket closed
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
>         at java.net.ServerSocket.implAccept(ServerSocket.java:453)
>         at java.net.ServerSocket.accept(ServerSocket.java:421)
>         at org.mortbay.util.ThreadedServer.acceptSocket(ThreadedServer.java:432)
>         at org.mortbay.util.ThreadedServer$Acceptor.run(ThreadedServer.java:631)
> 2008-01-02 18:54:01,910 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
> 2008-01-02 18:54:01,910 DEBUG org.mortbay.util.ThreadedServer: Self connect to close
listener /127.0.0.1:60030
> 2008-01-02 18:54:01,911 DEBUG org.mortbay.util.ThreadedServer: problem stopping acceptor
/127.0.0.1:
> 2008-01-02 18:54:01,911 DEBUG org.mortbay.util.ThreadedServer: problem stopping acceptor
/127.0.0.1:
> java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>         at java.net.Socket.connect(Socket.java:519)
>         at java.net.Socket.connect(Socket.java:469)
>         at java.net.Socket.<init>(Socket.java:366)
>         at java.net.Socket.<init>(Socket.java:209)
>         at org.mortbay.util.ThreadedServer$Acceptor.forceStop(ThreadedServer.java:682)
>         at org.mortbay.util.ThreadedServer.stop(ThreadedServer.java:557)
>         at org.mortbay.http.SocketListener.stop(SocketListener.java:211)
>         at org.mortbay.http.HttpServer.doStop(HttpServer.java:781)
>         at org.mortbay.util.Container.stop(Container.java:154)
>         at org.apache.hadoop.hbase.util.InfoServer.stop(InfoServer.java:237)
>         at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:835)
>         at java.lang.Thread.run(Thread.java:619)
> 2008-01-02 18:54:01,911 INFO org.mortbay.http.SocketListener: Stopped SocketListener
on 0.0.0.0:60030
> 2008-01-02 18:54:01,912 DEBUG org.mortbay.util.Container: Stopping HttpContext[/static,/static]
> 2008-01-02 18:54:01,912 DEBUG org.mortbay.http.handler.AbstractHttpHandler: Stopped org.mortbay.http.handler.ResourceHandler
in HttpContext[/static,/static]
> 2008-01-02 18:54:02,039 INFO org.mortbay.util.Container: Stopped HttpContext[/static,/static]
> 2008-01-02 18:54:02,039 DEBUG org.mortbay.util.Container: Stopping HttpContext[/logs,/logs]
> 2008-01-02 18:54:02,039 DEBUG org.mortbay.http.handler.AbstractHttpHandler: Stopped org.mortbay.http.handler.ResourceHandler
in HttpContext[/logs,/logs]
> 2008-01-02 18:54:02,154 INFO org.mortbay.util.Container: Stopped HttpContext[/logs,/logs]
> 2008-01-02 18:54:02,154 DEBUG org.mortbay.util.Container: Stopping WebApplicationContext[/,/]
> 2008-01-02 18:54:02,154 DEBUG org.mortbay.util.Container: Stopping org.mortbay.jetty.servlet.WebApplicationHandler@7ec5495e
> 2008-01-02 18:54:02,155 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.WebApplicationHandler@7ec5495e
> 2008-01-02 18:54:02,277 DEBUG org.mortbay.jetty.servlet.AbstractSessionManager: Session
scavenger exited
> 2008-01-02 18:54:02,278 DEBUG org.mortbay.util.Container: remove component: org.mortbay.jetty.servlet.WebApplicationHandler@7ec5495e
> 2008-01-02 18:54:02,278 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/,/]
> 2008-01-02 18:54:02,278 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.Server@62c09554
> 2008-01-02 18:54:02,278 DEBUG org.apache.hadoop.hbase.HRegionServer: closing region spider_pages,10_131455761,1198140179439
> 2008-01-02 18:54:02,278 INFO org.apache.hadoop.hbase.HRegionServer: regionserver/0.0.0.0:60020.cacheFlusher
exiting
> 2008-01-02 18:54:02,278 INFO org.apache.hadoop.hbase.HRegionServer: regionserver/0.0.0.0:60020.compactor
exiting
> 2008-01-02 18:54:02,278 INFO org.apache.hadoop.hbase.HRegionServer: regionserver/0.0.0.0:60020.splitter
exiting
> 2008-01-02 18:54:02,279 DEBUG org.apache.hadoop.hbase.HStore: closed spider_pages,10_131455761,1198140179439/search
(1501227429/search)
> 2008-01-02 18:54:02,279 DEBUG org.apache.hadoop.hbase.HStore: closed spider_pages,10_131455761,1198140179439/profile
(1501227429/profile)
> 2008-01-02 18:54:02,279 DEBUG org.apache.hadoop.hbase.HStore: closed spider_pages,10_131455761,1198140179439/meta
(1501227429/meta)
> 2008-01-02 18:54:02,279 INFO org.apache.hadoop.hbase.HRegion: closed spider_pages,10_131455761,1198140179439
> 2008-01-02 18:54:02,279 DEBUG org.apache.hadoop.hbase.HRegionServer: closing region spider_pages,10_486594261,1198319654267
> 2008-01-02 18:54:02,280 DEBUG org.apache.hadoop.hbase.HStore: closed spider_pages,10_486594261,1198319654267/search
(364081590/search)
> 2008-01-02 18:54:02,280 DEBUG org.apache.hadoop.hbase.HStore: closed spider_pages,10_486594261,1198319654267/profile
(364081590/profile)
> 2008-01-02 18:54:02,280 DEBUG org.apache.hadoop.hbase.HStore: closed spider_pages,10_486594261,1198319654267/meta
(364081590/meta)
> 2008-01-02 18:54:02,280 INFO org.apache.hadoop.hbase.HRegion: closed spider_pages,10_486594261,1198319654267
> ...
> ... this closing of regions goes on for a while
> ...
> ... the following continues until a kill -9
> ...
> 2008-01-02 20:39:20,552 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_5124700261538503923
from any node:  java.io.IOException: No live nodes contain current block
> 2008-01-02 20:40:23,556 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_5124700261538503923
from any node:  java.io.IOException: No live nodes contain current block
> 2008-01-02 20:41:26,560 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_5124700261538503923
from any node:  java.io.IOException: No live nodes contain current block
> 2008-01-02 20:42:29,566 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_5124700261538503923
from any node:  java.io.IOException: No live nodes contain current block
> 2008-01-02 20:43:32,571 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_5124700261538503923
from any node:  java.io.IOException: No live nodes contain current block
> ...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

