hadoop-common-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2343) [hbase] Stuck regionserver?
Date Wed, 12 Dec 2007 00:39:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550767 ]

stack commented on HADOOP-2343:
-------------------------------

Another one of these happened over on Paul's cluster w/ TRUNK from about a day ago.  He has
it configured to run w/ 40 threads per, so I'm guessing it's not likely that it's a lack of
allocated threads (could be, though):

{code}
...
2007-12-11 15:39:15,597 DEBUG hbase.HLog - Closing current log writer /hbase/log_XX.XX.XX.16_1197365632138_60020/hlog.dat.2047 to get a new one
2007-12-11 15:39:15,611 INFO  hbase.HLog - new log writer created at /hbase/log_XX.XX.XX.16_1197365632138_60020/hlog.dat.2048
2007-12-11 15:39:15,611 DEBUG hbase.HLog - Found 3 logs to remove using oldest outstanding seqnum of 106741610 from region postlog,img141/6876/angjol7qx.jpg,1197403515753
2007-12-11 15:39:15,612 INFO  hbase.HLog - removing old log file /hbase/log_XX.XX.XX.16_1197365632138_60020/hlog.dat.2044 whose highest sequence/edit id is 106644872
2007-12-11 15:39:15,616 INFO  hbase.HLog - removing old log file /hbase/log_XX.XX.XX.16_1197365632138_60020/hlog.dat.2045 whose highest sequence/edit id is 106674877
2007-12-11 15:39:15,621 INFO  hbase.HLog - removing old log file /hbase/log_XX.XX.XX.16_1197365632138_60020/hlog.dat.2046 whose highest sequence/edit id is 106731580
2007-12-11 15:53:53,090 DEBUG hbase.HRegion - Started memcache flush for region postlog,img212/6231/yoturco8lb.jpg,1197410959126. Size 96.5k
2007-12-11 15:53:53,407 FATAL hbase.HRegionServer - unable to report to master for 858080 milliseconds - aborting server
2007-12-11 15:53:53,407 INFO  hbase.Leases - regionserver/0:0:0:0:0:0:0:0:60020 closing leases
2007-12-11 15:53:53,652 WARN  ipc.Server - IPC Server handler 32 on 60020, call batchUpdate(postlog,img211/363/15171222365f2bc22xh.jpg,1197410959123, 1195466232000, org.apache.hadoop.hbase.io.BatchUpdate@7659bf1) from 38.99.77.106:35490: output error
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
    at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
    at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at java.io.DataOutputStream.flush(DataOutputStream.java:106)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:663)
{code}
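
The silent stretch in a log like the one above can also be located mechanically. What follows is only a rough sketch, not part of HBase: the helper names (`parse_ts`, `find_gaps`, `last_successful_report`) and the 60-second default threshold are assumptions made for illustration. It parses the leading timestamp of each entry, flags gaps longer than the threshold, and subtracts the reported outage from the FATAL timestamp to estimate when the regionserver last reached the master.

```python
import re
from datetime import datetime, timedelta

# Timestamp at the start of each log entry, e.g. "2007-12-11 15:39:15,621".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\b")
# Abort message; \s+ also matches a newline in case the entry was wrapped.
ABORT_RE = re.compile(r"unable to report to master for (\d+)\s+milliseconds")


def parse_ts(line):
    """Return the datetime at the start of a log line, or None if there is none."""
    m = TS_RE.match(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base + timedelta(milliseconds=int(m.group(2)))


def find_gaps(lines, threshold_s=60.0):
    """Yield (previous_ts, next_ts, gap_seconds) for silent periods over threshold_s."""
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # wrapped continuation line or stack frame
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap > threshold_s:
                yield prev, ts, gap
        prev = ts


def last_successful_report(entry):
    """Estimate the last successful report time from a FATAL abort entry.

    Returns None if the entry has no timestamp or is not an abort message.
    """
    ts = parse_ts(entry)
    m = ABORT_RE.search(entry)
    if ts is None or m is None:
        return None
    return ts - timedelta(milliseconds=int(m.group(1)))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python loggaps.py regionserver.log
    with open(sys.argv[1]) as f:
        lines = f.read().splitlines()
    for prev, nxt, gap in find_gaps(lines):
        print(f"{gap:.3f}s of silence between {prev} and {nxt}")
```

Run against the excerpt above, this reports roughly 877 seconds of silence between 15:39:15,621 and 15:53:53,090, and the FATAL entry's 858080 milliseconds puts the last successful report at about 15:39:35, shortly after the last HLog line.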

> [hbase] Stuck regionserver?
> ---------------------------
>
>                 Key: HADOOP-2343
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2343
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>            Priority: Minor
>
> Looking in logs, a regionserver went down because it could not contact the master after
> 60 seconds.  Watching logging, the HRS is repeatedly checking all 150 loaded regions over
> and over again w/ a pause of about 5 seconds between runs... then there is a suspicious 60+
> second gap with no logging as though the regionserver had hung up on something:
> {code}
> 2007-12-03 13:14:54,178 DEBUG hbase.HRegionServer - flushing region postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635
> 2007-12-03 13:14:54,178 DEBUG hbase.HRegion - Not flushing cache for region postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635: snapshotMemcaches() determined that there was nothing to do
> 2007-12-03 13:14:54,205 DEBUG hbase.HRegionServer - flushing region postlog,img247/230/seanpaul4li.jpg,1196615889965
> 2007-12-03 13:14:54,205 DEBUG hbase.HRegion - Not flushing cache for region postlog,img247/230/seanpaul4li.jpg,1196615889965: snapshotMemcaches() determined that there was nothing to do
> 2007-12-03 13:16:04,305 FATAL hbase.HRegionServer - unable to report to master for 67467 milliseconds - aborting server
> 2007-12-03 13:16:04,455 INFO  hbase.Leases - regionserver/0:0:0:0:0:0:0:0:60020 closing leases
> 2007-12-03 13:16:04,455 INFO  hbase.Leases$LeaseMonitor - regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker exiting
> {code}
> Master seems to be running fine scanning its ~700 regions.  Then you see this in log,
> before the HRS shuts itself down.
> {code}
> 2007-12-03 13:14:31,416 INFO  hbase.Leases - HMaster.leaseChecker lease expired 153260899/153260899
> 2007-12-03 13:14:31,417 INFO  hbase.HMaster - XX.XX.XX.102:60020 lease expired
> {code}
> ... and we go on to process shutdown.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

