hbase-user mailing list archives

From Bradford Stephens <bradfordsteph...@gmail.com>
Subject "Error recovery for block... failed because recovery from primary datanode failed 6 times"
Date Mon, 14 Feb 2011 06:58:46 GMT
Hey guys,

I'm occasionally seeing regionservers go down (running a late RC of
0.89 that Ryan built). The cluster is 5x c2.xlarge nodes (8gb/6 cores?)
on EC2 with EBS drives.

Here's the error message from the RS log. Hadoop fsck reports HDFS as
healthy.

Any ideas?


2011-02-14 01:51:51,715 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed mobile4-2011021,20110122:37b16319-58e8-4809-bca6-83d7598a41dd:E84F9612-CE1A-4FE1-AAE9-2A7AF8C9B2F1:21519,1297657239532.d15ce98030138cad79e248e0845b70ee.
2011-02-14 01:51:51,715 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: ip-10-243-106-63.ec2.internal,60020,1297656774012
2011-02-14 01:51:51,711 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker: regionserver60020.majorCompactionChecker exiting
2011-02-14 01:51:51,856 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12e225ef5640002 closed
2011-02-14 01:51:51,856 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <ip-10-204-213-153.ec2.internal:/hbase,ip-10-243-106-63.ec2.internal,60020,1297656773719>Closed connection with ZooKeeper; /hbase/root-region-server
2011-02-14 01:51:58,706 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
2011-02-14 01:51:58,706 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
2011-02-14 01:52:00,031 INFO org.apache.hadoop.hbase.Leases: regionserver60020.leaseChecker closing leases
2011-02-14 01:52:00,031 INFO org.apache.hadoop.hbase.Leases: regionserver60020.leaseChecker closed leases
2011-02-14 01:52:00,033 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-10,5,main]
2011-02-14 01:52:00,033 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
2011-02-14 01:52:00,036 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase-entest/.logs/ip-10-243-106-63.ec2.internal,60020,1297656774012/10.243.106.63%3A60020.1297660376363 : java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_208685344091455182_10263 failed  because recovery from primary datanode 10.243.106.63:50010 failed 6 times.  Pipeline was 10.243.106.63:50010. Aborting...
java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_208685344091455182_10263 failed  because recovery from primary datanode 10.243.106.63:50010 failed 6 times.  Pipeline was 10.243.106.63:50010. Aborting...
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3214)
	at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
	at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:123)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:906)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.completeCacheFlush(HLog.java:1078)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:943)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:834)
	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:786)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:250)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:224)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)
2011-02-14 01:52:00,076 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
2011-02-14 01:52:00,139 WARN org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher: No longer connected to ZooKeeper, current state: Disconnected


-- 
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science
