hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Serega Sheypak <serega.shey...@gmail.com>
Subject Strange HBase failure
Date Sun, 11 Jan 2015 19:47:02 GMT
Hi, I have PoC HBase cluster running on 3 VM
deployment schema is:
NODE01 NN, SN, HMaster (HM), RegionServer (RS), Zookeeper server (ZK), DN
NODE02 RegionServer, DN
NODE03 RegionServer, DN

Suddenly ONLY HBase went offline, all services: HM RS
HDFS was working, no alerts were there
ZK server was working, no alerts there.
VMWare didn't publish any alerts.
Only restart of HBase service helped.

We are using this:
http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-4-7-0.html
hbase-0.94.15+113

I made a deep dive into logs and found this stuff:
08:15:51.968INFOorg.apache.hadoop.hbase.regionserver.HRegionServer

regionserver60020.periodicFlusher requesting flush for region
epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
after a delay of 3026

08:15:55.011INFOorg.apache.hadoop.hbase.regionserver.StoreFile

Bloom filter type for
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
ROW, CompoundBloomFilterWriter

08:15:55.012INFOorg.apache.hadoop.hbase.regionserver.StoreFile

Delete Family Bloom filter type for
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
CompoundBloomFilterWriter

08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.StoreFile

General Bloom and NO DeleteFamily was added to HFile
(hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb)

08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.Store

Flushed , sequenceid=229362, memsize=7.7 K, into tmp file
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb

08:15:55.053INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader

Loaded ROW (CompoundBloomFilter) metadata for 8e68424066dc4c02a60ca57ec98128fb

08:15:55.072INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader

Loaded ROW (CompoundBloomFilter) metadata for 8e68424066dc4c02a60ca57ec98128fb

08:15:55.073INFOorg.apache.hadoop.hbase.regionserver.Store

Added hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8e68424066dc4c02a60ca57ec98128fb,
entries=8, sequenceid=229362, filesize=2.7 K

08:15:55.076INFOorg.apache.hadoop.hbase.regionserver.HRegion

Finished memstore flush of ~7.7 K/7840, currentsize=0/0 for region
epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
in 80ms, sequenceid=229362, compaction requested=true

08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.HRegion

Starting compaction on CF in region
epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.

08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.Store

Starting compaction of 4 file(s) in CF of
epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
into tmpdir=hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp,
seqid=229362, totalSize=76.6 M

08:15:55.096INFOorg.apache.hadoop.hbase.regionserver.StoreFile

Bloom filter type for
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
ROW, CompoundBloomFilterWriter

08:15:55.097INFOorg.apache.hadoop.hbase.regionserver.StoreFile

Delete Family Bloom filter type for
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
CompoundBloomFilterWriter

08:15:59.245INFOorg.apache.hadoop.hbase.regionserver.StoreFile

General Bloom and NO DeleteFamily was added to HFile
(hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2)

08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader

Loaded ROW (CompoundBloomFilter) metadata for 8bf8e92031834676b5d40b352120c5f2

08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.Store

Renaming compacted file at
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2
to hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8bf8e92031834676b5d40b352120c5f2

08:15:59.266INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader

Loaded ROW (CompoundBloomFilter) metadata for 8bf8e92031834676b5d40b352120c5f2

08:15:59.282INFOorg.apache.hadoop.hbase.regionserver.Store

Completed major compaction of 4 file(s) in CF of
epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
store is 76.6 M

08:15:59.283INFO
org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest

completed compaction:
regionName=epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.,
storeName=CF, fileCount=4, fileSize=76.6 M, priority=3,
time=8079772165508216; duration=4sec

08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog

Roll /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420860306123,
entries=3, filesize=7313.  for
/hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420863906224

08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog

moving old hlog file
/hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
whose highest sequenceid is 229359 to
/hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020

09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer

regionserver60020.periodicFlusher requesting flush for region
epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a
delay of 14844

09:16:02.021INFOorg.apache.hadoop.hbase.regionserver.HRegionServer

regionserver60020.periodicFlusher requesting flush for region
epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a
delay of 21848

09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile

Bloom filter type for
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
ROW, CompoundBloomFilterWriter

09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile

Delete Family Bloom filter type for
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
CompoundBloomFilterWriter

09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.StoreFile

General Bloom and NO DeleteFamily was added to HFile
(hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d)

09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.Store

Flushed , sequenceid=229369, memsize=16.3 K, into tmp file
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d

09:16:06.932INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader

Loaded ROW (CompoundBloomFilter) metadata for a148cdce49964965b7b6f4728881427d

09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader

Loaded ROW (CompoundBloomFilter) metadata for a148cdce49964965b7b6f4728881427d

09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.Store

Added hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/CF/a148cdce49964965b7b6f4728881427d,
entries=16, sequenceid=229369, filesize=4.5 K

09:16:06.955INFOorg.apache.hadoop.hbase.regionserver.HRegion

Finished memstore flush of ~16.3 K/16648, currentsize=0/0 for region
epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. in
90ms, sequenceid=229369, compaction requested=false

09:16:55.986INFOorg.apache.zookeeper.ClientCnxn

Client session timed out, have not heard from server in 40003ms for
sessionid 0x1493f19431b0005, closing socket connection and attempting
reconnect

09:16:56.552INFOorg.apache.zookeeper.ClientCnxn

Client session timed out, have not heard from server in 40020ms for
sessionid 0x1493f19431b0000, closing socket connection and attempting
reconnect

09:16:57.535INFOorg.apache.zookeeper.ClientCnxn

Opening socket connection to server
etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
authenticate using SASL (java.lang.SecurityException: Unable to locate
a login configuration)

09:16:57.550INFOorg.apache.zookeeper.ClientCnxn

Socket connection established to
etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session

09:16:58.448INFOorg.apache.zookeeper.ClientCnxn

Opening socket connection to server
etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
authenticate using SASL (java.lang.SecurityException: Unable to locate
a login configuration)

09:16:58.449INFOorg.apache.zookeeper.ClientCnxn

Socket connection established to
etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session

09:17:20.581INFOorg.apache.zookeeper.ClientCnxn

Unable to reconnect to ZooKeeper service, session 0x1493f19431b0005
has expired, closing socket connection

09:17:20.582INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation

This client just lost it's session with ZooKeeper, will automatically
reconnect when needed.

09:17:20.582INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation

ZK session expired. This disconnect could have been caused by a
network partition or a long-running GC pause, either way it's
recommended that you verify your environment.

09:17:20.582INFOorg.apache.zookeeper.ClientCnxn

EventThread shut down

09:17:20.584INFOorg.apache.zookeeper.ClientCnxn

Unable to reconnect to ZooKeeper service, session 0x1493f19431b0000
has expired, closing socket connection

09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer

ABORTING region server
etp-hdfs-n1-sg.passport.local,60020,1414102905372:
regionserver:60020-0x1493f19431b0000
regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
aborting
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)

09:17:20.733FATALorg.apache.hadoop.hbase.regionserver.HRegionServer

RegionServer abort: loaded coprocessors are: []

09:17:20.736INFOorg.apache.hadoop.hbase.regionserver.HRegionServer

Dump of metrics: requestsPerSecond=0, numberOfOnlineRegions=3,
numberOfStores=3, hlogFileCount=2, hlogFileSizeMB=365,
numberOfStorefiles=4, storefileIndexSizeMB=0, rootIndexSizeKB=2,
totalStaticIndexSizeKB=650, totalStaticBloomSizeKB=256,
memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
readRequestsCount=279141, writeRequestsCount=91793,
compactionQueueSize=0, flushQueueSize=0, usedHeapMB=455,
maxHeapMB=1015, blockCacheSizeMB=2.66, blockCacheFreeMB=251.26,
blockCacheCount=7, blockCacheHitCount=596347,
blockCacheMissCount=3360967, blockCacheEvictedCount=36743,
blockCacheHitRatio=15%, blockCacheHitCachingRatio=93%,
hdfsBlocksLocalityIndex=100, slowHLogAppendCount=0,
fsReadLatencyHistogramMean=330098.85,
fsReadLatencyHistogramCount=2970773.00,
fsReadLatencyHistogramMedian=312261.50,
fsReadLatencyHistogram75th=357708.50,
fsReadLatencyHistogram95th=3147993.00,
fsReadLatencyHistogram99th=4431904.00,
fsReadLatencyHistogram999th=4431904.00,
fsPreadLatencyHistogramMean=3245622.44,
fsPreadLatencyHistogramCount=36742.00,
fsPreadLatencyHistogramMedian=3180486.50,
fsPreadLatencyHistogram75th=3733068.50,
fsPreadLatencyHistogram95th=6424537.75,
fsPreadLatencyHistogram99th=24206578.00,
fsPreadLatencyHistogram999th=24206578.00,
fsWriteLatencyHistogramMean=356278.05,
fsWriteLatencyHistogramCount=2983888.00,
fsWriteLatencyHistogramMedian=538147.00,
fsWriteLatencyHistogram75th=570668.00,
fsWriteLatencyHistogram95th=570668.00,
fsWriteLatencyHistogram99th=570668.00,
fsWriteLatencyHistogram999th=570668.00

09:17:20.749INFOorg.apache.hadoop.hbase.regionserver.HRegionServer

STOPPED: regionserver:60020-0x1493f19431b0000
regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
aborting

09:17:20.750INFOorg.apache.zookeeper.ClientCnxn

EventThread shut down

09:17:20.755INFOorg.apache.hadoop.ipc.HBaseServer

Stopping server on 60020

09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer

IPC Server handler 0 on 60020: exiting

09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer

PRI IPC Server handler 1 on 60020: exiting

09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer

PRI IPC Server handler 0 on 60020: exiting

09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer

Stopping IPC Server listener on 60020

09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer

PRI IPC Server handler 3 on 60020: exiting

09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer

PRI IPC Server handler 8 on 60020: exiting

09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer

PRI IPC Server handler 4 on 60020: exiting

09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer

IPC Server handler 3 on 60020: exiting

09:17:20.757INFOorg.apache.hadoop.ipc.
HBaseServer

PRI IPC Server handler 6 on 60020: exiting

09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer

IPC Server handler 2 on 60020: exiting

09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer

IPC Server handler 6 on 60020: exiting

09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer

PRI IPC Server handler 5 on 60020: exiting

09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer

PRI IPC Server handler 7 on 60020: exiting

09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer

PRI IPC Server handler 2 on 60020: exiting

09:17:20.757INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker

Sending interrupt to stop the worker thread

09:17:20.758INFOorg.apache.hadoop.hbase.regionserver.HRegionServer

Stopping infoServer

09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer

IPC Server handler 4 on 60020: exiting

09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer

Stopping IPC Server Responder

09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer

IPC Server handler 7 on 60020: exiting

09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer

IPC Server handler 8 on 60020: exiting

09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer

IPC Server handler 1 on 60020: exiting

09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer

PRI IPC Server handler 9 on 60020: exiting

09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer

IPC Server handler 5 on 60020: exiting

09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer

Stopping IPC Server Responder

09:17:20.761INFOorg.apache.hadoop.ipc.HBaseServer

IPC Server handler 9 on 60020: exiting

09:17:20.763INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker

SplitLogWorker interrupted while waiting for task, exiting:
java.lang.InterruptedException

09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer

REPL IPC Server handler 0 on 60020: exiting

09:17:20.767INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker

SplitLogWorker etp-hdfs-n1-sg.passport.local,60020,1414102905372 exiting

09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer

REPL IPC Server handler 2 on 60020: exiting

09:17:20.768INFOorg.mortbay.log

Stopped SelectChannelConnector@0.0.0.0:60030

09:17:20.768INFOorg.apache.hadoop.ipc.HBaseServer

REPL IPC Server handler 1 on 60020: exiting

09:17:20.908INFO
org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager

Stopping RegionServerSnapshotManager abruptly.

09:17:20.908INFO
org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker

regionserver60020.compactionChecker exiting

09:17:20.908INFOorg.apache.hadoop.hbase.regionserver.LogRoller

LogRoller exiting.

09:17:20.909INFOorg.apache.hadoop.hbase.regionserver.MemStoreFlusher

regionserver60020.cacheFlusher exiting

09:17:20.920INFOorg.apache.hadoop.hbase.regionserver.HRegionServer

aborting server etp-hdfs-n1-sg.passport.local,60020,1414102905372

09:17:20.924INFOorg.apache.hadoop.hbase.regionserver.HRegionServer

Waiting on 3 regions to close

09:17:20.943INFOorg.apache.hadoop.hbase.regionserver.Store

Closed CF

09:17:20.949INFOorg.apache.hadoop.hbase.regionserver.HRegion

Closed epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.

09:17:20.951INFOorg.apache.hadoop.hbase.regionserver.Store

Closed info

09:17:20.953INFOorg.apache.hadoop.hbase.regionserver.HRegion

Closed .META.,,1.1028785192

09:17:20.956INFOorg.apache.hadoop.hbase.regionserver.Store

Closed CF

09:17:20.957ERRORorg.apache.hadoop.hbase.regionserver.HRegion

Memstore size is 13376

09:17:20.957INFOorg.apache.hadoop.hbase.regionserver.HRegion

Closed epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.

09:17:21.125INFOorg.apache.hadoop.hbase.regionserver.HRegionServer

stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372; all
regions closed.

09:17:21.126INFOorg.apache.hadoop.hbase.regionserver.wal.HLog

regionserver60020.logSyncer exiting

09:17:21.240INFOorg.apache.hadoop.hbase.regionserver.Leases

regionserver60020 closing leases

09:17:21.241INFOorg.apache.hadoop.hbase.regionserver.Leases

regionserver60020 closed leases

09:17:22.021INFO
org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher

regionserver60020.periodicFlusher exiting

09:17:22.031WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper

Possibly transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372

09:17:22.031INFOorg.apache.hadoop.hbase.util.RetryCounter

Sleeping 2000ms before retry #1...

09:17:23.958INFOorg.apache.hadoop.hbase.regionserver.Leases

regionserver60020.leaseChecker closing leases

09:17:23.959INFOorg.apache.hadoop.hbase.regionserver.Leases

regionserver60020.leaseChecker closed leases

09:17:24.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper

Possibly transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372

09:17:24.032INFOorg.apache.hadoop.hbase.util.RetryCounter

Sleeping 4000ms before retry #2...

09:17:28.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper

Possibly transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372

09:17:28.033INFOorg.apache.hadoop.hbase.util.RetryCounter

Sleeping 8000ms before retry #3...

09:17:36.033WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper

Possibly transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372

09:17:36.033ERRORorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper

ZooKeeper delete failed after 3 retries

09:17:36.033WARNorg.apache.hadoop.hbase.regionserver.HRegionServer

Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1196)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1185)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1132)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
	at java.lang.Thread.run(Thread.java:662)

09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer

stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
zookeeper connection closed.

09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer

regionserver60020 exiting

09:17:36.051ERROR
org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine

Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
	at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:65)
	at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:84)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:4106)

09:17:36.139INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook

Shutdown hook starting; hbase.shutdown.hook=true;
fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@727f3b8a

09:17:36.140INFOorg.apache.hadoop.hbase.regionserver.HRegionServer

STOPPED: Shutdown hook

09:17:36.141INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook

Starting fs shutdown hook thread.

09:17:36.142INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook

Shutdown hook finished.

10:16:40.584INFOorg.apache.hadoop.hbase.util.VersionInfo

HBase 0.94.15-cdh4.7.0

10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo

Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.94.15-cdh4.7.0
-r Unknown

10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo

Compiled by jenkins on Wed May 28 10:48:53 PDT 2014

10:16:40.912INFOorg.apache.hadoop.hbase.util.ServerCommandLine

vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems
Inc., vmVersion=20.6-b01

10:16:40.913INFOorg.apache.hadoop.hbase.util.ServerCommandLine

vmInputArguments=[-XX:OnOutOfMemoryError=kill, -9, %p, -Xmx1000m,
-Djava.net.preferIPv4Stack=true, -Xms1073741824, -Xmx1073741824,
-XX:+UseParNewGC, -XX:+UseConcMarkSweepGC,
-XX:-CMSConcurrentMTEnabled, -XX:CMSInitiatingOccupancyFraction=70,
-XX:+CMSParallelRemarkEnabled, -Dhbase.log.dir=/var/log/hbase,
-Dhbase.log.file=hbase-cmf-hbase1-REGIONSERVER-etp-hdfs-n1-sg.passport.local.log.out,
-Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase,
-Dhbase.id.str=, -Dhbase.root.logger=INFO,RFA,
-Djava.library.path=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase/lib/native/Linux-amd64-64,
-Dhbase.security.logger=INFO,DRFAS]

10:16:41.392INFOorg.apache.hadoop.ipc.HBaseServer

Starting Thread-2

10:16:41.393INFOorg.apache.hadoop.ipc.sdfHBaseServer

Starting Thread-2


this FATAL

09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer

ABORTING region server
etp-hdfs-n1-sg.passport.local,60020,1414102905372:
regionserver:60020-0x1493f19431b0000
regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
aborting
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)

found on each RS
I didn' t see any problems with ZK or memory usage. The load is extremely
low, just few put/get in a second, you can see file sizes in logs.
What it could be how can i catch the root cause?

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message