hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Yu <yuzhih...@gmail.com>
Subject Re: Strange HBase failure
Date Mon, 12 Jan 2015 03:38:00 GMT
Serega:
Was the snippet of log from NODE01 ? Looks like NODE01 may have been under
heavy load - considering the number of daemons running on that node.

Please check GC log.

Cheers

On Sun, Jan 11, 2015 at 6:57 PM, Shuai Lin <linshuai2012@gmail.com> wrote:

> From the log I see no log was produced during 08:25 to 09:15, why did this
> happen?
>
> 08:25:06.274INFOorg.apache.
> hadoop.hbase.regionserver.wal.HLog
>
> moving old hlog file
>
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> whose highest sequenceid is 229359 to
>
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
>
> 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> Regards,
> Shuai
>
> On Mon, Jan 12, 2015 at 3:47 AM, Serega Sheypak <serega.sheypak@gmail.com>
> wrote:
>
> > Hi, I have PoC HBase cluster running on 3 VM
> > deployment schema is:
> > NODE01 NN, SN, HMaster (HM), RegionServer (RS), Zookeeper server (ZK), DN
> > NODE02 RegionServer, DN
> > NODE03 RegionServer, DN
> >
> > Suddenly ONLY HBase went offline, all services: HM RS
> > HDFS was working, no alerts were there
> > ZK server was working, no alerts there.
> > VMWare didn't publish any alerts.
> > Only restart of HBase service helped.
> >
> > We are using this:
> > http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-4-7-0.html
> > hbase-0.94.15+113
> >
> > I made a deep dive into logs and found this stuff:
> > 08:15:51.968INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > regionserver60020.periodicFlusher requesting flush for region
> >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > after a delay of 3026
> >
> > 08:15:55.011INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > Bloom filter type for
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > ROW, CompoundBloomFilterWriter
> >
> > 08:15:55.012INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > Delete Family Bloom filter type for
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > CompoundBloomFilterWriter
> >
> > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > General Bloom and NO DeleteFamily was added to HFile
> >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb)
> >
> > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Flushed , sequenceid=229362, memsize=7.7 K, into tmp file
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb
> >
> > 08:15:55.053INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> >
> > Loaded ROW (CompoundBloomFilter) metadata for
> > 8e68424066dc4c02a60ca57ec98128fb
> >
> > 08:15:55.072INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> >
> > Loaded ROW (CompoundBloomFilter) metadata for
> > 8e68424066dc4c02a60ca57ec98128fb
> >
> > 08:15:55.073INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Added
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8e68424066dc4c02a60ca57ec98128fb,
> > entries=8, sequenceid=229362, filesize=2.7 K
> >
> > 08:15:55.076INFOorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Finished memstore flush of ~7.7 K/7840, currentsize=0/0 for region
> >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > in 80ms, sequenceid=229362, compaction requested=true
> >
> > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Starting compaction on CF in region
> >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> >
> > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Starting compaction of 4 file(s) in CF of
> >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > into
> >
> tmpdir=hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp,
> > seqid=229362, totalSize=76.6 M
> >
> > 08:15:55.096INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > Bloom filter type for
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > ROW, CompoundBloomFilterWriter
> >
> > 08:15:55.097INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > Delete Family Bloom filter type for
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > CompoundBloomFilterWriter
> >
> > 08:15:59.245INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > General Bloom and NO DeleteFamily was added to HFile
> >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2)
> >
> > 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> >
> > Loaded ROW (CompoundBloomFilter) metadata for
> > 8bf8e92031834676b5d40b352120c5f2
> >
> > 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Renaming compacted file at
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2
> > to
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8bf8e92031834676b5d40b352120c5f2
> >
> > 08:15:59.266INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> >
> > Loaded ROW (CompoundBloomFilter) metadata for
> > 8bf8e92031834676b5d40b352120c5f2
> >
> > 08:15:59.282INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Completed major compaction of 4 file(s) in CF of
> >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
> > store is 76.6 M
> >
> > 08:15:59.283INFO
> > org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest
> >
> > completed compaction:
> >
> >
> regionName=epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.,
> > storeName=CF, fileCount=4, fileSize=76.6 M, priority=3,
> > time=8079772165508216; duration=4sec
> >
> > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> >
> > Roll
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420860306123,
> > entries=3, filesize=7313.  for
> >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420863906224
> >
> > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> >
> > moving old hlog file
> >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > whose highest sequenceid is 229359 to
> >
> >
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> >
> > 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > regionserver60020.periodicFlusher requesting flush for region
> > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a
> > delay of 14844
> >
> > 09:16:02.021INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > regionserver60020.periodicFlusher requesting flush for region
> > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a
> > delay of 21848
> >
> > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > Bloom filter type for
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > ROW, CompoundBloomFilterWriter
> >
> > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > Delete Family Bloom filter type for
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > CompoundBloomFilterWriter
> >
> > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > General Bloom and NO DeleteFamily was added to HFile
> >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d)
> >
> > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Flushed , sequenceid=229369, memsize=16.3 K, into tmp file
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d
> >
> > 09:16:06.932INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> >
> > Loaded ROW (CompoundBloomFilter) metadata for
> > a148cdce49964965b7b6f4728881427d
> >
> > 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> >
> > Loaded ROW (CompoundBloomFilter) metadata for
> > a148cdce49964965b7b6f4728881427d
> >
> > 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Added
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/CF/a148cdce49964965b7b6f4728881427d,
> > entries=16, sequenceid=229369, filesize=4.5 K
> >
> > 09:16:06.955INFOorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Finished memstore flush of ~16.3 K/16648, currentsize=0/0 for region
> > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. in
> > 90ms, sequenceid=229369, compaction requested=false
> >
> > 09:16:55.986INFOorg.apache.zookeeper.ClientCnxn
> >
> > Client session timed out, have not heard from server in 40003ms for
> > sessionid 0x1493f19431b0005, closing socket connection and attempting
> > reconnect
> >
> > 09:16:56.552INFOorg.apache.zookeeper.ClientCnxn
> >
> > Client session timed out, have not heard from server in 40020ms for
> > sessionid 0x1493f19431b0000, closing socket connection and attempting
> > reconnect
> >
> > 09:16:57.535INFOorg.apache.zookeeper.ClientCnxn
> >
> > Opening socket connection to server
> > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
> > authenticate using SASL (java.lang.SecurityException: Unable to locate
> > a login configuration)
> >
> > 09:16:57.550INFOorg.apache.zookeeper.ClientCnxn
> >
> > Socket connection established to
> > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
> >
> > 09:16:58.448INFOorg.apache.zookeeper.ClientCnxn
> >
> > Opening socket connection to server
> > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
> > authenticate using SASL (java.lang.SecurityException: Unable to locate
> > a login configuration)
> >
> > 09:16:58.449INFOorg.apache.zookeeper.ClientCnxn
> >
> > Socket connection established to
> > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
> >
> > 09:17:20.581INFOorg.apache.zookeeper.ClientCnxn
> >
> > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0005
> > has expired, closing socket connection
> >
> > 09:17:20.582INFO
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> >
> > This client just lost it's session with ZooKeeper, will automatically
> > reconnect when needed.
> >
> > 09:17:20.582INFO
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> >
> > ZK session expired. This disconnect could have been caused by a
> > network partition or a long-running GC pause, either way it's
> > recommended that you verify your environment.
> >
> > 09:17:20.582INFOorg.apache.zookeeper.ClientCnxn
> >
> > EventThread shut down
> >
> > 09:17:20.584INFOorg.apache.zookeeper.ClientCnxn
> >
> > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0000
> > has expired, closing socket connection
> >
> > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > ABORTING region server
> > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > regionserver:60020-0x1493f19431b0000
> > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > aborting
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired
> >         at
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> >         at
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> >         at
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> >         at
> > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> >
> > 09:17:20.733FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > RegionServer abort: loaded coprocessors are: []
> >
> > 09:17:20.736INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > Dump of metrics: requestsPerSecond=0, numberOfOnlineRegions=3,
> > numberOfStores=3, hlogFileCount=2, hlogFileSizeMB=365,
> > numberOfStorefiles=4, storefileIndexSizeMB=0, rootIndexSizeKB=2,
> > totalStaticIndexSizeKB=650, totalStaticBloomSizeKB=256,
> > memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
> > readRequestsCount=279141, writeRequestsCount=91793,
> > compactionQueueSize=0, flushQueueSize=0, usedHeapMB=455,
> > maxHeapMB=1015, blockCacheSizeMB=2.66, blockCacheFreeMB=251.26,
> > blockCacheCount=7, blockCacheHitCount=596347,
> > blockCacheMissCount=3360967, blockCacheEvictedCount=36743,
> > blockCacheHitRatio=15%, blockCacheHitCachingRatio=93%,
> > hdfsBlocksLocalityIndex=100, slowHLogAppendCount=0,
> > fsReadLatencyHistogramMean=330098.85,
> > fsReadLatencyHistogramCount=2970773.00,
> > fsReadLatencyHistogramMedian=312261.50,
> > fsReadLatencyHistogram75th=357708.50,
> > fsReadLatencyHistogram95th=3147993.00,
> > fsReadLatencyHistogram99th=4431904.00,
> > fsReadLatencyHistogram999th=4431904.00,
> > fsPreadLatencyHistogramMean=3245622.44,
> > fsPreadLatencyHistogramCount=36742.00,
> > fsPreadLatencyHistogramMedian=3180486.50,
> > fsPreadLatencyHistogram75th=3733068.50,
> > fsPreadLatencyHistogram95th=6424537.75,
> > fsPreadLatencyHistogram99th=24206578.00,
> > fsPreadLatencyHistogram999th=24206578.00,
> > fsWriteLatencyHistogramMean=356278.05,
> > fsWriteLatencyHistogramCount=2983888.00,
> > fsWriteLatencyHistogramMedian=538147.00,
> > fsWriteLatencyHistogram75th=570668.00,
> > fsWriteLatencyHistogram95th=570668.00,
> > fsWriteLatencyHistogram99th=570668.00,
> > fsWriteLatencyHistogram999th=570668.00
> >
> > 09:17:20.749INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > STOPPED: regionserver:60020-0x1493f19431b0000
> > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > aborting
> >
> > 09:17:20.750INFOorg.apache.zookeeper.ClientCnxn
> >
> > EventThread shut down
> >
> > 09:17:20.755INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > Stopping server on 60020
> >
> > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 0 on 60020: exiting
> >
> > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 1 on 60020: exiting
> >
> > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 0 on 60020: exiting
> >
> > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > Stopping IPC Server listener on 60020
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 3 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 8 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 4 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 3 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.
> > HBaseServer
> >
> > PRI IPC Server handler 6 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 2 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 6 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 5 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 7 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 2 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> >
> > Sending interrupt to stop the worker thread
> >
> > 09:17:20.758INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > Stopping infoServer
> >
> > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 4 on 60020: exiting
> >
> > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > Stopping IPC Server Responder
> >
> > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 7 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 8 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 1 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 9 on 60020: exiting
> >
> > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 5 on 60020: exiting
> >
> > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > Stopping IPC Server Responder
> >
> > 09:17:20.761INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 9 on 60020: exiting
> >
> > 09:17:20.763INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> >
> > SplitLogWorker interrupted while waiting for task, exiting:
> > java.lang.InterruptedException
> >
> > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > REPL IPC Server handler 0 on 60020: exiting
> >
> > 09:17:20.767INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> >
> > SplitLogWorker etp-hdfs-n1-sg.passport.local,60020,1414102905372 exiting
> >
> > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > REPL IPC Server handler 2 on 60020: exiting
> >
> > 09:17:20.768INFOorg.mortbay.log
> >
> > Stopped SelectChannelConnector@0.0.0.0:60030
> >
> > 09:17:20.768INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > REPL IPC Server handler 1 on 60020: exiting
> >
> > 09:17:20.908INFO
> > org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager
> >
> > Stopping RegionServerSnapshotManager abruptly.
> >
> > 09:17:20.908INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker
> >
> > regionserver60020.compactionChecker exiting
> >
> > 09:17:20.908INFOorg.apache.hadoop.hbase.regionserver.LogRoller
> >
> > LogRoller exiting.
> >
> > 09:17:20.909INFOorg.apache.hadoop.hbase.regionserver.MemStoreFlusher
> >
> > regionserver60020.cacheFlusher exiting
> >
> > 09:17:20.920INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > aborting server etp-hdfs-n1-sg.passport.local,60020,1414102905372
> >
> > 09:17:20.924INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > Waiting on 3 regions to close
> >
> > 09:17:20.943INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Closed CF
> >
> > 09:17:20.949INFOorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Closed epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.
> >
> > 09:17:20.951INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Closed info
> >
> > 09:17:20.953INFOorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Closed .META.,,1.1028785192
> >
> > 09:17:20.956INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Closed CF
> >
> > 09:17:20.957ERRORorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Memstore size is 13376
> >
> > 09:17:20.957INFOorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Closed
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> >
> > 09:17:21.125INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372; all
> > regions closed.
> >
> > 09:17:21.126INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> >
> > regionserver60020.logSyncer exiting
> >
> > 09:17:21.240INFOorg.apache.hadoop.hbase.regionserver.Leases
> >
> > regionserver60020 closing leases
> >
> > 09:17:21.241INFOorg.apache.hadoop.hbase.regionserver.Leases
> >
> > regionserver60020 closed leases
> >
> > 09:17:22.021INFO
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher
> >
> > regionserver60020.periodicFlusher exiting
> >
> > 09:17:22.031WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> >
> > Possibly transient ZooKeeper exception:
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for
> > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> >
> > 09:17:22.031INFOorg.apache.hadoop.hbase.util.RetryCounter
> >
> > Sleeping 2000ms before retry #1...
> >
> > 09:17:23.958INFOorg.apache.hadoop.hbase.regionserver.Leases
> >
> > regionserver60020.leaseChecker closing leases
> >
> > 09:17:23.959INFOorg.apache.hadoop.hbase.regionserver.Leases
> >
> > regionserver60020.leaseChecker closed leases
> >
> > 09:17:24.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> >
> > Possibly transient ZooKeeper exception:
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for
> > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> >
> > 09:17:24.032INFOorg.apache.hadoop.hbase.util.RetryCounter
> >
> > Sleeping 4000ms before retry #2...
> >
> > 09:17:28.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> >
> > Possibly transient ZooKeeper exception:
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for
> > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> >
> > 09:17:28.033INFOorg.apache.hadoop.hbase.util.RetryCounter
> >
> > Sleeping 8000ms before retry #3...
> >
> > 09:17:36.033WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> >
> > Possibly transient ZooKeeper exception:
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for
> > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> >
> > 09:17:36.033ERRORorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> >
> > ZooKeeper delete failed after 3 retries
> >
> > 09:17:36.033WARNorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > Failed deleting my ephemeral node
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for
> > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> >         at
> > org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> >         at
> > org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> >         at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
> >         at
> >
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
> >         at
> > org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1196)
> >         at
> > org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1185)
> >         at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1132)
> >         at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
> >         at java.lang.Thread.run(Thread.java:662)
> >
> > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
> > zookeeper connection closed.
> >
> > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > regionserver60020 exiting
> >
> > 09:17:36.051ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine
> >
> > Region server exiting
> > java.lang.RuntimeException: HRegionServer Aborted
> >         at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:65)
> >         at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:84)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >         at
> >
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
> >         at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:4106)
> >
> > 09:17:36.139INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> >
> > Shutdown hook starting; hbase.shutdown.hook=true;
> >
> >
> fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@727f3b8a
> >
> > 09:17:36.140INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > STOPPED: Shutdown hook
> >
> > 09:17:36.141INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> >
> > Starting fs shutdown hook thread.
> >
> > 09:17:36.142INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> >
> > Shutdown hook finished.
> >
> > 10:16:40.584INFOorg.apache.hadoop.hbase.util.VersionInfo
> >
> > HBase 0.94.15-cdh4.7.0
> >
> > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> >
> > Subversion
> >
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.94.15-cdh4.7.0
> > -r Unknown
> >
> > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> >
> > Compiled by jenkins on Wed May 28 10:48:53 PDT 2014
> >
> > 10:16:40.912INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> >
> > vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems
> > Inc., vmVersion=20.6-b01
> >
> > 10:16:40.913INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> >
> > vmInputArguments=[-XX:OnOutOfMemoryError=kill, -9, %p, -Xmx1000m,
> > -Djava.net.preferIPv4Stack=true, -Xms1073741824, -Xmx1073741824,
> > -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC,
> > -XX:-CMSConcurrentMTEnabled, -XX:CMSInitiatingOccupancyFraction=70,
> > -XX:+CMSParallelRemarkEnabled, -Dhbase.log.dir=/var/log/hbase,
> >
> >
> -Dhbase.log.file=hbase-cmf-hbase1-REGIONSERVER-etp-hdfs-n1-sg.passport.local.log.out,
> >
> >
> -Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase,
> > -Dhbase.id.str=, -Dhbase.root.logger=INFO,RFA,
> >
> >
> -Djava.library.path=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase/lib/native/Linux-amd64-64,
> > -Dhbase.security.logger=INFO,DRFAS]
> >
> > 10:16:41.392INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > Starting Thread-2
> >
> > 10:16:41.393INFOorg.apache.hadoop.ipc.sdfHBaseServer
> >
> > Starting Thread-2
> >
> >
> > this FATAL
> >
> > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > ABORTING region server
> > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > regionserver:60020-0x1493f19431b0000
> > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > aborting
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired
> >         at
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> >         at
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> >         at
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> >         at
> > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> >
> > found on each RS
> > I didn' t see any problems with ZK or memory usage. The load is extremely
> > low, just few put/get in a second, you can see file sizes in logs.
> > What it could be how can i catch the root cause?
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message