hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Esteban Gutierrez <este...@cloudera.com>
Subject Re: Strange HBase failure
Date Mon, 12 Jan 2015 08:28:52 GMT
Hi Serega,

Do you have enough resources allocated for each VM? Just some swapping on
the VMs or the host can make things unstable. Also from the number of
services on each VM sounds like your host should have at least 12GB of free
RAM just for running things smoothly otherwise you might want to try with
less VMs and with some RAM each.

cheers,
esteban.



--
Cloudera, Inc.


On Sun, Jan 11, 2015 at 11:55 PM, Serega Sheypak <serega.sheypak@gmail.com>
wrote:

> Hi, HBase was down during 08:25 to 09:15
> I was looking into logs, and thinking. I've tried to find something more
> clever. than dummy restart.
> We are using Cloudera distro, each of daemons run in it's own JVM.
> I'll try to find CPU load logs.
> There is really low load,
> Finished memstore flush of ~7.7 K/7840,
>
> Flushed , sequenceid=229369, memsize=16.3 K
>
>
> Completed major compaction of 4 file(s) in CF of
>
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
> store is 76.6 M
>
>
> See there are less than 100 MB of data for 3 VMs. It's nothing.
>
>
>
> 2015-01-12 6:38 GMT+03:00 Ted Yu <yuzhihong@gmail.com>:
>
> > Serega:
> > Was the snippet of log from NODE01 ? Looks like NODE01 may have been
> under
> > heavy load - considering the number of daemons running on that node.
> >
> > Please check GC log.
> >
> > Cheers
> >
> > On Sun, Jan 11, 2015 at 6:57 PM, Shuai Lin <linshuai2012@gmail.com>
> wrote:
> >
> > > From the log I see no log was produced during 08:25 to 09:15, why did
> > this
> > > happen?
> > >
> > > 08:25:06.274INFOorg.apache.
> > > hadoop.hbase.regionserver.wal.HLog
> > >
> > > moving old hlog file
> > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > whose highest sequenceid is 229359 to
> > >
> > >
> >
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > >
> > > 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > Regards,
> > > Shuai
> > >
> > > On Mon, Jan 12, 2015 at 3:47 AM, Serega Sheypak <
> > serega.sheypak@gmail.com>
> > > wrote:
> > >
> > > > Hi, I have PoC HBase cluster running on 3 VM
> > > > deployment schema is:
> > > > NODE01 NN, SN, HMaster (HM), RegionServer (RS), Zookeeper server
> (ZK),
> > DN
> > > > NODE02 RegionServer, DN
> > > > NODE03 RegionServer, DN
> > > >
> > > > Suddenly ONLY HBase went offline, all services: HM RS
> > > > HDFS was working, no alerts were there
> > > > ZK server was working, no alerts there.
> > > > VMWare didn't publish any alerts.
> > > > Only restart of HBase service helped.
> > > >
> > > > We are using this:
> > > >
> > http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-4-7-0.html
> > > > hbase-0.94.15+113
> > > >
> > > > I made a deep dive into logs and found this stuff:
> > > > 08:15:51.968INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > regionserver60020.periodicFlusher requesting flush for region
> > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > after a delay of 3026
> > > >
> > > > 08:15:55.011INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > Bloom filter type for
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > > > ROW, CompoundBloomFilterWriter
> > > >
> > > > 08:15:55.012INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > Delete Family Bloom filter type for
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > > > CompoundBloomFilterWriter
> > > >
> > > > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > General Bloom and NO DeleteFamily was added to HFile
> > > >
> > > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb)
> > > >
> > > > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Flushed , sequenceid=229362, memsize=7.7 K, into tmp file
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb
> > > >
> > > > 08:15:55.053INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > >
> > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > 8e68424066dc4c02a60ca57ec98128fb
> > > >
> > > > 08:15:55.072INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > >
> > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > 8e68424066dc4c02a60ca57ec98128fb
> > > >
> > > > 08:15:55.073INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Added
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8e68424066dc4c02a60ca57ec98128fb,
> > > > entries=8, sequenceid=229362, filesize=2.7 K
> > > >
> > > > 08:15:55.076INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Finished memstore flush of ~7.7 K/7840, currentsize=0/0 for region
> > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > in 80ms, sequenceid=229362, compaction requested=true
> > > >
> > > > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Starting compaction on CF in region
> > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > >
> > > > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Starting compaction of 4 file(s) in CF of
> > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > into
> > > >
> > >
> >
> tmpdir=hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp,
> > > > seqid=229362, totalSize=76.6 M
> > > >
> > > > 08:15:55.096INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > Bloom filter type for
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > > > ROW, CompoundBloomFilterWriter
> > > >
> > > > 08:15:55.097INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > Delete Family Bloom filter type for
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > > > CompoundBloomFilterWriter
> > > >
> > > > 08:15:59.245INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > General Bloom and NO DeleteFamily was added to HFile
> > > >
> > > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2)
> > > >
> > > > 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > >
> > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > 8bf8e92031834676b5d40b352120c5f2
> > > >
> > > > 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Renaming compacted file at
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2
> > > > to
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8bf8e92031834676b5d40b352120c5f2
> > > >
> > > > 08:15:59.266INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > >
> > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > 8bf8e92031834676b5d40b352120c5f2
> > > >
> > > > 08:15:59.282INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Completed major compaction of 4 file(s) in CF of
> > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
> > > > store is 76.6 M
> > > >
> > > > 08:15:59.283INFO
> > > > org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest
> > > >
> > > > completed compaction:
> > > >
> > > >
> > >
> >
> regionName=epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.,
> > > > storeName=CF, fileCount=4, fileSize=76.6 M, priority=3,
> > > > time=8079772165508216; duration=4sec
> > > >
> > > > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > > >
> > > > Roll
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420860306123,
> > > > entries=3, filesize=7313.  for
> > > >
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420863906224
> > > >
> > > > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > > >
> > > > moving old hlog file
> > > >
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > > whose highest sequenceid is 229359 to
> > > >
> > > >
> > >
> >
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > >
> > > > 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > regionserver60020.periodicFlusher requesting flush for region
> > > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after
> a
> > > > delay of 14844
> > > >
> > > > 09:16:02.021INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > regionserver60020.periodicFlusher requesting flush for region
> > > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after
> a
> > > > delay of 21848
> > > >
> > > > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > Bloom filter type for
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > > > ROW, CompoundBloomFilterWriter
> > > >
> > > > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > Delete Family Bloom filter type for
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > > > CompoundBloomFilterWriter
> > > >
> > > > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > General Bloom and NO DeleteFamily was added to HFile
> > > >
> > > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d)
> > > >
> > > > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Flushed , sequenceid=229369, memsize=16.3 K, into tmp file
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d
> > > >
> > > > 09:16:06.932INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > >
> > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > a148cdce49964965b7b6f4728881427d
> > > >
> > > > 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > >
> > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > a148cdce49964965b7b6f4728881427d
> > > >
> > > > 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Added
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/CF/a148cdce49964965b7b6f4728881427d,
> > > > entries=16, sequenceid=229369, filesize=4.5 K
> > > >
> > > > 09:16:06.955INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Finished memstore flush of ~16.3 K/16648, currentsize=0/0 for region
> > > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. in
> > > > 90ms, sequenceid=229369, compaction requested=false
> > > >
> > > > 09:16:55.986INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Client session timed out, have not heard from server in 40003ms for
> > > > sessionid 0x1493f19431b0005, closing socket connection and attempting
> > > > reconnect
> > > >
> > > > 09:16:56.552INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Client session timed out, have not heard from server in 40020ms for
> > > > sessionid 0x1493f19431b0000, closing socket connection and attempting
> > > > reconnect
> > > >
> > > > 09:16:57.535INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Opening socket connection to server
> > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
> > > > authenticate using SASL (java.lang.SecurityException: Unable to
> locate
> > > > a login configuration)
> > > >
> > > > 09:16:57.550INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Socket connection established to
> > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
> > > >
> > > > 09:16:58.448INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Opening socket connection to server
> > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
> > > > authenticate using SASL (java.lang.SecurityException: Unable to
> locate
> > > > a login configuration)
> > > >
> > > > 09:16:58.449INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Socket connection established to
> > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
> > > >
> > > > 09:17:20.581INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0005
> > > > has expired, closing socket connection
> > > >
> > > > 09:17:20.582INFO
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> > > >
> > > > This client just lost it's session with ZooKeeper, will automatically
> > > > reconnect when needed.
> > > >
> > > > 09:17:20.582INFO
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> > > >
> > > > ZK session expired. This disconnect could have been caused by a
> > > > network partition or a long-running GC pause, either way it's
> > > > recommended that you verify your environment.
> > > >
> > > > 09:17:20.582INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > EventThread shut down
> > > >
> > > > 09:17:20.584INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0000
> > > > has expired, closing socket connection
> > > >
> > > > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > ABORTING region server
> > > > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > > > regionserver:60020-0x1493f19431b0000
> > > > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > > > aborting
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired
> > > >         at
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> > > >         at
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> > > >         at
> > > >
> > >
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> > > >         at
> > > > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > > >
> > > > 09:17:20.733FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > RegionServer abort: loaded coprocessors are: []
> > > >
> > > > 09:17:20.736INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > Dump of metrics: requestsPerSecond=0, numberOfOnlineRegions=3,
> > > > numberOfStores=3, hlogFileCount=2, hlogFileSizeMB=365,
> > > > numberOfStorefiles=4, storefileIndexSizeMB=0, rootIndexSizeKB=2,
> > > > totalStaticIndexSizeKB=650, totalStaticBloomSizeKB=256,
> > > > memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
> > > > readRequestsCount=279141, writeRequestsCount=91793,
> > > > compactionQueueSize=0, flushQueueSize=0, usedHeapMB=455,
> > > > maxHeapMB=1015, blockCacheSizeMB=2.66, blockCacheFreeMB=251.26,
> > > > blockCacheCount=7, blockCacheHitCount=596347,
> > > > blockCacheMissCount=3360967, blockCacheEvictedCount=36743,
> > > > blockCacheHitRatio=15%, blockCacheHitCachingRatio=93%,
> > > > hdfsBlocksLocalityIndex=100, slowHLogAppendCount=0,
> > > > fsReadLatencyHistogramMean=330098.85,
> > > > fsReadLatencyHistogramCount=2970773.00,
> > > > fsReadLatencyHistogramMedian=312261.50,
> > > > fsReadLatencyHistogram75th=357708.50,
> > > > fsReadLatencyHistogram95th=3147993.00,
> > > > fsReadLatencyHistogram99th=4431904.00,
> > > > fsReadLatencyHistogram999th=4431904.00,
> > > > fsPreadLatencyHistogramMean=3245622.44,
> > > > fsPreadLatencyHistogramCount=36742.00,
> > > > fsPreadLatencyHistogramMedian=3180486.50,
> > > > fsPreadLatencyHistogram75th=3733068.50,
> > > > fsPreadLatencyHistogram95th=6424537.75,
> > > > fsPreadLatencyHistogram99th=24206578.00,
> > > > fsPreadLatencyHistogram999th=24206578.00,
> > > > fsWriteLatencyHistogramMean=356278.05,
> > > > fsWriteLatencyHistogramCount=2983888.00,
> > > > fsWriteLatencyHistogramMedian=538147.00,
> > > > fsWriteLatencyHistogram75th=570668.00,
> > > > fsWriteLatencyHistogram95th=570668.00,
> > > > fsWriteLatencyHistogram99th=570668.00,
> > > > fsWriteLatencyHistogram999th=570668.00
> > > >
> > > > 09:17:20.749INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > STOPPED: regionserver:60020-0x1493f19431b0000
> > > > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > > > aborting
> > > >
> > > > 09:17:20.750INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > EventThread shut down
> > > >
> > > > 09:17:20.755INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > Stopping server on 60020
> > > >
> > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 0 on 60020: exiting
> > > >
> > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 1 on 60020: exiting
> > > >
> > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 0 on 60020: exiting
> > > >
> > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > Stopping IPC Server listener on 60020
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 3 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 8 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 4 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 3 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.
> > > > HBaseServer
> > > >
> > > > PRI IPC Server handler 6 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 2 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 6 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 5 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 7 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 2 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > > >
> > > > Sending interrupt to stop the worker thread
> > > >
> > > > 09:17:20.758INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > Stopping infoServer
> > > >
> > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 4 on 60020: exiting
> > > >
> > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > Stopping IPC Server Responder
> > > >
> > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 7 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 8 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 1 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 9 on 60020: exiting
> > > >
> > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 5 on 60020: exiting
> > > >
> > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > Stopping IPC Server Responder
> > > >
> > > > 09:17:20.761INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 9 on 60020: exiting
> > > >
> > > > 09:17:20.763INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > > >
> > > > SplitLogWorker interrupted while waiting for task, exiting:
> > > > java.lang.InterruptedException
> > > >
> > > > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > REPL IPC Server handler 0 on 60020: exiting
> > > >
> > > > 09:17:20.767INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > > >
> > > > SplitLogWorker etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > exiting
> > > >
> > > > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > REPL IPC Server handler 2 on 60020: exiting
> > > >
> > > > 09:17:20.768INFOorg.mortbay.log
> > > >
> > > > Stopped SelectChannelConnector@0.0.0.0:60030
> > > >
> > > > 09:17:20.768INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > REPL IPC Server handler 1 on 60020: exiting
> > > >
> > > > 09:17:20.908INFO
> > > >
> > org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager
> > > >
> > > > Stopping RegionServerSnapshotManager abruptly.
> > > >
> > > > 09:17:20.908INFO
> > > > org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker
> > > >
> > > > regionserver60020.compactionChecker exiting
> > > >
> > > > 09:17:20.908INFOorg.apache.hadoop.hbase.regionserver.LogRoller
> > > >
> > > > LogRoller exiting.
> > > >
> > > > 09:17:20.909INFOorg.apache.hadoop.hbase.regionserver.MemStoreFlusher
> > > >
> > > > regionserver60020.cacheFlusher exiting
> > > >
> > > > 09:17:20.920INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > aborting server etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > >
> > > > 09:17:20.924INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > Waiting on 3 regions to close
> > > >
> > > > 09:17:20.943INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Closed CF
> > > >
> > > > 09:17:20.949INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Closed epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.
> > > >
> > > > 09:17:20.951INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Closed info
> > > >
> > > > 09:17:20.953INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Closed .META.,,1.1028785192
> > > >
> > > > 09:17:20.956INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Closed CF
> > > >
> > > > 09:17:20.957ERRORorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Memstore size is 13376
> > > >
> > > > 09:17:20.957INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Closed
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > >
> > > > 09:17:21.125INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
> all
> > > > regions closed.
> > > >
> > > > 09:17:21.126INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > > >
> > > > regionserver60020.logSyncer exiting
> > > >
> > > > 09:17:21.240INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > >
> > > > regionserver60020 closing leases
> > > >
> > > > 09:17:21.241INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > >
> > > > regionserver60020 closed leases
> > > >
> > > > 09:17:22.021INFO
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher
> > > >
> > > > regionserver60020.periodicFlusher exiting
> > > >
> > > >
> 09:17:22.031WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > >
> > > > Possibly transient ZooKeeper exception:
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired for
> > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > >
> > > > 09:17:22.031INFOorg.apache.hadoop.hbase.util.RetryCounter
> > > >
> > > > Sleeping 2000ms before retry #1...
> > > >
> > > > 09:17:23.958INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > >
> > > > regionserver60020.leaseChecker closing leases
> > > >
> > > > 09:17:23.959INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > >
> > > > regionserver60020.leaseChecker closed leases
> > > >
> > > >
> 09:17:24.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > >
> > > > Possibly transient ZooKeeper exception:
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired for
> > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > >
> > > > 09:17:24.032INFOorg.apache.hadoop.hbase.util.RetryCounter
> > > >
> > > > Sleeping 4000ms before retry #2...
> > > >
> > > >
> 09:17:28.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > >
> > > > Possibly transient ZooKeeper exception:
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired for
> > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > >
> > > > 09:17:28.033INFOorg.apache.hadoop.hbase.util.RetryCounter
> > > >
> > > > Sleeping 8000ms before retry #3...
> > > >
> > > >
> 09:17:36.033WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > >
> > > > Possibly transient ZooKeeper exception:
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired for
> > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > >
> > > >
> 09:17:36.033ERRORorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > >
> > > > ZooKeeper delete failed after 3 retries
> > > >
> > > > 09:17:36.033WARNorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > Failed deleting my ephemeral node
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired for
> > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > >         at
> > > > org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> > > >         at
> > > > org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > > >         at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
> > > >         at
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
> > > >         at
> > > > org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1196)
> > > >         at
> > > > org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1185)
> > > >         at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1132)
> > > >         at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
> > > >         at java.lang.Thread.run(Thread.java:662)
> > > >
> > > > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
> > > > zookeeper connection closed.
> > > >
> > > > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > regionserver60020 exiting
> > > >
> > > > 09:17:36.051ERROR
> > > > org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine
> > > >
> > > > Region server exiting
> > > > java.lang.RuntimeException: HRegionServer Aborted
> > > >         at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:65)
> > > >         at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:84)
> > > >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > > >         at
> > > >
> > >
> >
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
> > > >         at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:4106)
> > > >
> > > > 09:17:36.139INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > > >
> > > > Shutdown hook starting; hbase.shutdown.hook=true;
> > > >
> > > >
> > >
> >
> fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@727f3b8a
> > > >
> > > > 09:17:36.140INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > STOPPED: Shutdown hook
> > > >
> > > > 09:17:36.141INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > > >
> > > > Starting fs shutdown hook thread.
> > > >
> > > > 09:17:36.142INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > > >
> > > > Shutdown hook finished.
> > > >
> > > > 10:16:40.584INFOorg.apache.hadoop.hbase.util.VersionInfo
> > > >
> > > > HBase 0.94.15-cdh4.7.0
> > > >
> > > > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> > > >
> > > > Subversion
> > > >
> > >
> >
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.94.15-cdh4.7.0
> > > > -r Unknown
> > > >
> > > > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> > > >
> > > > Compiled by jenkins on Wed May 28 10:48:53 PDT 2014
> > > >
> > > > 10:16:40.912INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> > > >
> > > > vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems
> > > > Inc., vmVersion=20.6-b01
> > > >
> > > > 10:16:40.913INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> > > >
> > > > vmInputArguments=[-XX:OnOutOfMemoryError=kill, -9, %p, -Xmx1000m,
> > > > -Djava.net.preferIPv4Stack=true, -Xms1073741824, -Xmx1073741824,
> > > > -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC,
> > > > -XX:-CMSConcurrentMTEnabled, -XX:CMSInitiatingOccupancyFraction=70,
> > > > -XX:+CMSParallelRemarkEnabled, -Dhbase.log.dir=/var/log/hbase,
> > > >
> > > >
> > >
> >
> -Dhbase.log.file=hbase-cmf-hbase1-REGIONSERVER-etp-hdfs-n1-sg.passport.local.log.out,
> > > >
> > > >
> > >
> >
> -Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase,
> > > > -Dhbase.id.str=, -Dhbase.root.logger=INFO,RFA,
> > > >
> > > >
> > >
> >
> -Djava.library.path=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase/lib/native/Linux-amd64-64,
> > > > -Dhbase.security.logger=INFO,DRFAS]
> > > >
> > > > 10:16:41.392INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > Starting Thread-2
> > > >
> > > > 10:16:41.393INFOorg.apache.hadoop.ipc.sdfHBaseServer
> > > >
> > > > Starting Thread-2
> > > >
> > > >
> > > > this FATAL
> > > >
> > > > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > ABORTING region server
> > > > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > > > regionserver:60020-0x1493f19431b0000
> > > > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > > > aborting
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired
> > > >         at
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> > > >         at
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> > > >         at
> > > >
> > >
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> > > >         at
> > > > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > > >
> > > > found on each RS
> > > > I didn' t see any problems with ZK or memory usage. The load is
> > extremely
> > > > low, just few put/get in a second, you can see file sizes in logs.
> > > > What it could be how can i catch the root cause?
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message