hbase-user mailing list archives

From Steen Manniche <boxun...@gmail.com>
Subject Re: Re: HBase master dies (1.1.2) often
Date Wed, 29 Mar 2017 07:22:05 GMT
Have you tried checking the network connections between the HBase
servers and the ZooKeeper instance?

When the error happens, you could check the status of the
ZooKeeper instance from one of the HBase machines with e.g.

`echo ruok | netcat zookeeper-server-address port-for-zookeeper`

A healthy server answers `imok`.
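
If you want to run that check against every quorum member in one go,
here is a minimal Python sketch. It assumes the four-letter commands
are enabled on your servers (newer ZooKeeper releases may require
whitelisting them). The function names `zk_four_letter` and
`check_quorum` are made up for this example; only the standard
`socket` module is used.

```python
import socket


def zk_four_letter(host, port=2181, word="ruok", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw reply.

    The server answers and then closes the connection, so we read
    until EOF. A hung or unreachable server raises OSError
    (this includes timeouts and name-resolution failures).
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")


def check_quorum(hosts, port=2181):
    """Return {host: reply or error text} for each quorum member."""
    results = {}
    for host in hosts:
        try:
            reply = zk_four_letter(host, port).strip()
            results[host] = reply or "(empty reply)"
        except OSError as exc:
            # Connection refused, timeout, DNS failure, etc.
            results[host] = "unreachable: %s" % exc
    return results
```

Run from the master host, something like
`check_quorum(["bigdata33", "bigdata36", "nn3"])` (the hostnames from
the quorum setting in the quoted hbase-site.xml) should report `imok`
for each healthy member. Anything else is worth a closer look.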

br,
steen

On Wed, Mar 29, 2017 at 8:00 AM, zhou_shuaifeng@sina.com
<zhou_shuaifeng@sina.com> wrote:
> We encountered this problem before.
> To avoid it, try to reduce the load on the machines where the HMaster and ZK nodes run.
> In particular, limit the resources used by MR and Spark jobs.
>
>
>
> zhou_shuaifeng@sina.com
>
> From: Josh Elser
> Date: 2017-03-29 00:32
> To: user
> Subject: Re: HBase master dies (1.1.2) often
>
> Margus -- have you found/read
> https://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>
> I think you're probably chasing a red herring in the ZK logs. Most
> likely, it's an issue that needs to be addressed on the HBase
> configuration/tuning side.
>
> If it's happening nightly, I'd guess that you have some scheduled
> task (e.g. ETL, a MapReduce job, some cron task) that creates
> abnormally high latency on the physical machine the HBase Master
> runs on, causing the Master to lose its session.
>
> Margus Roo wrote:
>> Hi
>>
>> Latest log before HBase master went down.
>>
>> *Hbase-master log:*
>>
>> 2017-03-27 14:01:51,679 FATAL [main-EventThread] master.HMaster: Master
>> server abort: loaded coprocessors are:
>> [org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor,
>> org.apache.hadoop.hbase.backup.master.BackupController,
>> org.apache.hadoop.hbase.security.visibility.VisibilityController]
>> 2017-03-27 14:01:51,926 FATAL [main-EventThread] master.HMaster:
>> master:16000-0x25b02c498660ce1,
>> quorum=bigdata33.webmedia.int:2181,bigdata36.webmedia.int:2181,nn3.webmedia.int:2181,
>> baseZNode=/hbase-unsecure master:16000-0x25b02c498660ce1 received
>> expired from ZooKeeper, aborting
>>
>>
>>
>> 2017-03-27 14:01:51,964 WARN
>> [nn3.webmedia.int,16000,1490595106359_ChoreService_1]
>> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
>> quorum=bigdata33.webmedia.int:2181,bigdata36.webmedia.int:2181,nn3.webmedia.int:2181,
>> exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
>> KeeperErrorCode = Session expired for /hbase-unsecure/replication/rs
>>
>>
>> essionExpiredException: KeeperErrorCode = Session expired for
>> /hbase-unsecure/master
>> 2017-03-27 14:02:00,687 WARN
>> [nn3.webmedia.int,16000,1490595106359_ChoreService_1] zookeeper.ZKUtil:
>> replicationLogCleaner-0x35b0cdb80240204,
>> quorum=bigdata33.webmedia.int:2181,bigdata36.webmedia.int:2181,nn3.webmedia.int:2181,
>> baseZNode=/hbase-unsecure Unable to get data of znode
>> /hbase-unsecure/replication/rs
>> java.lang.InterruptedException: sleep interrupted
>>
>> 2017-03-27 14:04:07,581 ERROR
>> [master/nn3.webmedia.int/192.168.80.51:16000]
>> zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 7 attempts
>> 2017-03-27 14:04:07,581 WARN
>> [master/nn3.webmedia.int/192.168.80.51:16000] zookeeper.ZKUtil:
>> master:16000-0x25b02c498660ce1,
>> quorum=bigdata33.webmedia.int:2181,bigdata36.webmedia.int:2181,nn3.webmedia.int:2181,
>> baseZNode=/hbase-unsecure Unable to get data of znode
>> /hbase-unsecure/master
>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>> KeeperErrorCode = Session expired for /hbase-unsecure/master
>>
>>
>> *Same time in zookeeper log:*
>>
>> 2017-03-27 14:00:39,902 - WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught
>> end of stream exception
>> EndOfStreamException: Unable to read additional data from client
>> sessionid 0x35b0cdb80240000, likely client has closed socket
>> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>> at
>> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>>
>> at java.lang.Thread.run(Thread.java:745)
>> 2017-03-27 14:00:55,197 - INFO
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed
>> socket connection for client /192.168.80.51:60412 which had sessionid
>> 0x35b0cdb80240000
>> 2017-03-27 14:00:45,494 - WARN [SyncThread:3:SendAckRequestProcessor@64]
>> - Closing connection to leader, exception during packet send
>> java.net.SocketException: Broken pipe
>> at java.net.SocketOutputStream.socketWrite0(Native Method)
>> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
>> at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
>> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>> at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
>> at
>> org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:62)
>>
>> at
>> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:204)
>>
>> at
>> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
>>
>> 2017-03-27 14:00:41,568 - WARN
>> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
>> following the leader
>> java.net.SocketException: Broken pipe
>> at java.net.SocketOutputStream.socketWrite0(Native Method)
>> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
>> at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
>> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>> at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
>> at org.apache.zookeeper.server.quorum.Learner.ping(Learner.java:532)
>> at
>> org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:112)
>>
>> at
>> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:86)
>> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:819)
>> 2017-03-27 14:00:55,399 - WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught
>> end of stream exception
>> EndOfStreamException: Unable to read additional data from client
>> sessionid 0x35b0cdb802403ce, likely client has closed socket
>> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>> at
>> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>>
>> 2017-03-27 14:01:06,012 - INFO
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed
>> socket connection for client /127.0.0.1:60074 which had sessionid
>> 0x35b0cdb802403ce
>> 2017-03-27 14:01:06,906 - INFO
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] -
>> Accepted socket connection from /192.168.80.51:35572
>> 2017-03-27 14:01:06,907 - INFO
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client
>> attempting to renew session 0x25b02c498660ae3 at /192.168.80.51:35572
>> 2017-03-27 14:01:13,258 - INFO
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating
>> client: 0x25b02c498660ae3
>> 2017-03-27 14:01:13,963 - WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] -
>> Exception causing close of session 0x25b02c498660ae3 due to
>> java.net.SocketException: Socket closed
>> 2017-03-27 14:01:13,963 - INFO
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed
>> socket connection for client /192.168.80.51:35572 which had sessionid
>> 0x25b02c498660ae3
>> 2017-03-27 14:01:14,323 - INFO
>> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
>> java.lang.Exception: shutdown Follower
>> at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:823)
>> 2017-03-27 14:01:14,409 - INFO
>> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1007] - Closed
>> socket connection for client /192.168.80.51:52844 which had sessionid
>> 0x35b0cdb80240204
>> 2017-03-27 14:01:14,409 - INFO
>> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] -
>> Shutting down
>> 2017-03-27 14:01:14,409 - INFO
>> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@441] - shutting
>> down
>> 2017-03-27 14:01:14,410 - INFO
>> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@105] -
>> Shutting down
>> 2017-03-27 14:01:14,410 - INFO
>> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:CommitProcessor@181] - Shutting
>> down
>> 2017-03-27 14:01:14,442 - INFO
>> [FollowerRequestProcessor:3:FollowerRequestProcessor@95] -
>> FollowerRequestProcessor exited loop!
>> 2017-03-27 14:01:14,451 - INFO
>> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@415] -
>> shutdown of request processor complete
>> 2017-03-27 14:01:14,516 - INFO [CommitProcessor:3:CommitProcessor@150] -
>> CommitProcessor exited loop!
>> 2017-03-27 14:01:21,001 - INFO
>> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@209] -
>> Shutting down
>> 2017-03-27 14:01:21,001 - INFO [SyncThread:3:SyncRequestProcessor@187] -
>> SyncRequestProcessor exited!
>> 2017-03-27 14:01:21,002 - INFO
>> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumPeer@747] - LOOKING at
>> java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>> Margus (margusja) Roo
>> http://margus.roo.ee
>> skype: margusja
>> https://www.facebook.com/allan.tuuring
>> +372 51 48 780
>>
>> On 23/03/2017 08:43, Ted Yu wrote:
>>> Have you checked zookeeper logs to see if there was some clue ?
>>>
>>> Cheers
>>>
>>>> On Mar 22, 2017, at 11:30 PM, Margus Roo <margus@roo.ee> wrote:
>>>>
>>>> Hi
>>>>
>>>> Almost every night the HBase master shuts down. In the logs I can see:
>>>> gc.log:
>>>> 2017-03-23T01:59:27.239+0200: 41752.366: [GC (Allocation Failure)
>>>> 2017-03-23T01:59:27.239+0200: 41752.366: [ParNew:
>>>> 159203K->11611K(166464K), 0.0115189 secs] 177260K->29669K(536512K),
>>>> 0.0117362 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
>>>> Heap
>>>> par new generation total 166464K, used 137930K [0x00000000c0000000,
>>>> 0x00000000cb4a0000, 0x00000000d5550000)
>>>> eden space 147968K, 85% used [0x00000000c0000000, 0x00000000c7b5b8b8,
>>>> 0x00000000c9080000)
>>>> from space 18496K, 62% used [0x00000000ca290000, 0x00000000cade6fa8,
>>>> 0x00000000cb4a0000)
>>>> to space 18496K, 0% used [0x00000000c9080000, 0x00000000c9080000,
>>>> 0x00000000ca290000)
>>>> concurrent mark-sweep generation total 370048K, used 18057K
>>>> [0x00000000d5550000, 0x00000000ebeb0000, 0x0000000100000000)
>>>> Metaspace used 55061K, capacity 56096K, committed 56400K, reserved
>>>> 1099776K
>>>> class space used 5899K, capacity 6255K, committed 6264K, reserved
>>>> 1048576K
>>>>
>>>>
>>>>
>>>>
>>>> In master.log
>>>> 2017-03-23 02:02:09,178 WARN
>>>> [master/nn3/192.168.80.51:16000-EventThread]
>>>> client.ConnectionManager$HConnectionImplementation: This client just
>>>> lost it's session with ZooKeeper, closing it. It will be recreated
>>>> next time someone needs it
>>>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired
>>>> at
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:585)
>>>>
>>>> at
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:517)
>>>>
>>>> at
>>>> org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
>>>>
>>>> at
>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
>>>>
>>>> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
>>>> 2017-03-23 02:02:10,579 FATAL [main-EventThread] master.HMaster:
>>>> Master server abort: loaded coprocessors are:
>>>> [org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor, org.apache.hadoop.hbase.backup.master.BackupController,
>>>> org.apache.hadoop.hbase.security.visibility.VisibilityController]
>>>> 2017-03-23 02:02:10,857 FATAL [main-EventThread] master.HMaster:
>>>> master:16000-0x15adbb9b9db078a,
>>>> quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
>>>> baseZNode=/hbase-unsecure master:16000-0x15adbb9b9db078a received
>>>> expired from ZooKeeper, aborting
>>>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired
>>>> at
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:585)
>>>>
>>>> at
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:517)
>>>>
>>>> at
>>>> org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
>>>>
>>>> at
>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
>>>>
>>>> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
>>>> 2017-03-23 02:02:10,090 INFO [main-SendThread(nn3:2181)]
>>>> zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service,
>>>> session 0x15adbb9b9db078a has expired, closing socket connection
>>>> 2017-03-23 02:02:09,181 WARN
>>>> [nn3:16000.activeMasterManager-EventThread]
>>>> client.ConnectionManager$HConnectionImplementation: This client just
>>>> lost it's session with ZooKeeper, closing it. It will be recreated
>>>> next time someone needs it
>>>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired
>>>> at
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:585)
>>>>
>>>> at
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:517)
>>>>
>>>> at
>>>> org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
>>>>
>>>> at
>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
>>>>
>>>> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
>>>> 2017-03-23 02:02:10,894 INFO
>>>> [nn3:16000.activeMasterManager-EventThread]
>>>> client.ConnectionManager$HConnectionImplementation: Closing zookeeper
>>>> sessionid=0x25adbb9ba62075d
>>>> 2017-03-23 02:02:10,894 INFO
>>>> [nn3:16000.activeMasterManager-EventThread] zookeeper.ClientCnxn:
>>>> EventThread shut down
>>>> 2017-03-23 02:02:10,876 INFO
>>>> [master/nn3/192.168.80.51:16000-EventThread]
>>>> client.ConnectionManager$HConnectionImplementation: Closing zookeeper
>>>> sessionid=0x25adbb9ba62075c
>>>> 2017-03-23 02:02:10,897 INFO
>>>> [master/nn3/192.168.80.51:16000-EventThread] zookeeper.ClientCnxn:
>>>> EventThread shut down
>>>> 2017-03-23 02:02:10,925 INFO [main-EventThread]
>>>> regionserver.HRegionServer: STOPPED: master:16000-0x15adbb9b9db078a,
>>>> quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
>>>> baseZNode=/hbase-unsecure master:16000-0x15adbb9b9db078a received
>>>> expired from ZooKeeper, aborting
>>>> 2017-03-23 02:02:10,935 INFO [main-EventThread] zookeeper.ClientCnxn:
>>>> EventThread shut down
>>>> 2017-03-23 02:02:11,005 INFO [master/nn3/192.168.80.51:16000]
>>>> regionserver.HRegionServer: Stopping infoServer
>>>> 2017-03-23 02:02:11,624 INFO
>>>> [nn3,16000,1490185417271_splitLogManager__ChoreService_1]
>>>> master.SplitLogManager$TimeoutMonitor: Chore: SplitLogManager Timeout
>>>> Monitor was stopped
>>>> 2017-03-23 02:02:11,628 WARN [nn3,16000,1490185417271_ChoreService_1]
>>>> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
>>>> quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
>>>> exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired for /hbase-unsecure/backup-masters
>>>> 2017-03-23 02:02:12,104 INFO [master/nn3/192.168.80.51:16000]
>>>> mortbay.log: Stopped SelectChannelConnector@0.0.0.0:16010
>>>> 2017-03-23 02:02:12,286 INFO [master/nn3/192.168.80.51:16000]
>>>> procedure2.ProcedureExecutor: Stopping the procedure executor
>>>> 2017-03-23 02:02:12,336 INFO [master/nn3/192.168.80.51:16000]
>>>> wal.WALProcedureStore: Stopping the WAL Procedure Store
>>>> 2017-03-23 02:02:13,044 WARN [nn3,16000,1490185417271_ChoreService_1]
>>>> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
>>>> quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
>>>> exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired for /hbase-unsecure/backup-masters
>>>> 2017-03-23 02:02:14,497 INFO [master/nn3/192.168.80.51:16000]
>>>> regionserver.HRegionServer: stopping server nn3,16000,1490185417271
>>>> 2017-03-23 02:02:14,514 INFO [master/nn3/192.168.80.51:16000]
>>>> regionserver.HRegionServer: stopping server nn3,16000,1490185417271;
>>>> all regions closed.
>>>> 2017-03-23 02:02:14,532 INFO [master/nn3/192.168.80.51:16000]
>>>> hbase.ChoreService: Chore service for: nn3,16000,1490185417271 had
>>>> [[ScheduledChore: Name: CatalogJanitor-nn3:16000 Period: 300000 Unit:
>>>> MILLISECONDS], [ScheduledChore: Name: LogsCleaner Period: 60000 Unit:
>>>> MILLISECONDS], [ScheduledChore: Name:
>>>> nn3,16000,1490185417271-ExpiredMobFileCleanerChore Period: 86400
>>>> Unit: SECONDS], [ScheduledChore: Name:
>>>> nn3,16000,1490185417271-MobCompactionChore Period: 604800 Unit:
>>>> SECONDS], [ScheduledChore: Name:
>>>> nn3,16000,1490185417271-ClusterStatusChore Period: 60000 Unit:
>>>> MILLISECONDS], [ScheduledChore: Name:
>>>> nn3,16000,1490185417271-BalancerChore Period: 300000 Unit:
>>>> MILLISECONDS], [ScheduledChore: Name: HFileCleaner Period: 60000
>>>> Unit: MILLISECONDS], [ScheduledChore: Name:
>>>> nn3,16000,1490185417271-RegionNormalizerChore Period: 1800000 Unit:
>>>> MILLISECONDS]] on shutdown
>>>> 2017-03-23 02:02:14,630 INFO [master/nn3/192.168.80.51:16000]
>>>> master.MasterMobCompactionThread: Waiting for Mob Compaction Thread
>>>> to finish...
>>>> 2017-03-23 02:02:14,644 INFO [master/nn3/192.168.80.51:16000]
>>>> master.MasterMobCompactionThread: Waiting for Region Server Mob
>>>> Compaction Thread to finish...
>>>> 2017-03-23 02:02:14,671 WARN [master/nn3/192.168.80.51:16000]
>>>> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
>>>> quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
>>>> exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired for /hbase-unsecure/master
>>>> 2017-03-23 02:02:15,684 WARN [master/nn3/192.168.80.51:16000]
>>>> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
>>>> quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
>>>> exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired for /hbase-unsecure/master
>>>> 2017-03-23 02:02:17,684 WARN [master/nn3/192.168.80.51:16000]
>>>> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
>>>> quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
>>>> exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired for /hbase-unsecure/master
>>>> 2017-03-23 02:02:21,685 WARN [master/nn3/192.168.80.51:16000]
>>>> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
>>>> quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
>>>> exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired for /hbase-unsecure/master
>>>> 2017-03-23 02:02:29,685 WARN [master/nn3/192.168.80.51:16000]
>>>> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
>>>> quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
>>>> exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired for /hbase-unsecure/master
>>>> 2017-03-23 02:02:45,686 WARN [master/nn3/192.168.80.51:16000]
>>>> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
>>>> quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
>>>> exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired for /hbase-unsecure/master
>>>> 2017-03-23 02:03:17,686 WARN [master/nn3/192.168.80.51:16000]
>>>> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
>>>> quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
>>>> exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired for /hbase-unsecure/master
>>>> 2017-03-23 02:04:21,686 WARN [master/nn3/192.168.80.51:16000]
>>>> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
>>>> quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
>>>> exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired for /hbase-unsecure/master
>>>> 2017-03-23 02:04:21,687 ERROR [master/nn3/192.168.80.51:16000]
>>>> zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 7
>>>> attempts
>>>> 2017-03-23 02:04:21,687 WARN [master/nn3/192.168.80.51:16000]
>>>> zookeeper.ZKUtil: master:16000-0x15adbb9b9db078a,
>>>> quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
>>>> baseZNode=/hbase-unsecure Unable to get data of znode
>>>> /hbase-unsecure/master
>>>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired for /hbase-unsecure/master
>>>> ...
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> hbase-site.xml:
>>>> <configuration>
>>>>
>>>> <property>
>>>> <name>dfs.client.read.shortcircuit</name>
>>>> <value>true</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>dfs.domain.socket.path</name>
>>>> <value>/var/lib/hadoop-hdfs/dn_socket</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.bulkload.staging.dir</name>
>>>> <value>/apps/hbase/staging</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.client.keyvalue.maxsize</name>
>>>> <value>1048576</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.client.retries.number</name>
>>>> <value>35</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.client.scanner.caching</name>
>>>> <value>100</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.client.scanner.timeout.period</name>
>>>> <value>600000</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.cluster.distributed</name>
>>>> <value>true</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.coprocessor.master.classes</name>
>>>> <value>org.apache.hadoop.hbase.security.visibility.VisibilityController,org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.coprocessor.region.classes</name>
>>>> <value>org.apache.hadoop.hbase.security.visibility.VisibilityController,org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint,org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.coprocessor.regionserver.classes</name>
>>>> <value>org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor</value>
>>>>
>>>> </property>
>>>> <property>
>>>> <name>hbase.hregion.majorcompaction</name>
>>>> <value>604800000</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.hregion.majorcompaction.jitter</name>
>>>> <value>0.50</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.hregion.max.filesize</name>
>>>> <value>10737418240</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.hregion.memstore.block.multiplier</name>
>>>> <value>4</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.hregion.memstore.flush.size</name>
>>>> <value>134217728</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.hregion.memstore.mslab.enabled</name>
>>>> <value>true</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.hstore.blockingStoreFiles</name>
>>>> <value>10</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.hstore.compaction.max</name>
>>>> <value>10</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.hstore.compactionThreshold</name>
>>>> <value>3</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.local.dir</name>
>>>> <value>${hbase.tmp.dir}/local</value>
>>>> </property>
>>>> <property>
>>>> <name>hbase.master.info.bindAddress</name>
>>>> <value>0.0.0.0</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.master.info.port</name>
>>>> <value>16010</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.master.loadbalance.bytable</name>
>>>> <value>true</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.master.port</name>
>>>> <value>16000</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.master.ui.readonly</name>
>>>> <value>false</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.regionserver.global.memstore.size</name>
>>>> <value>0.4</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.regionserver.handler.count</name>
>>>> <value>30</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.regionserver.info.port</name>
>>>> <value>16030</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.regionserver.port</name>
>>>> <value>16020</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.regionserver.wal.codec</name>
>>>> <value>org.apache.hadoop.hbase.regionserver.wal.WALCellCodec</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.rootdir</name>
>>>> <value>hdfs://nn3:8020/apps/hbase/data</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.rpc.protection</name>
>>>> <value>authentication</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.rpc.timeout</name>
>>>> <value>90000</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.security.authentication</name>
>>>> <value>simple</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.security.authorization</name>
>>>> <value>true</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.superuser</name>
>>>> <value>hbase</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.tmp.dir</name>
>>>> <value>/tmp/hbase-${user.name}</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.zookeeper.property.clientPort</name>
>>>> <value>2181</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.zookeeper.quorum</name>
>>>> <value>bigdata33,bigdata36,nn3</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.zookeeper.useMulti</name>
>>>> <value>true</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hfile.block.cache.size</name>
>>>> <value>0.4</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hfile.format.version</name>
>>>> <value>3</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>phoenix.query.timeoutMs</name>
>>>> <value>60000</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>replication.executor.workers</name>
>>>> <value>2</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>replication.sleep.before.failover</name>
>>>> <value>60000</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>zookeeper.recovery.retry</name>
>>>> <value>6</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>zookeeper.session.timeout</name>
>>>> <value>90000</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>zookeeper.znode.parent</name>
>>>> <value>/hbase-unsecure</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>zookeeper.znode.replication</name>
>>>> <value>replication</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>zookeeper.znode.replication.peers</name>
>>>> <value>peers</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>zookeeper.znode.replication.peers.state</name>
>>>> <value>peer-state</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>zookeeper.znode.replication.rs</name>
>>>> <value>rs</value>
>>>> </property>
>>>>
>>>> </configuration>
>>>>
>>>> Any hints?
>>>>
>>>> --
>>>> Margus (margusja) Roo
>>>> http://margus.roo.ee
>>>> skype: margusja
>>>> https://www.facebook.com/allan.tuuring
>>>> +372 51 48 780
>>>>
>>
>>
>
>
