hadoop-common-user mailing list archives

From Samir Ahmic <ahmic.sa...@gmail.com>
Subject Re: RE: RE: RE: RE: RE: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
Date Tue, 16 Apr 2013 05:23:00 GMT
Hi, Azuryy

These actions may resolve the RIT (region in transition) issue:

1. Try restarting the master.
2. If step 1 does not resolve the issue, run 'hbase zkcli', remove the hbase
znode with 'rmr /hbase', and then restart the cluster.
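
To check whether a region is still stuck after each step, the "Regions in transition timed out" lines in the HMaster log (quoted further down in this thread) can be parsed. The following is only a minimal sketch: the names `RIT_PATTERN`, `parse_rit_line` and `stuck_regions` are made up for illustration, and the pattern assumes the message format shown in this thread's HBase 0.94 log, which may differ in other versions.

```python
import re

# Assumed format, copied from the HMaster log quoted in this thread:
#   Regions in transition timed out:  <region> state=<STATE>, ts=<ms>, server=<host>,<port>,<startcode>
RIT_PATTERN = re.compile(
    r"Regions in transition timed out:\s+(?P<region>\S+)\s+"
    r"state=(?P<state>\w+),\s+ts=(?P<ts>\d+),\s+server=(?P<server>\S+)"
)


def parse_rit_line(line):
    """Return a dict describing the stuck region, or None if the line does not match."""
    match = RIT_PATTERN.search(line)
    if match is None:
        return None
    # A server name is written as host,port,startcode.
    host, port, start_code = match.group("server").rsplit(",", 2)
    return {
        "region": match.group("region"),
        "state": match.group("state"),
        "ts_ms": int(match.group("ts")),
        "host": host,
        "port": int(port),
        "start_code_ms": int(start_code),
    }


def stuck_regions(lines):
    """Collect the parsed entries for every matching log line."""
    return [entry for entry in map(parse_rit_line, lines) if entry is not None]


# Sample lines taken from the log in this thread (wrapped lines rejoined).
SAMPLE = [
    "DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting "
    "unassignment of region -ROOT-,,0.70236052 (offlining)",
    "INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in "
    "transition timed out:  -ROOT-,,0.70236052 state=CLOSING, "
    "ts=1366001558865, server=Master,60000,1366001238313",
]

if __name__ == "__main__":
    for entry in stuck_regions(SAMPLE):
        print(entry["region"], entry["state"], entry["host"])
```

If the list comes back empty for log lines written after a restart, the region is probably no longer reported as timed out.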



On Tue, Apr 16, 2013 at 5:47 AM, Azuryy Yu <azuryyyu@gmail.com> wrote:

> I cannot find any useful information in the pasted logs.
>
>
> On Tue, Apr 16, 2013 at 11:22 AM, dylan <dwld0425@gmail.com> wrote:
>
>> Yes, I have just found them.
>>
>> On Slave01 and Slave03, zookeeper.out is under zookeeper_home/bin/.
>>
>> But on Slave02 (which was rebooted earlier), it is under the / directory
>> after the reboot.
>>
>> *Slave02 zookeeper.out shows:*
>>
>> WARN  [RecvWorker:1:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
>> 2013-04-15 16:38:31,987 [myid:2] - WARN  [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
>> java.lang.InterruptedException
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2094)
>>         at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:370)
>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
>>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
>> [myid:2] - WARN  [SendWorker:1:QuorumCnxManager$SendWorker@688] - Send worker leaving thread
>> [myid:2] - INFO  [Slave02/192.168.75.243:3888:QuorumCnxManager$Listener@493] - Received connection request /192.168.75.242:51136
>> [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x50000037d (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x5 (n.peerEPoch), FOLLOWING (my state)
>> [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x50000037d (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0x5 (n.peerEPoch), FOLLOWING (my state)
>>
>> *Slave01 zookeeper.out shows:*
>>
>> [myid:1] - INFO  [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890005 type:create cxid:0x1e zxid:0xb0000003c txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
>> 2013-04-16 10:58:26,415 [myid:1] - INFO  [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890006 type:create cxid:0x7 zxid:0xb0000003d txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
>> 2013-04-16 10:58:26,431 [myid:1] - INFO  [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890007 type:create cxid:0x7 zxid:0xb0000003e txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
>> 2013-04-16 10:58:26,489 [myid:1] - INFO  [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x23e0dc5a333000a type:create cxid:0x7 zxid:0xb0000003f txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
>> 2013-04-16 10:58:36,001 [myid:1] - INFO  [SessionTracker:ZooKeeperServer@325] - Expiring session 0x33e0dc5b4de0003, timeout of 40000ms exceeded
>> 2013-04-16 10:58:36,001 [myid:1] - INFO  [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x33e0dc5b4de0003
>> 2013-04-16 11:03:44,000 [myid:1] - INFO  [SessionTracker:ZooKeeperServer@325] - Expiring session 0x23e0dc5a333000b, timeout of 40000ms exceeded
>> 2013-04-16 11:03:44,001 [myid:1] - INFO  [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x23e0dc5a333000b
>>
>> *From:* Azuryy Yu [mailto:azuryyyu@gmail.com]
>> *Sent:* April 16, 2013 11:13
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: RE: RE: RE: RE: Region has been CLOSING for too long, this
>> should eventually complete or the server will expire, send RPC again
>>
>> Then can you find the ZooKeeper log at zookeeper_home/zookeeper.out?
>>
>> On Tue, Apr 16, 2013 at 11:04 AM, dylan <dwld0425@gmail.com> wrote:
>>
>> I use the hbase shell.
>>
>> It always shows:
>>
>> ERROR: org.apache.hadoop.ipc.RemoteException:
>> org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
>>
>> *From:* Azuryy Yu [mailto:azuryyyu@gmail.com]
>> *Sent:* April 16, 2013 10:59
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: RE: RE: RE: Region has been CLOSING for too long, this
>> should eventually complete or the server will expire, send RPC again
>>
>> Does HBase manage your ZooKeeper, or did you set export
>> HBASE_MANAGES_ZK=false in hbase-env.sh?
>>
>> If not, then this is a ZooKeeper port conflict.
>>
>> On Tue, Apr 16, 2013 at 10:55 AM, dylan <dwld0425@gmail.com> wrote:
>>
>> # The number of milliseconds of each tick
>> tickTime=2000
>> # The number of ticks that the initial
>> # synchronization phase can take
>> initLimit=10
>> # The number of ticks that can pass between
>> # sending a request and getting an acknowledgement
>> syncLimit=5
>> # the directory where the snapshot is stored.
>> # do not use /tmp for storage, /tmp here is just
>> # example sakes.
>> dataDir=/usr/cdh4/zookeeper/data
>> # the port at which the clients will connect
>> clientPort=2181
>>
>> server.1=Slave01:2888:3888
>> server.2=Slave02:2888:3888
>> server.3=Slave03:2888:3888
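
The limits in this zoo.cfg are expressed in ticks, so the actual timeouts follow from multiplying by tickTime. The sketch below works through that arithmetic for the values pasted above. It assumes ZooKeeper's documented defaults for the session timeout bounds (minSessionTimeout = 2 × tickTime, maxSessionTimeout = 20 × tickTime), since this file does not set them; the helper names `parse_zoo_cfg` and `timing_ms` are hypothetical.

```python
# Settings copied from the zoo.cfg pasted in this thread (comments omitted).
ZOO_CFG = """\
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/cdh4/zookeeper/data
clientPort=2181
server.1=Slave01:2888:3888
server.2=Slave02:2888:3888
server.3=Slave03:2888:3888
"""


def parse_zoo_cfg(text):
    """Parse key=value lines, skipping blanks and '#' comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg


def timing_ms(cfg):
    """Convert tick-based limits to milliseconds."""
    tick = int(cfg["tickTime"])
    return {
        "init_timeout_ms": tick * int(cfg["initLimit"]),
        "sync_timeout_ms": tick * int(cfg["syncLimit"]),
        # Assumption: defaults apply when the file does not override them.
        "min_session_timeout_ms": int(cfg.get("minSessionTimeout", 2 * tick)),
        "max_session_timeout_ms": int(cfg.get("maxSessionTimeout", 20 * tick)),
    }


if __name__ == "__main__":
    for name, value in timing_ms(parse_zoo_cfg(ZOO_CFG)).items():
        print(name, value)
```

Under these assumptions the maximum session timeout works out to 40000 ms, which matches the "timeout of 40000ms exceeded" lines in the Slave01 zookeeper.out quoted earlier in the thread.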
>>
>> *From:* Azuryy Yu [mailto:azuryyyu@gmail.com]
>> *Sent:* April 16, 2013 10:45
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: RE: RE: Region has been CLOSING for too long, this should
>> eventually complete or the server will expire, send RPC again
>>
>> And please paste the ZK configuration from zookeeper_home/conf/zoo.cfg.
>>
>> On Tue, Apr 16, 2013 at 10:42 AM, Azuryy Yu <azuryyyu@gmail.com> wrote:
>>
>> It is located under hbase-home/logs/ if your ZooKeeper is managed by HBase.
>>
>> But I noticed you configured QJM. Do QJM and HBase share the same ZK
>> cluster? If so, just paste the QJM ZK configuration from hdfs-site.xml and
>> the HBase ZK configuration from hbase-site.xml.
>>
>> On Tue, Apr 16, 2013 at 10:37 AM, dylan <dwld0425@gmail.com> wrote:
>>
>> How do I check the ZooKeeper log? It is a binary file; how do I convert it
>> to a readable log? I found "org.apache.zookeeper.server.LogFormatter"; how
>> do I run it?
>>
>> *From:* Azuryy Yu [mailto:azuryyyu@gmail.com]
>> *Sent:* April 16, 2013 10:01
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: RE: Region has been CLOSING for too long, this should
>> eventually complete or the server will expire, send RPC again
>>
>> This is a ZooKeeper issue.
>>
>> Please paste the ZooKeeper log here. Thanks.
>>
>> On Tue, Apr 16, 2013 at 9:58 AM, dylan <dwld0425@gmail.com> wrote:
>>
>> It is hbase-0.94.2-cdh4.2.0.
>>
>> *From:* Ted Yu [mailto:yuzhihong@gmail.com]
>> *Sent:* April 16, 2013 9:55
>> *To:* user@hbase.apache.org
>> *Subject:* Re: Region has been CLOSING for too long, this should eventually
>> complete or the server will expire, send RPC again
>>
>> I think this question would be more appropriate for the HBase user mailing
>> list.
>>
>> Moving hadoop user to bcc.
>>
>> Please tell us the HBase version you are using.
>>
>> Thanks
>>
>> On Mon, Apr 15, 2013 at 6:51 PM, dylan <dwld0425@gmail.com> wrote:
>>
>> Hi
>>
>> I am new to Hadoop and set it up from a tarball. The cluster has 5 nodes:
>> 2 NN nodes with QJM (3 JournalNodes, one of them on a DN node) and 3 DN
>> nodes that also run ZooKeeper. It worked fine. Then I rebooted one DataNode
>> machine that also runs ZooKeeper, and afterwards restarted all processes.
>> Hadoop works fine, but HBase does not: I cannot disable or drop tables.
>>
>> The logs are as follows:
>>
>> The HBase HMaster log:
>>
>> DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere
>> ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  -ROOT-,,0.70236052 state=CLOSING, ts=1366001558865, server=Master,60000,1366001238313
>> ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
>> 10,684 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region -ROOT-,,0.70236052 (offlining)
>>
>> The HBase HRegionServer log:
>>
>> DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=7.44 MB, free=898.81 MB, max=906.24 MB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN
>>
>> The HBase web UI shows:
>>
>> Region      State
>> 70236052    -ROOT-,,0.70236052 state=CLOSING, ts=Mon Apr 15 12:52:38 CST 2013 (75440s ago), server=Master,60000,1366001238313
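
The ts value in the master log and the trailing number in the server name are both epoch timestamps in milliseconds, so they can be lined up with the wall-clock time shown by the web UI. This is a rough sketch only: it assumes that "CST" here means China Standard Time (UTC+8), and `from_epoch_ms` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in the web UI output is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8), "CST")


def from_epoch_ms(ms, tz=CST):
    """Convert epoch milliseconds to an aware datetime without float rounding."""
    return datetime.fromtimestamp(ms // 1000, tz) + timedelta(milliseconds=ms % 1000)


def fmt(dt):
    return dt.strftime("%Y-%m-%d %H:%M:%S")


# Values copied from the log lines and web UI output in this thread.
closing_since = from_epoch_ms(1366001558865)   # ts=... of the CLOSING state
master_started = from_epoch_ms(1366001238313)  # start code in Master,60000,...
page_viewed = closing_since + timedelta(seconds=75440)  # "(75440s ago)"

if __name__ == "__main__":
    print("master started :", fmt(master_started))   # 2013-04-15 12:47:18
    print("CLOSING since  :", fmt(closing_since))    # 2013-04-15 12:52:38
    print("page viewed    :", fmt(page_viewed))      # 2013-04-16 09:49:58
    print("delay          :", closing_since - master_started)
```

Under that time zone assumption the decoded ts agrees with the "Mon Apr 15 12:52:38 CST 2013" shown by the web UI, and the region entered CLOSING a little over five minutes after the master process started.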
>>
>> How can I fix it?
>>
>> Thanks.
