hadoop-common-user mailing list archives

From "dylan" <dwld0...@gmail.com>
Subject Re: Re: Re: Re: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
Date Tue, 16 Apr 2013 03:22:24 GMT
Yes, I have just found them.

On Slave01 and Slave03, zookeeper.out is under zookeeper_home/bin/.

But on Slave02 (which was rebooted earlier), it is under the / directory after the reboot.

zookeeper.out on Slave02 shows:

WARN  [RecvWorker:1:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker

2013-04-15 16:38:31,987 [myid:2] - WARN  [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue

java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2094)
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:370)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)

[myid:2] - WARN  [SendWorker:1:QuorumCnxManager$SendWorker@688] - Send worker leaving thread

[myid:2] - INFO  [Slave02/192.168.75.243:3888:QuorumCnxManager$Listener@493] - Received connection request /192.168.75.242:51136

[myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x50000037d (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x5 (n.peerEPoch), FOLLOWING (my state)

[myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x50000037d (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0x5 (n.peerEPoch), FOLLOWING (my state)
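
A note on reading the notifications above: a ZooKeeper zxid is a 64-bit number whose high 32 bits hold the leader epoch and whose low 32 bits hold a counter within that epoch. The sketch below decodes the n.zxid value from the log. The helper name `split_zxid` is made up for illustration and is not part of ZooKeeper.

```python
def split_zxid(zxid):
    """Split a 64-bit ZooKeeper zxid into (epoch, counter)."""
    epoch = zxid >> 32           # high 32 bits: leader epoch
    counter = zxid & 0xFFFFFFFF  # low 32 bits: transaction counter in that epoch
    return epoch, counter


# n.zxid taken from the Slave02 notification lines
epoch, counter = split_zxid(0x50000037D)
print(f"epoch={epoch:#x} counter={counter}")
```

The decoded epoch is 0x5, which agrees with the n.peerEPoch field in the same notification.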

 

 

zookeeper.out on Slave01 shows:

[myid:1] - INFO  [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890005 type:create cxid:0x1e zxid:0xb0000003c txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired

2013-04-16 10:58:26,415 [myid:1] - INFO  [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890006 type:create cxid:0x7 zxid:0xb0000003d txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired

2013-04-16 10:58:26,431 [myid:1] - INFO  [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890007 type:create cxid:0x7 zxid:0xb0000003e txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired

2013-04-16 10:58:26,489 [myid:1] - INFO  [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x23e0dc5a333000a type:create cxid:0x7 zxid:0xb0000003f txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired

2013-04-16 10:58:36,001 [myid:1] - INFO  [SessionTracker:ZooKeeperServer@325] - Expiring session 0x33e0dc5b4de0003, timeout of 40000ms exceeded

2013-04-16 10:58:36,001 [myid:1] - INFO  [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x33e0dc5b4de0003

2013-04-16 11:03:44,000 [myid:1] - INFO  [SessionTracker:ZooKeeperServer@325] - Expiring session 0x23e0dc5a333000b, timeout of 40000ms exceeded

2013-04-16 11:03:44,001 [myid:1] - INFO  [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x23e0dc5a333000b
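
A possible way to read the session ids above: as far as I know, ZooKeeper places the id of the server that created a session in the top byte of the 64-bit session id. That is an implementation detail rather than a documented contract, so treat this sketch as an assumption. The helper name `session_server_id` is hypothetical.

```python
def session_server_id(session_id):
    """Return the top byte of a session id (assumed to be the creating server's id)."""
    return (session_id >> 56) & 0xFF


# Session ids copied from the Slave01 log
for sid in (0x13E0DC5A0890005, 0x23E0DC5A333000B, 0x33E0DC5B4DE0003):
    print(f"{sid:#x} -> server.{session_server_id(sid)}")
```

Under that assumption, the two expired sessions were created through server.3 and server.2, not through Slave01 itself.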

 

 

 

From: Azuryy Yu [mailto:azuryyyu@gmail.com]
Sent: April 16, 2013 11:13
To: user@hadoop.apache.org
Subject: Re: Re: Re: Re: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

 

Then can you find the ZooKeeper log at zookeeper_home/zookeeper.out?

 

On Tue, Apr 16, 2013 at 11:04 AM, dylan <dwld0425@gmail.com> wrote:

I use the hbase shell.

It always shows:

ERROR: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

 

From: Azuryy Yu [mailto:azuryyyu@gmail.com]
Sent: April 16, 2013 10:59
To: user@hadoop.apache.org
Subject: Re: Re: Re: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

 

Does HBase manage your ZooKeeper, or did you set export HBASE_MANAGES_ZK=false in hbase-env.sh?

If not, then there is a ZooKeeper port conflict.

 

On Tue, Apr 16, 2013 at 10:55 AM, dylan <dwld0425@gmail.com> wrote:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/cdh4/zookeeper/data
# the port at which the clients will connect
clientPort=2181

server.1=Slave01:2888:3888
server.2=Slave02:2888:3888
server.3=Slave03:2888:3888
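
To make the settings above concrete, here is a minimal sketch that turns the pasted zoo.cfg values into wall-clock limits. The function name `parse_cfg` and the inlined `ZOO_CFG` string are illustrative only. The maximum session timeout is computed as 20 ticks, which is the ZooKeeper default and only applies if maxSessionTimeout is not set elsewhere.

```python
# Settings copied from the zoo.cfg pasted in this message (comments omitted)
ZOO_CFG = """\
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/cdh4/zookeeper/data
clientPort=2181
server.1=Slave01:2888:3888
server.2=Slave02:2888:3888
server.3=Slave03:2888:3888
"""


def parse_cfg(text):
    """Parse key=value lines, skipping blanks and # comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg


cfg = parse_cfg(ZOO_CFG)
tick_ms = int(cfg["tickTime"])
init_ms = tick_ms * int(cfg["initLimit"])   # time a follower has to connect and sync
sync_ms = tick_ms * int(cfg["syncLimit"])   # how far a follower may lag behind
max_session_ms = 20 * tick_ms               # default upper bound (assumption: not overridden)
servers = [key for key in cfg if key.startswith("server.")]
quorum = len(servers) // 2 + 1              # majority needed to elect a leader

print(init_ms, sync_ms, max_session_ms, quorum)
```

With these values the ensemble needs 2 of its 3 servers to form a quorum, and the 40000 ms bound matches the "timeout of 40000ms exceeded" lines in the Slave01 log.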

 

From: Azuryy Yu [mailto:azuryyyu@gmail.com]
Sent: April 16, 2013 10:45
To: user@hadoop.apache.org
Subject: Re: Re: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

 

And please paste the ZK configuration from zookeeper_home/conf/zoo.cfg.

 

On Tue, Apr 16, 2013 at 10:42 AM, Azuryy Yu <azuryyyu@gmail.com> wrote:

It is located under hbase-home/logs/ if your ZooKeeper is managed by HBase.

But I noticed you configured QJM. Do your QJM and HBase share the same ZK cluster? If so, just paste your QJM ZK configuration from hdfs-site.xml and your HBase ZK configuration from hbase-site.xml.

 

On Tue, Apr 16, 2013 at 10:37 AM, dylan <dwld0425@gmail.com> wrote:


How do I check the ZooKeeper log? The files are binary. How do I convert them to a readable log?

I found “org.apache.zookeeper.server.LogFormatter”. How do I run it?


From: Azuryy Yu [mailto:azuryyyu@gmail.com]
Sent: April 16, 2013 10:01
To: user@hadoop.apache.org
Subject: Re: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

 

This is a ZooKeeper issue.

Please paste the ZooKeeper log here. Thanks.

 

On Tue, Apr 16, 2013 at 9:58 AM, dylan <dwld0425@gmail.com> wrote:

It is hbase-0.94.2-cdh4.2.0.

 

From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: April 16, 2013 9:55
To: user@hbase.apache.org
Subject: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

 

I think this question would be more appropriate for the HBase user mailing list.

 

Moving hadoop user to bcc.

 

Please tell us the HBase version you are using.

 

Thanks

On Mon, Apr 15, 2013 at 6:51 PM, dylan <dwld0425@gmail.com> wrote:

Hi

 

I am new to Hadoop and set it up from the tarball. The cluster has 5 nodes: 2 NN nodes with QJM (3 JournalNodes, one of them on a DN node) and 3 DN nodes that also run ZooKeeper. It worked fine. Then I rebooted one DataNode machine that also runs ZooKeeper, and after that restarted all processes. Hadoop works fine, but HBase does not: I cannot disable or drop tables.

The logs are as follows:

The HBase HMaster log:

DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere

,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  -ROOT-,,0.70236052 state=CLOSING, ts=1366001558865, server=Master,60000,1366001238313

,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

10,684 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region -ROOT-,,0.70236052 (offlining)

 

The HBase HRegionServer log:

DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=7.44 MB, free=898.81 MB, max=906.24 MB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN

 

The HBase web UI shows:

Region      State

70236052    -ROOT-,,0.70236052 state=CLOSING, ts=Mon Apr 15 12:52:38 CST 2013 (75440s ago), server=Master,60000,1366001238313
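
The ts value in the HMaster log and the date in the web UI are the same instant, and the last field of the server name (hostname,port,startcode) is the master's start time, both in epoch milliseconds. A small sketch that converts them, assuming CST here means China Standard Time (UTC+8); the helper name `from_epoch_ms` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8), "CST")  # assumption: China Standard Time, UTC+8
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(ms):
    """Convert epoch milliseconds to an aware datetime in CST."""
    return (EPOCH + timedelta(milliseconds=ms)).astimezone(CST)


closing_since = from_epoch_ms(1366001558865)  # ts= from the HMaster log
master_start = from_epoch_ms(1366001238313)   # start code from server=Master,60000,...
observed_at = closing_since + timedelta(seconds=75440)  # "75440s ago" in the web UI

print("master started:", master_start.isoformat(timespec="seconds"))
print("CLOSING since: ", closing_since.isoformat(timespec="seconds"))
print("page viewed at:", observed_at.isoformat(timespec="seconds"))
print("seconds between start and CLOSING:", (closing_since - master_start).total_seconds())
```

So -ROOT- entered CLOSING a little over five minutes after the master came up and had been stuck for almost 21 hours when the page was viewed.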

 

How can I fix it?

 

Thanks.

 

 

 

 

 

 

