Return-Path: Delivered-To: apmail-hadoop-hbase-user-archive@minotaur.apache.org Received: (qmail 32117 invoked from network); 12 Nov 2009 19:43:24 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 12 Nov 2009 19:43:24 -0000 Received: (qmail 43953 invoked by uid 500); 12 Nov 2009 19:43:23 -0000 Delivered-To: apmail-hadoop-hbase-user-archive@hadoop.apache.org Received: (qmail 43919 invoked by uid 500); 12 Nov 2009 19:43:23 -0000 Mailing-List: contact hbase-user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hbase-user@hadoop.apache.org Delivered-To: mailing list hbase-user@hadoop.apache.org Received: (qmail 43909 invoked by uid 99); 12 Nov 2009 19:43:23 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 12 Nov 2009 19:43:23 +0000 X-ASF-Spam-Status: No, hits=-2.4 required=5.0 tests=AWL,BAYES_00,HTML_MESSAGE,NORMAL_HTTP_TO_IP,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of saint.ack@gmail.com designates 209.85.221.188 as permitted sender) Received: from [209.85.221.188] (HELO mail-qy0-f188.google.com) (209.85.221.188) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 12 Nov 2009 19:43:18 +0000 Received: by qyk26 with SMTP id 26so1182076qyk.5 for ; Thu, 12 Nov 2009 11:42:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:date:x-google-sender-auth:message-id:subject:from:to :content-type; bh=A8z2V/MJfl6tyftHQRFNu16BgVZKO01o0Q5wR6+x6tM=; b=hNVMW6Ye7NV7wx14Hpc1oVx2DG8aJAthD9YCzKn0LO1p58TyaNmL1An32Tz3iKhZ6o dHpRsiQ4oiWr23+rsCJS9UUxYTKpBmkBrwmz4ePgGQfuJ+sJN+GE7nGxWDmy863KkLtY O/QzuRpjh8ehT3or2kWzwegsFdSFJ8yRbeg4o= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type; b=uPTWswwfUTWdRvx2G3SDaD0/gKrwNCDN5ott/QygZ2/E6VukrNoMD8+buxX/ySeMK6 4XK4S/Atm71BkerUTVromW5NqzH8MSsWfh+ACLzq2HETFEZoG1Yu59Ne2UQLYlQ+dfrB POv27IBJPmi7rRhxiCmW/0ibyYmMeiStwf1JM= MIME-Version: 1.0 Sender: saint.ack@gmail.com Received: by 10.229.119.69 with SMTP id y5mr471517qcq.100.1258054976929; Thu, 12 Nov 2009 11:42:56 -0800 (PST) In-Reply-To: References: <7c962aed0911111252ofa7b803ncf7915da0bb7fce6@mail.gmail.com> <31a243e70911121010v541697f0m168f5c3dea1dc2ab@mail.gmail.com> <31a243e70911121042v555dd4d3n461c64ca553ee9f2@mail.gmail.com> <31a243e70911121114u47c19404sfdd4921803ad1d0@mail.gmail.com> Date: Thu, 12 Nov 2009 11:42:56 -0800 X-Google-Sender-Auth: 955e75f320ab909d Message-ID: <7c962aed0911121142m4d0119b3vbc4121e3f5a13ca1@mail.gmail.com> Subject: Re: regionserver disconnection From: stack To: hbase-user@hadoop.apache.org Content-Type: multipart/alternative; boundary=000e0cd5ce4402003e047831c0ed --000e0cd5ce4402003e047831c0ed Content-Type: text/plain; charset=ISO-8859-1 superpyxis0002 is the name of the RS whose log you posted earlier? Thanks, St.Ack On Thu, Nov 12, 2009 at 11:36 AM, Zhenyu Zhong wrote: > J-D, > > Please take a look at the master log during that time. > > http://pastebin.com/m502a3581 > > > We can see that during that time, one regionserver disconnected. > > > Many thanks! > zhenyu > > > On Thu, Nov 12, 2009 at 2:14 PM, Jean-Daniel Cryans >wrote: > > > Could we see the master log from around the same time the region > > server got a session expired? > > > > Thx, > > > > J-D > > > > On Thu, Nov 12, 2009 at 10:54 AM, Zhenyu Zhong > > wrote: > > > Sorry, there was a typo in my email. But it is 10 minutes that I set in > > the > > > hbase-site.xml. > > > > > > Please see the link. > > > http://pastebin.com/m564c51bc > > > > > > Very much appreciated. > > > zhenyu > > > > > > On Thu, Nov 12, 2009 at 1:42 PM, Jean-Daniel Cryans < > jdcryans@apache.org > > >wrote: > > > > > >> Usually if it was 10 minutes we would see sleepers complaining they > > >> slept for way too long. In the logs I can see a 1 minute pause but not > > >> 10 minutes because of the ZK debug. Are you sure you don't have a > > >> typo? Like what you just wrote is not valid, you wrote > > >> zookeeper.session.timout instead of zookeeper.session.timeout. Are you > > >> sure it's distributed over the cluster? Could you pastebin > > >> hbase-site.xml? > > >> > > >> Thx, > > >> > > >> J-D > > >> > > >> On Thu, Nov 12, 2009 at 10:28 AM, Zhenyu Zhong < > zhongresearch@gmail.com > > > > > >> wrote: > > >> > J-D, > > >> > > > >> > Yes, that is weird. But it is what I have in the log. Probably the > log > > is > > >> > flushed by different threads, and one thread is blocked for a while? > > >> > > > >> > I set the zookeeper.session.timout=600000, which is 10 minutes. > > However, > > >> we > > >> > still see this problem. > > >> > > > >> > Does GC pause for such a long time period? > > >> > I will try different GC options to see the differences. > > >> > > > >> > thanks > > >> > zhenyu > > >> > > > >> > > > >> > On Thu, Nov 12, 2009 at 1:10 PM, Jean-Daniel Cryans < > > jdcryans@apache.org > > >> >wrote: > > >> > > > >> >> I see something weird > > >> >> > > >> >> 2009-11-12 15:04:36,370 WARN > > >> >> org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1 > > >> >> org.apache.hadoop.hbase.Leases$LeaseStillHeldException > > >> >> at > > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > > >> >> Method) > > >> >> at > > >> >> > > >> > > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > > >> >> at > > >> >> > > >> > > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > > >> >> ... > > >> >> 2009-11-12 15:03:52,682 DEBUG org.apache.zookeeper.ClientCnxn: Got > > >> >> ping response for sessionid:0x224e55436ad0004 after 0ms > > >> >> 2009-11-12 15:04:52,480 WARN org.apache.zookeeper.ClientCnxn: > > >> >> Exception closing session 0x24e55436a0007d to > > >> >> sun.nio.ch.SelectionKeyImpl@136f6ed1 > > >> >> > > >> >> Goes back in time? I does look better tho. Did you say you set the > > >> >> zookeeper session timeout higher in a previous thread? I'd be > > >> >> interested to see if a higher (2-3 minutes) value would solve the > > >> >> problem until java 7. > > >> >> > > >> >> The only thing that happens after that LeaseStillHeldException is > > that > > >> >> for some reason (probably GC) the region server paused. This caused > > >> >> the session to timeout and should be why the RS isn't connected to > > the > > >> >> ZK quorum member. > > >> >> > > >> >> BTW we worked with the Zookeeper dev team to have more meaningful > > >> >> exceptions and it's already committed in their trunk. > > >> >> > > >> >> J-D > > >> >> > > >> >> On Thu, Nov 12, 2009 at 8:40 AM, Zhenyu Zhong < > > zhongresearch@gmail.com> > > >> >> wrote: > > >> >> > Follow up on the regionserver disconnection: > > >> >> > > > >> >> > I made a few changes to the cluster setup. > > >> >> > 1. split the regionserver and zookeeper quorum servers, such that > > >> HRegion > > >> >> > doesn't run on the same box as HQuorumPeer. I assume that there > > should > > >> be > > >> >> no > > >> >> > disk io contention between the regionserver and zookeeper. > > >> >> > Now we have 14 regionservers, and 5 zookeeper quorum servers. > > >> >> > > > >> >> > Though I experienced 2 regionserver disconnection this morning, > it > > >> looks > > >> >> > better from the regionserver log. (please see the following log) > > >> >> > > > >> >> > > > >> >> > http://pastebin.com/m496dbfae > > >> >> > > > >> >> > > > >> >> > I checked diskIO for one of the regionserver(192.168.100.116) > > during > > >> the > > >> >> > time it disconnected. > > >> >> > > > >> >> > Time: 03:04:58 AM > > >> >> > Device: tps Blk_read/s Blk_wrtn/s Blk_read > > Blk_wrtn > > >> >> > sda 105.31 5458.83 19503.64 9043873239 > > >> 32312473676 > > >> >> > sda1 2.90 6.64 99.25 10993934 > > 164433464 > > >> >> > sda2 1.72 23.76 30.25 39365817 > > 50124008 > > >> >> > sda3 0.30 0.38 3.58 624930 > > 5923000 > > >> >> > sda4 100.39 5428.06 19370.56 8992888270 > > >> 32091993204 > > >> >> > > > >> >> > > > >> >> > I also checked the zookeeper quorum server that the regionserver > > tried > > >> to > > >> >> > connect according to the log. However, I don't see > 192.168.100.116 > > in > > >> the > > >> >> > client list of the zookeeper quorum server that the regionserver > > tried > > >> to > > >> >> > connect. > > >> >> > Would that be a problem? > > >> >> > > > >> >> > Thu Nov 12 15:04:52 UTC 2009 > > >> >> > Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT > > >> >> > Clients: > > >> >> > /192.168.100.127:43045[1](queued=0,recved=26,sent=0) > > >> >> > /192.168.100.131:39091[1](queued=0,recved=964,sent=0) > > >> >> > /192.168.100.124:35961[1](queued=0,recved=3266,sent=0) > > >> >> > /192.168.100.123:47935[1](queued=0,recved=233,sent=0) > > >> >> > /192.168.100.125:46931[1](queued=0,recved=2,sent=0) > > >> >> > /192.168.100.118:54924[1](queued=0,recved=295,sent=0) > > >> >> > /192.168.100.118:41390[1](queued=0,recved=2290,sent=0) > > >> >> > /192.168.100.136:42243[1](queued=0,recved=0,sent=0) > > >> >> > > > >> >> > Latency min/avg/max: 0/17/6333 > > >> >> > Received: 47111 > > >> >> > Sent: 0 > > >> >> > Outstanding: 0 > > >> >> > Zxid: 0x77000083f4 > > >> >> > Mode: leader > > >> >> > Node count: 23 > > >> >> > > > >> >> > Best, > > >> >> > zhenyu > > >> >> > > > >> >> > > > >> >> > On Wed, Nov 11, 2009 at 3:58 PM, Zhenyu Zhong < > > >> zhongresearch@gmail.com > > >> >> >wrote: > > >> >> > > > >> >> >> Stack > > >> >> >> > > >> >> >> I am very appreciated for your comments. > > >> >> >> I will use the zookeeper monitoring script on my cluster and let > > it > > >> run > > >> >> >> overnight to see the result. > > >> >> >> I will follow up the thread when I get anything. > > >> >> >> > > >> >> >> thanks > > >> >> >> zhenyu > > >> >> >> > > >> >> >> > > >> >> >> On Wed, Nov 11, 2009 at 3:52 PM, stack > wrote: > > >> >> >> > > >> >> >>> I see these in your log too: > > >> >> >>> > > >> >> >>> > > >> >> >>> 1. 2009-11-11 04:27:19,018 DEBUG > > org.apache.zookeeper.ClientCnxn: > > >> Got > > >> >> >>> ping response for sessionid:0x424dfd908c50009 after 4544ms > > >> >> >>> 2. 2009-11-11 04:27:19,018 DEBUG > > org.apache.zookeeper.ClientCnxn: > > >> Got > > >> >> >>> ping response for sessionid:0x24dfd90c810002 after 4548ms > > >> >> >>> 3. 2009-11-11 04:27:43,960 DEBUG > > org.apache.zookeeper.ClientCnxn: > > >> Got > > >> >> >>> ping response for sessionid:0x424dfd908c50009 after 9030ms > > >> >> >>> 4. 2009-11-11 04:27:43,960 DEBUG > > org.apache.zookeeper.ClientCnxn: > > >> Got > > >> >> >>> ping response for sessionid:0x24dfd90c810002 after 9030ms > > >> >> >>> 5. 2009-11-11 04:27:48,830 DEBUG > > org.apache.zookeeper.ClientCnxn: > > >> Got > > >> >> >>> ping response for sessionid:0x424dfd908c50009 after 0ms > > >> >> >>> 6. 2009-11-11 04:27:48,830 DEBUG > > org.apache.zookeeper.ClientCnxn: > > >> Got > > >> >> >>> ping response for sessionid:0x24dfd90c810002 after 0ms > > >> >> >>> > > >> >> >>> Which would indicate lag talking to zk too. > > >> >> >>> > > >> >> >>> Appending HLog also seems to get steadily worse. > > >> >> >>> > > >> >> >>> I also see these: > > >> >> >>> > > >> >> >>> > > >> >> >>> 1. 2009-11-11 04:27:01,233 WARN > > org.apache.zookeeper.ClientCnxn: > > >> >> >>> Exception closing session 0x24dfd90c810002 to > > >> >> >>> sun.nio.ch.SelectionKeyImpl@5e16a7e3 > > >> >> >>> 2. java.io.IOException: Read error rc = -1 > > >> >> >>> java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] > > >> >> >>> 3. at > > >> >> >>> > > >> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701) > > >> >> >>> 4. at > > >> >> >>> > > >> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) > > >> >> >>> > > >> >> >>> > > >> >> >>> Were all 5 quorum members participating? Check by doing "echo > > >> stat|nc > > >> >> >>> QUORUM_HOST port" for each member of quorum.... the below might > > >> help: > > >> >> >>> > > >> >> >>> members="1,2,3,4,5" > > >> >> >>> while true; do date; for i in $members; do echo $i; echo stat | > > nc > > >> $i > > >> >> >>> 2181; > > >> >> >>> done; sleep 10; done > > >> >> >>> > > >> >> >>> We need to improve our zk_dump. It just reports quorum > > members... > > >> not > > >> >> if > > >> >> >>> they are actually participating or not. The above will tell > you > > if > > >> >> dead > > >> >> >>> or > > >> >> >>> alive. > > >> >> >>> > > >> >> >>> Please report back after checking above and following the > > >> Jean-Daniel > > >> >> >>> prescription. > > >> >> >>> > > >> >> >>> Thanks, > > >> >> >>> St.Ack > > >> >> >>> > > >> >> >>> > > >> >> >>> > > >> >> >>> On Wed, Nov 11, 2009 at 12:29 PM, Zhenyu Zhong < > > >> >> zhongresearch@gmail.com > > >> >> >>> >wrote: > > >> >> >>> > > >> >> >>> > Thanks a lot for the help. > > >> >> >>> > > > >> >> >>> > I understand that the change of disk configurations should be > > done > > >> >> >>> > progressively such that HDFS can balance and recover from a > > disk > > >> >> going > > >> >> >>> > away. > > >> >> >>> > > > >> >> >>> > The data we are trying to import into HBase is about > 200million > > >> rows, > > >> >> >>> each > > >> >> >>> > row is about hundreds of bytes. Basically, we read in a text > > file > > >> >> from > > >> >> >>> > HDFS, > > >> >> >>> > then for each row in the text file, we fetch data from HBase > > >> table, > > >> >> we > > >> >> >>> > compare the column value and update the latest value back > into > > >> HBase. > > >> >> I > > >> >> >>> was > > >> >> >>> > suggested to use CheckAndPut. Anyway this is the common > > usecase > > >> on > > >> >> my > > >> >> >>> > side. > > >> >> >>> > > > >> >> >>> > > > >> >> >>> > Many thanks > > >> >> >>> > zhenyu > > >> >> >>> > > > >> >> >>> > > > >> >> >>> > > > >> >> >>> > On Wed, Nov 11, 2009 at 3:03 PM, Jean-Daniel Cryans < > > >> >> >>> jdcryans@apache.org > > >> >> >>> > >wrote: > > >> >> >>> > > > >> >> >>> > > 6 disks, 19 machines and you see those kind of issues? I'm > > quite > > >> >> >>> > > surprised and I want to learn more about what you trying to > > to. > > >> >> >>> > > > > >> >> >>> > > But first, if hbase is managing ZK for you then you can > > change > > >> the > > >> >> >>> > > value of hbase.zookeeper.property.dataDir so that it points > > to > > >> its > > >> >> own > > >> >> >>> > > mount. Make sure you also change dfs.data.dir in > > >> conf/hdfs-site.xml > > >> >> >>> > > (hadoop stuff) so that it stops pointing there (you should > > have > > >> 6 > > >> >> >>> > > values there at the moment if you configured it correctly). > > >> Doing > > >> >> this > > >> >> >>> > > could result in data loss if you do it all at once so do > the > > >> change > > >> >> >>> > > progressively. This also means that you may have to keep a > > >> seperate > > >> >> >>> > > HDFS configuration for those particular nodes. Please ask > > more > > >> >> >>> > > questions if it's not very clear. > > >> >> >>> > > > > >> >> >>> > > WRT to dedicating nodes to Zookeeeper, it makes sense only > if > > >> you > > >> >> have > > >> >> >>> > > a lot of nodes but in your case, unless there's also CPU > > >> >> contention, > > >> >> >>> > > you're good with only a dedicated disk. > > >> >> >>> > > > > >> >> >>> > > Now for what you are trying to do, is this a massive import > > that > > >> >> goes > > >> >> >>> > > over many hours? How big is each row? I'd love to get a lot > > of > > >> >> details > > >> >> >>> > > to help you optimize your cluster for your particular > > problem. > > >> >> >>> > > > > >> >> >>> > > Thx, > > >> >> >>> > > > > >> >> >>> > > J-D > > >> >> >>> > > > > >> >> >>> > > On Wed, Nov 11, 2009 at 11:51 AM, Zhenyu Zhong < > > >> >> >>> zhongresearch@gmail.com> > > >> >> >>> > > wrote: > > >> >> >>> > > > J-D, > > >> >> >>> > > > > > >> >> >>> > > > Thanks a lot for the insightful comments. > > >> >> >>> > > > We are running 5 zookeeper quorums now. On these 5 quorum > > >> >> servers, > > >> >> >>> we > > >> >> >>> > > also > > >> >> >>> > > > run datanode, tasktracker and regionserver. > > >> >> >>> > > > Now we have 6 disks per machine. > > >> >> >>> > > > If it is the disk IO saturation, is there anyway to > > configure > > >> >> >>> zookeeper > > >> >> >>> > > to > > >> >> >>> > > > write to a seperate disk which is not the same one used > by > > >> HDFS? > > >> >> >>> > > > Or shall we run zookeeper quorums on different nodes > other > > >> than > > >> >> >>> HDFS? > > >> >> >>> > > > > > >> >> >>> > > > Are HLogs generated by Regionservers? Would it alleviate > > the > > >> >> issue > > >> >> >>> if > > >> >> >>> > we > > >> >> >>> > > use > > >> >> >>> > > > the first 14 for regionserver, and the rest of 5 for > > >> >> zookeeper?(In > > >> >> >>> that > > >> >> >>> > > > sense, the node running zookeeper quorum won't run > > >> regionserver) > > >> >> >>> > > > > > >> >> >>> > > > I am very appreciated for your comments. > > >> >> >>> > > > > > >> >> >>> > > > Best, > > >> >> >>> > > > zhenyu > > >> >> >>> > > > > > >> >> >>> > > > > > >> >> >>> > > > > > >> >> >>> > > > On Wed, Nov 11, 2009 at 2:30 PM, Jean-Daniel Cryans < > > >> >> >>> > jdcryans@apache.org > > >> >> >>> > > >wrote: > > >> >> >>> > > > > > >> >> >>> > > >> The problem seems to be that the IO is saturated. When > you > > >> see > > >> >> >>> this: > > >> >> >>> > > >> > > >> >> >>> > > >> # > > >> >> >>> > > >> 2009-11-11 04:28:10,308 WARN > > >> >> >>> > > >> org.apache.hadoop.hbase.regionserver.HLog: IPC Server > > handler > > >> 9 > > >> >> on > > >> >> >>> > > >> 60021 took 6819ms appending an edit to hlog; > editcount=370 > > >> >> >>> > > >> # > > >> >> >>> > > >> 2009-11-11 04:28:10,308 WARN > > >> >> >>> > > >> org.apache.hadoop.hbase.regionserver.HLog: IPC Server > > handler > > >> 2 > > >> >> on > > >> >> >>> > > >> 60021 took 4412ms appending an edit to hlog; > editcount=371 > > >> >> >>> > > >> > > >> >> >>> > > >> It's a sign that HDFS is slow. I guess that the > zookeeper > > >> quorum > > >> >> >>> > > >> members are on those same nodes using the same disks? If > > so, > > >> >> >>> there's > > >> >> >>> > > >> the problem because Zookeeper needs to be able to commit > > >> >> >>> transactions > > >> >> >>> > > >> very fast and if your disks are already hogged then they > > have > > >> to > > >> >> >>> wait > > >> >> >>> > > >> and you can get disconnections. Do you still have 19 > > quorum > > >> >> >>> members? > > >> >> >>> > > >> How many disks do you have per machine? > > >> >> >>> > > >> > > >> >> >>> > > >> Thx > > >> >> >>> > > >> > > >> >> >>> > > >> J-D > > >> >> >>> > > >> > > >> >> >>> > > >> On Wed, Nov 11, 2009 at 11:22 AM, Zhenyu Zhong < > > >> >> >>> > zhongresearch@gmail.com > > >> >> >>> > > > > > >> >> >>> > > >> wrote: > > >> >> >>> > > >> > Please take a look at the following regionserver log. > > >> >> >>> > > >> > The error started to come out from 4:24am UTC to > 4:29am > > >> UTC. > > >> >> >>> > > >> > > > >> >> >>> > > >> > http://pastebin.com/m3717d064 > > >> >> >>> > > >> > > > >> >> >>> > > >> > Thanks in advance. > > >> >> >>> > > >> > > > >> >> >>> > > >> > Best, > > >> >> >>> > > >> > zhenyu > > >> >> >>> > > >> > > > >> >> >>> > > >> > > > >> >> >>> > > >> > On Wed, Nov 11, 2009 at 2:01 PM, Jean-Daniel Cryans < > > >> >> >>> > > jdcryans@apache.org > > >> >> >>> > > >> >wrote: > > >> >> >>> > > >> > > > >> >> >>> > > >> >> Could you paste a bigger log trace (interested to see > > at > > >> >> least > > >> >> >>> all > > >> >> >>> > > 300 > > >> >> >>> > > >> >> lines before that) in a pastebin? > > >> >> >>> > > >> >> > > >> >> >>> > > >> >> Thx, > > >> >> >>> > > >> >> > > >> >> >>> > > >> >> J-D > > >> >> >>> > > >> >> > > >> >> >>> > > >> >> On Wed, Nov 11, 2009 at 10:56 AM, Zhenyu Zhong < > > >> >> >>> > > zhongresearch@gmail.com > > >> >> >>> > > >> > > > >> >> >>> > > >> >> wrote: > > >> >> >>> > > >> >> > J-D, > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > Thanks for the reply. > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > The problem I have is one or more of the > > regionservers > > >> >> >>> disconnect > > >> >> >>> > > from > > >> >> >>> > > >> >> the > > >> >> >>> > > >> >> > cluster. > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > I checked the resource usage including memory Heap > > usage > > >> >> for > > >> >> >>> > > >> >> regionserver, > > >> >> >>> > > >> >> > and found nothing suspicious. The regionserver heap > > >> usage > > >> >> is > > >> >> >>> way > > >> >> >>> > > below > > >> >> >>> > > >> >> the > > >> >> >>> > > >> >> > maxHeap defined in jvm. > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > So I checked zookeeper log, then I found the > > following > > >> >> >>> > > >> >> > BufferUnderflowException, eventually it leads to > > >> >> >>> SocketException: > > >> >> >>> > > >> >> Transport > > >> >> >>> > > >> >> > endpoint is not connected. > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > I also checked the regionserver log. It looks like > > the > > >> >> >>> > regionserver > > >> >> >>> > > >> tried > > >> >> >>> > > >> >> to > > >> >> >>> > > >> >> > connect to other zookeeper quorum server. It > connects > > >> >> >>> > successfully, > > >> >> >>> > > >> but > > >> >> >>> > > >> >> then > > >> >> >>> > > >> >> > it complained about SocketException: transport > > endpoint > > >> is > > >> >> not > > >> >> >>> > > >> connected. > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > So both zookeeper log and regionserver log told me > > that > > >> the > > >> >> >>> > > transport > > >> >> >>> > > >> >> > endpoint is not connected. I don't know why. > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > I would be very appreciated if you could point me > to > > the > > >> >> right > > >> >> >>> > > >> direction. > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > Best, > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > Regionserver log: > > >> >> >>> > > >> >> > 2009-11-11 04:24:53,718 INFO > > >> >> org.apache.zookeeper.ClientCnxn: > > >> >> >>> > > >> Attempting > > >> >> >>> > > >> >> > connection to server superpyxis0018/ > > >> 192.168.100.132:2181 > > >> >> >>> > > >> >> > 2009-11-11 04:24:53,718 INFO > > >> >> org.apache.zookeeper.ClientCnxn: > > >> >> >>> > > Priming > > >> >> >>> > > >> >> > connection to > > java.nio.channels.SocketChannel[connected > > >> >> >>> local=/ > > >> >> >>> > > >> >> > 192.168.100.131:43065 remote=superpyxis0018/ > > >> >> >>> 192.168.100.132:2181 > > >> >> >>> > ] > > >> >> >>> > > >> >> > 2009-11-11 04:24:53,718 INFO > > >> >> org.apache.zookeeper.ClientCnxn: > > >> >> >>> > > Server > > >> >> >>> > > >> >> > connection successful > > >> >> >>> > > >> >> > 2009-11-11 04:24:53,768 INFO > > >> >> >>> > > >> >> > org.apache.hadoop.hbase.regionserver.HRegionServer: > > Got > > >> >> >>> ZooKeeper > > >> >> >>> > > >> event, > > >> >> >>> > > >> >> > state: Disconnected, type: None, path: null > > >> >> >>> > > >> >> > 2009-11-11 04:24:54,118 INFO > > >> >> org.apache.zookeeper.ClientCnxn: > > >> >> >>> > > >> Attempting > > >> >> >>> > > >> >> > connection to server superpyxis0017/ > > >> 192.168.100.131:2181 > > >> >> >>> > > >> >> > 2009-11-11 04:24:54,118 INFO > > >> >> org.apache.zookeeper.ClientCnxn: > > >> >> >>> > > Priming > > >> >> >>> > > >> >> > connection to > > java.nio.channels.SocketChannel[connected > > >> >> >>> local=/ > > >> >> >>> > > >> >> > 192.168.100.131:46903 remote=superpyxis0017/ > > >> >> >>> 192.168.100.131:2181 > > >> >> >>> > ] > > >> >> >>> > > >> >> > 2009-11-11 04:24:54,118 INFO > > >> >> org.apache.zookeeper.ClientCnxn: > > >> >> >>> > > Server > > >> >> >>> > > >> >> > connection successful > > >> >> >>> > > >> >> > 2009-11-11 04:24:54,118 WARN > > >> >> org.apache.zookeeper.ClientCnxn: > > >> >> >>> > > >> Exception > > >> >> >>> > > >> >> > closing session 0x24dfd90c810002 to > > >> >> >>> > > >> sun.nio.ch.SelectionKeyImpl@2dd78ccf > > >> >> >>> > > >> >> > java.io.IOException: Read error rc = -1 > > >> >> >>> > > >> java.nio.DirectByteBuffer[pos=0 > > >> >> >>> > > >> >> > lim=4 cap=4] > > >> >> >>> > > >> >> > at > > >> >> >>> > > >> >> > > > >> >> >>> > > > > >> >> > org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701) > > >> >> >>> > > >> >> > at > > >> >> >>> > > >> >> > > > >> >> >>> > > > >> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) > > >> >> >>> > > >> >> > 2009-11-11 04:24:54,118 WARN > > >> >> org.apache.zookeeper.ClientCnxn: > > >> >> >>> > > Ignoring > > >> >> >>> > > >> >> > exception during shutdown input > > >> >> >>> > > >> >> > java.net.SocketException: Transport endpoint is not > > >> >> connected > > >> >> >>> > > >> >> > at > > sun.nio.ch.SocketChannelImpl.shutdown(Native > > >> >> Method) > > >> >> >>> > > >> >> > at > > >> >> >>> > > >> >> > > > >> >> >>> > > > > >> >> > > sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) > > >> >> >>> > > >> >> > at > > >> >> >>> > > >> > > >> sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) > > >> >> >>> > > >> >> > at > > >> >> >>> > > >> >> > > > >> >> >>> > > >> > > >> >> >>> > > > >> >> > > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999) > > >> >> >>> > > >> >> > at > > >> >> >>> > > >> >> > > > >> >> >>> > > > >> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970) > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > On Wed, Nov 11, 2009 at 1:40 PM, Jean-Daniel Cryans > < > > >> >> >>> > > >> jdcryans@apache.org > > >> >> >>> > > >> >> >wrote: > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> >> I'm really not sure exactly what the situation is > > with > > >> the > > >> >> >>> > > >> >> >> java.nio.BufferUnderflowException, but what do you > > mean > > >> by > > >> >> >>> > > >> >> >> "regionserver disconnection"? Could you give more > > >> details? > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> >> Thx, > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> >> J-D > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> >> On Wed, Nov 11, 2009 at 10:34 AM, Zhenyu Zhong < > > >> >> >>> > > >> zhongresearch@gmail.com > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> >> wrote: > > >> >> >>> > > >> >> >> > I searched around and found HBASE-1645 "HBase > > could > > >> be a > > >> >> >>> > little > > >> >> >>> > > >> more > > >> >> >>> > > >> >> >> dainty > > >> >> >>> > > >> >> >> > when zk is pulled out from under it" > > >> >> >>> > > >> >> >> > > > >> >> >>> > > >> >> >> > I am wondering whether my problem is related to > > this > > >> >> >>> > HBase-1645. > > >> >> >>> > > Or > > >> >> >>> > > >> >> >> probably > > >> >> >>> > > >> >> >> > has nothing to do with it? > > >> >> >>> > > >> >> >> > > > >> >> >>> > > >> >> >> > thanks > > >> >> >>> > > >> >> >> > > > >> >> >>> > > >> >> >> > > > >> >> >>> > > >> >> >> > On Wed, Nov 11, 2009 at 12:02 PM, Zhenyu Zhong < > > >> >> >>> > > >> >> zhongresearch@gmail.com > > >> >> >>> > > >> >> >> >wrote: > > >> >> >>> > > >> >> >> > > > >> >> >>> > > >> >> >> >> Hi, > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> >> I have been monitoring the system resources > such > > as > > >> >> memory > > >> >> >>> > Heap > > >> >> >>> > > >> usage > > >> >> >>> > > >> >> >> for > > >> >> >>> > > >> >> >> >> Regionserver, load average, network bandwidth, > > etc. > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> >> I experienced a disconnection last night. When > I > > >> >> checked > > >> >> >>> the > > >> >> >>> > > >> memory > > >> >> >>> > > >> >> Heap > > >> >> >>> > > >> >> >> >> for Regionserver, it was 2336 which is way > below > > the > > >> >> max > > >> >> >>> Heap > > >> >> >>> > > >> 4096. I > > >> >> >>> > > >> >> >> don't > > >> >> >>> > > >> >> >> >> think it is Regionserver problem. The requests > > that > > >> the > > >> >> >>> > > >> regionserver > > >> >> >>> > > >> >> got > > >> >> >>> > > >> >> >> was > > >> >> >>> > > >> >> >> >> about 126 which doesn't seem to be high. The > load > > >> >> average > > >> >> >>> is > > >> >> >>> > > kind > > >> >> >>> > > >> of > > >> >> >>> > > >> >> >> high > > >> >> >>> > > >> >> >> >> >100. (but I saw disconnections while the load > > >> average > > >> >> is > > >> >> >>> > > around > > >> >> >>> > > >> 30). > > >> >> >>> > > >> >> >> The > > >> >> >>> > > >> >> >> >> bandwidth is not a problem. > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> >> However, I found error messages from the > > zookeeper. > > >> >> Please > > >> >> >>> > see > > >> >> >>> > > the > > >> >> >>> > > >> >> >> >> following error messages. There are some > > >> >> >>> > > >> >> >> >> BufferUnderflowException error, IOException > > >> error,etc. > > >> >> >>> > > >> >> >> >> May I ask what might be the causes that > generate > > >> these > > >> >> >>> error > > >> >> >>> > > >> >> messages? > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> >> Many thanks > > >> >> >>> > > >> >> >> >> zhenyu > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN > > >> >> >>> > > org.apache.zookeeper.server.Request: > > >> >> >>> > > >> >> >> Ignoring > > >> >> >>> > > >> >> >> >> exception during toString > > >> >> >>> > > >> >> >> >> java.nio.BufferUnderflowException > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127) > > >> >> >>> > > >> >> >> >> at > > >> java.nio.ByteBuffer.get(ByteBuffer.java:675) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> > > >> >> >>> org.apache.zookeeper.server.Request.toString(Request.java:199) > > >> >> >>> > > >> >> >> >> at > > >> java.lang.String.valueOf(String.java:2827) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > java.lang.StringBuilder.append(StringBuilder.java:115) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > org.apache.zookeeper.server.quorum.CommitProcessor.processRequest(CommitProcessor.java:167) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:68) > > >> >> >>> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN > > >> >> >>> > > >> >> org.apache.zookeeper.server.NIOServerCnxn: > > >> >> >>> > > >> >> >> >> Exception causing close of session > > 0x424dfd908c50009 > > >> >> >>> > > >> >> >> >> due to java.io.IOException: Read error > > >> >> >>> > > >> >> >> >> 2009-11-11 04:29:00,810 DEBUG > > >> >> >>> > > >> >> org.apache.zookeeper.server.NIOServerCnxn: > > >> >> >>> > > >> >> >> >> IOException stack trace > > >> >> >>> > > >> >> >> >> java.io.IOException: Read error > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > > > >> >> > > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:372) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:221) > > >> >> >>> > > >> >> >> >> 2009-11-11 04:29:00,810 DEBUG > > >> >> >>> > > >> >> >> >> > > org.apache.zookeeper.server.quorum.CommitProcessor: > > >> >> >>> > Processing > > >> >> >>> > > >> >> request:: > > >> >> >>> > > >> >> >> >> sessionid:0x424dfd908c50009 type:setWatches > > >> >> >>> > > >> cxid:0xfffffffffffffff8 > > >> >> >>> > > >> >> >> >> zxid:0xfffffffffffffffe txntype:unknown n/a > > >> >> >>> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN > > >> >> >>> > > org.apache.zookeeper.server.Request: > > >> >> >>> > > >> >> >> Ignoring > > >> >> >>> > > >> >> >> >> exception during toString > > >> >> >>> > > >> >> >> >> java.nio.BufferUnderflowException > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127) > > >> >> >>> > > >> >> >> >> at > > >> java.nio.ByteBuffer.get(ByteBuffer.java:675) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> > > >> >> >>> org.apache.zookeeper.server.Request.toString(Request.java:199) > > >> >> >>> > > >> >> >> >> at > > >> java.lang.String.valueOf(String.java:2827) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > java.lang.StringBuilder.append(StringBuilder.java:115) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:74) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) > > >> >> >>> > > >> >> >> >> 2009-11-11 04:29:00,810 DEBUG > > >> >> >>> > > >> >> >> >> > > org.apache.zookeeper.server.FinalRequestProcessor: > > >> >> >>> Processing > > >> >> >>> > > >> >> request:: > > >> >> >>> > > >> >> >> >> sessionid:0x424dfd908c50009 type:setWatches > > >> >> >>> > > >> cxid:0xfffffffffffffff8 > > >> >> >>> > > >> >> >> >> zxid:0xfffffffffffffffe txntype:unknown n/a > > >> >> >>> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN > > >> >> >>> > > org.apache.zookeeper.server.Request: > > >> >> >>> > > >> >> >> Ignoring > > >> >> >>> > > >> >> >> >> exception during toString > > >> >> >>> > > >> >> >> >> java.nio.BufferUnderflowException > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127) > > >> >> >>> > > >> >> >> >> at > > >> java.nio.ByteBuffer.get(ByteBuffer.java:675) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> > > >> >> >>> org.apache.zookeeper.server.Request.toString(Request.java:199) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> > > >> >> >>> > > >> org.apache.log4j.or.DefaultRenderer.doRender(DefaultRenderer.java:36) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > > > >> org.apache.log4j.or.RendererMap.findAndRender(RendererMap.java:80) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > org.apache.log4j.spi.LoggingEvent.getRenderedMessage(LoggingEvent.java:362) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > org.apache.log4j.helpers.PatternParser$BasicPatternConverter.convert(PatternParser.java:403) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> > > >> org.apache.log4j.PatternLayout.format(PatternLayout.java:502) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > > > >> org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:302) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:359) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> > > >> >> >>> org.apache.log4j.WriterAppender.append(WriterAppender.java:160) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> > > >> >> >>> > > >> org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> > org.apache.log4j.Category.callAppenders(Category.java:206) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > org.apache.log4j.Category.forcedLog(Category.java:391) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> org.apache.log4j.Category.debug(Category.java:260) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:147) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) > > >> >> >>> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN > > >> >> >>> > > org.apache.zookeeper.server.Request: > > >> >> >>> > > >> >> >> Ignoring > > >> >> >>> > > >> >> >> >> exception during toString > > >> >> >>> > > >> >> >> >> sessionid:0x424dfd908c50009 type:setWatches > > >> >> >>> > > >> cxid:0xfffffffffffffff8 > > >> >> >>> > > >> >> >> >> zxid:0xfffffffffffffffe txntype:unknown n/a > > >> >> >>> > > >> >> >> >> 2009-11-11 04:29:00,810 INFO > > >> >> >>> > > >> >> org.apache.zookeeper.server.NIOServerCnxn: > > >> >> >>> > > >> >> >> >> closing session:0x424dfd908c50009 > NIOServerCnxn: > > >> >> >>> > > >> >> >> >> java.nio.channels.SocketChannel[connected > local=/ > > >> >> >>> > > >> >> 192.168.100.131:2181 > > >> >> >>> > > >> >> >> remote=/ > > >> >> >>> > > >> >> >> >> 192.168.100.131:35500] > > >> >> >>> > > >> >> >> >> 2009-11-11 04:29:00,810 DEBUG > > >> >> >>> > > >> >> org.apache.zookeeper.server.NIOServerCnxn: > > >> >> >>> > > >> >> >> >> ignoring exception during input shutdown > > >> >> >>> > > >> >> >> >> java.net.SocketException: Transport endpoint is > > not > > >> >> >>> connected > > >> >> >>> > > >> >> >> >> at > > >> sun.nio.ch.SocketChannelImpl.shutdown(Native > > >> >> >>> > Method) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > > > >> >> > > sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> > > >> >> >>> sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > > > >> >> >>> > > >> > org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:848) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > > > >> >> > > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:498) > > >> >> >>> > > >> >> >> >> at > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> > > >> > > > org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:221) > > >> >> >>> > > >> >> >> >> > > >> >> >>> > > >> >> >> > > > >> >> >>> > > >> >> >> > > >> >> >>> > > >> >> > > > >> >> >>> > > >> >> > > >> >> >>> > > >> > > > >> >> >>> > > >> > > >> >> >>> > > > > > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > >> >> >> > > >> >> >> > > >> >> > > > >> >> > > >> > > > >> > > > > > > --000e0cd5ce4402003e047831c0ed--