hbase-user mailing list archives

From Stack <saint....@gmail.com>
Subject Re: regionserver disconnection
Date Fri, 13 Nov 2009 21:33:39 GMT
You upped the ulimit file descriptors as per the getting started doc?
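(If not, a quick way to sanity-check -- assuming the daemons run as a "hadoop"
user, which is only an example -- is something like:

  # fd limit the daemon user actually gets
  su - hadoop -c 'ulimit -n'

  # raise it by appending to /etc/security/limits.conf, then re-login / restart the daemons
  printf 'hadoop soft nofile 32768\nhadoop hard nofile 32768\n' >> /etc/security/limits.conf
)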



On Nov 13, 2009, at 1:26 PM, Zhenyu Zhong <zhongresearch@gmail.com>  
wrote:

> Thanks a lot.
>
> Bad news is my kernel is still 2.6.26.
> But it was not a problem before.
>
> Very strange.
>
> zhenyu
>
> On Fri, Nov 13, 2009 at 4:16 PM, Jean-Daniel Cryans <jdcryans@apache.org 
> >wrote:
>
>> Looks like
>> http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/
>>
>> J-D
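>>
>> (For reference, the workaround described at that link is raising the per-user
>> epoll limits introduced around 2.6.27; the sysctl name below is from that era
>> and worth double-checking against your kernel:
>>
>>   # current per-user cap on epoll instances
>>   cat /proc/sys/fs/epoll/max_user_instances
>>   # raise it for this boot; persist the same setting in /etc/sysctl.conf
>>   sysctl -w fs.epoll.max_user_instances=4096
>> )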
>>
>> On Fri, Nov 13, 2009 at 1:12 PM, Zhenyu Zhong <zhongresearch@gmail.com 
>> >
>> wrote:
>>> Hi,
>>>
>>>
>>> After I re-organized the cluster, the experiment ran into problems faster
>>> than before.
>>>
>>> Basically the changes are to use machines with fewer resources as zookeeper
>>> quorum members and machines with more resources as regionservers.
>>>
>>> From the log, I still see the pause around 1 minute.
>>> I enabled the GC logging,  please see
>>>
>>> http://pastebin.com/m1d4ce0f1
>>>
>>> for details.
>>> In general I don't see many pauses in the GC.
>>>
>>> What is more interesting, 1 hour after the 1st regionserver  
>>> disconnected,
>>> the master log complained about too many open files. This didn't  
>>> happen
>>> before.
>>> I checked the system OS setup as well as limits.conf. I also checked the
>>> running processes. The total open files don't reach the limit. I am a bit
>>> confused.
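>>>
>>> (One way to double-check is to count descriptors per daemon rather than
>>> system-wide; the "hadoop" user below is just an example:
>>>
>>>   # open-fd count for every java process owned by the daemon user
>>>   for pid in $(pgrep -u hadoop java); do
>>>     echo "$pid $(ls /proc/$pid/fd | wc -l)"
>>>   done
>>> )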
>>>
>>>
>>> Please see the following master log.
>>>
>>> 2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 4658 row(s) of meta region {server: 192.168.100.128:60021, regionname: .META.,,1, startKey: <>} complete
>>> 2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
>>> 2009-11-13 20:06:07,677 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 3ms
>>> 2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.100.123:50010
>>> 2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2808245019291145247_5478039
>>> 2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>>> 2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1074853606841896259_5478048
>>> 2009-11-13 20:06:10,334 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 1ms
>>> 2009-11-13 20:06:21,018 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 0ms
>>> 2009-11-13 20:06:23,674 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 0ms
>>> 2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.100.123:50010
>>> 2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6642544517082142289_5478063
>>> 2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
>>> 2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2057511041109796090_5478063
>>> 2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
>>> 2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_8219260302213892894_5478064
>>> 2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
>>> 2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1669205542853067709_5478235
>>> 2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
>>> 2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_9128897691346270351_5478237
>>> 2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
>>> 2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1116845144864123018_5478240
>>> 2009-11-13 20:06:34,372 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 0ms
>>> 2009-11-13 20:06:37,034 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 0ms
>>> 2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Too many open files
>>>       at sun.nio.ch.IOUtil.initPipe(Native Method)
>>>       at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
>>>       at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
>>>       at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
>>>       at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
>>>       at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>>>       at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>>>       at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>>>       at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>>>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2290)
>>>
>>> 2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8148813491785406356_5478475 bad datanode[0] 192.168.100.123:50010
>>> 2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8148813491785406356_5478475 in pipeline 192.168.100.123:50010, 192.168.100.134:50010, 192.168.100.122:50010: bad datanode 192.168.100.123:50010
>>> 2009-11-13 20:06:37,436 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
>>> 2009-11-13 20:06:37,436 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2119727700857186236_5478498
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Nov 12, 2009 at 4:21 PM, Zhenyu Zhong <zhongresearch@gmail.com
>>> wrote:
>>>
>>>> Will do.
>>>>
>>>> thanks
>>>> zhenyu
>>>>
>>>>
>>>> On Thu, Nov 12, 2009 at 3:33 PM, stack <stack@duboce.net> wrote:
>>>>
>>>>> Enable GC logging too on this next run (see hbase-env.sh).  Let's try and
>>>>> get to the bottom of what's going on.
>>>>> Thanks,
>>>>> St.Ack
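>>>>>
>>>>> (For reference, the stock hbase-env.sh ships a commented-out line roughly
>>>>> like the following; the log path here is just an example:
>>>>>
>>>>>   # conf/hbase-env.sh on each regionserver
>>>>>   export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
>>>>>       -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"
>>>>> )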
>>>>>
>>>>>> On Thu, Nov 12, 2009 at 12:29 PM, Zhenyu Zhong <zhongresearch@gmail.com> wrote:
>>>>>
>>>>>> I can switch the boxes that run zookeeper with the ones that run
>>>>>> regionservers.
>>>>>> I will see the results later.
>>>>>>
>>>>>> FYI, the node does have the 10-minute zookeeper.session.timeout value in
>>>>>> place.
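>>>>>>
>>>>>> (One way to confirm the value each regionserver actually has on disk --
>>>>>> the hostnames and install path below are placeholders; 600000 ms = 10 minutes:
>>>>>>
>>>>>>   for h in rs1 rs2 rs3; do
>>>>>>     ssh $h 'grep -A1 zookeeper.session.timeout /opt/hbase/conf/hbase-site.xml'
>>>>>>   done
>>>>>> )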
>>>>>>
>>>>>> thanks
>>>>>> zhenyu
>>>>>>
>>>>>> On Thu, Nov 12, 2009 at 3:21 PM, stack <stack@duboce.net> wrote:
>>>>>>
>>>>>>> On Thu, Nov 12, 2009 at 11:50 AM, Zhenyu Zhong <zhongresearch@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> In my cluster, half of the machines have 2 disks (400GB each) and half
>>>>>>>> have 6 disks.  Maybe we should run zookeeper on the machines with 2 disks
>>>>>>>> and RS on machines with 6 disks?
>>>>>>>>
>>>>>>> That would make most sense, only in the below it looks like the RS that
>>>>>>> had the issue had 4 disks?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> BTW, the 10-minute zookeeper.session.timeout has been set during the
>>>>>>>> experiment.
>>>>>>>>
>>>>>>>
>>>>>>> And for sure this node had it in place?
>>>>>>> St.Ack
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> thanks
>>>>>>>> zhenyu
>>>>>>>>
>>>>>>>> On Thu, Nov 12, 2009 at 2:08 PM, stack <stack@duboce.net> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Nov 12, 2009 at 8:40 AM, Zhenyu Zhong <zhongresearch@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Though I experienced 2 regionserver disconnections this morning, it
>>>>>>>>>> looks better from the regionserver log (please see the following log).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://pastebin.com/m496dbfae
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I checked diskIO for one of the regionservers (192.168.100.116) during
>>>>>>>>>> the time it disconnected.
>>>>>>>>>>
>>>>>>>>>> Time: 03:04:58 AM
>>>>>>>>>> Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
>>>>>>>>>> sda             105.31      5458.83     19503.64   9043873239  32312473676
>>>>>>>>>> sda1              2.90         6.64        99.25     10993934    164433464
>>>>>>>>>> sda2              1.72        23.76        30.25     39365817     50124008
>>>>>>>>>> sda3              0.30         0.38         3.58       624930      5923000
>>>>>>>>>> sda4            100.39      5428.06     19370.56   8992888270  32091993204
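>>>>>>>>>>
>>>>>>>>>> (Output in that format comes from sysstat's iostat; to watch it live
>>>>>>>>>> during a run, something like:
>>>>>>>>>>
>>>>>>>>>>   iostat -d 5
>>>>>>>>>> )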
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Is this high for you?  20k blocks/second would seem to be high, but it's
>>>>>>>>> one disk only and it's not being shared by zk anymore, so it shouldn't
>>>>>>>>> matter?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I also checked the zookeeper quorum server that the regionserver tried
>>>>>>>>>> to connect to, according to the log. However, I don't see 192.168.100.116
>>>>>>>>>> in the client list of that zookeeper quorum server.
>>>>>>>>>> Would that be a problem?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Is that because the ephemeral node for the regionserver had evaporated?
>>>>>>>>> Lost its lease w/ zk by the time you went to look?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thu Nov 12 15:04:52 UTC 2009
>>>>>>>>>> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
>>>>>>>>>> Clients:
>>>>>>>>>> /192.168.100.127:43045[1](queued=0,recved=26,sent=0)
>>>>>>>>>> /192.168.100.131:39091[1](queued=0,recved=964,sent=0)
>>>>>>>>>> /192.168.100.124:35961[1](queued=0,recved=3266,sent=0)
>>>>>>>>>> /192.168.100.123:47935[1](queued=0,recved=233,sent=0)
>>>>>>>>>> /192.168.100.125:46931[1](queued=0,recved=2,sent=0)
>>>>>>>>>> /192.168.100.118:54924[1](queued=0,recved=295,sent=0)
>>>>>>>>>> /192.168.100.118:41390[1](queued=0,recved=2290,sent=0)
>>>>>>>>>> /192.168.100.136:42243[1](queued=0,recved=0,sent=0)
>>>>>>>>>>
>>>>>>>>>> Latency min/avg/max: 0/17/6333
>>>>>>>>>> Received: 47111
>>>>>>>>>> Sent: 0
>>>>>>>>>> Outstanding: 0
>>>>>>>>>> Zxid: 0x77000083f4
>>>>>>>>>> Mode: leader
>>>>>>>>>> Node count: 23
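>>>>>>>>>>
>>>>>>>>>> (Output like the above comes from ZooKeeper's four-letter-word "stat"
>>>>>>>>>> command; the hostname below is a placeholder and 2181 is the default
>>>>>>>>>> client port:
>>>>>>>>>>
>>>>>>>>>>   echo stat | nc zk-quorum-host 2181
>>>>>>>>>> )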
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That 6 second maximum latency is pretty bad but should be well within
>>>>>>>>> the zk session timeout.
>>>>>>>>>
>>>>>>>>> So, the problem is likely on the zk client side of the session, i.e. in
>>>>>>>>> the RS.  You could enable GC logging as J-D suggested to see if you have
>>>>>>>>> any big pauses, pauses > zk session timeout.
>>>>>>>>>
>>>>>>>>> When the RS went down, it didn't look too heavily loaded:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  1. 2009-11-12 15:04:52,830 INFO
>>>>>>>>>  org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
>>>>>>>>>  request=1.5166667, regions=322, stores=657, storefiles=631,
>>>>>>>>>  storefileIndexSize=61, memstoreSize=1472, usedHeap=2819, maxHeap=4079,
>>>>>>>>>  blockCacheSize=658110960, blockCacheFree=197395984, blockCacheCount=9903,
>>>>>>>>>  blockCacheHitRatio=99
>>>>>>>>>
>>>>>>>>> It's serving a few reads?  The number of store files seems fine.  Not
>>>>>>>>> too much memory used.
>>>>>>>>>
>>>>>>>>> Looking at the logs, I see the Lease Still Held exception.  This happens
>>>>>>>>> when the RS does its regular report to the master but the master thinks
>>>>>>>>> the RS has since restarted.  It'll think this probably because it noticed
>>>>>>>>> that the RS's znode in zk had gone away and it considered the RS dead.
>>>>>>>>> Looking too at your logs, I see this gap in the zk pinging:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  1. 2009-11-12 15:03:39,325 DEBUG org.apache.zookeeper.ClientCnxn: Got
>>>>>>>>>  ping response for sessionid:0x224e55436ad0004 after 0ms
>>>>>>>>>  2. 2009-11-12 15:03:43,113 DEBUG org.apache.zookeeper.ClientCnxn: Got
>>>>>>>>>  ping response for sessionid:0x24e55436a0007d after 0ms
>>>>>>>>>
>>>>>>>>> Where in the lines above it, it's reporting about every ten seconds;
>>>>>>>>> here there is a big gap.
>>>>>>>>>
>>>>>>>>> Do you have ganglia or something that will let you look more into what
>>>>>>>>> was happening on this machine around this time?  Is the machine OK?  It
>>>>>>>>> looks lightly loaded and you have your cluster nicely laid out.  Something
>>>>>>>>> odd is going on.  What about things like the write speed to disk?  In the
>>>>>>>>> past, strange issues have been explained by an incorrectly set BIOS which
>>>>>>>>> made disks run at 1/100th of their proper speed.
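>>>>>>>>>
>>>>>>>>> (A crude way to sanity-check raw disk speed on the suspect box -- the
>>>>>>>>> device and path below are examples, and the write test needs ~1GB free:
>>>>>>>>>
>>>>>>>>>   # sequential read speed of the data disk
>>>>>>>>>   hdparm -t /dev/sda4
>>>>>>>>>   # sequential write speed, bypassing the page cache
>>>>>>>>>   dd if=/dev/zero of=/data/ddtest bs=1M count=1024 oflag=direct && rm /data/ddtest
>>>>>>>>> )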
>>>>>>>>>
>>>>>>>>> St.Ack
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> zhenyu
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 11, 2009 at 3:58 PM, Zhenyu Zhong <zhongresearch@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Stack
>>>>>>>>>>>
>>>>>>>>>>> I really appreciate your comments.
>>>>>>>>>>> I will use the zookeeper monitoring script on my cluster and let it
>>>>>>>>>>> run overnight to see the result.
>>>>>>>>>>> I will follow up the thread when I get anything.
>>>>>>>>>>>
>>>>>>>>>>> thanks
>>>>>>>>>>> zhenyu
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 11, 2009 at 3:52 PM, stack <stack@duboce.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I see these in your log too:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  1. 2009-11-11 04:27:19,018 DEBUG org.apache.zookeeper.ClientCnxn: Got
>>>>>>>>>>>>  ping response for sessionid:0x424dfd908c50009 after 4544ms
>>>>>>>>>>>>  2. 2009-11-11 04:27:19,018 DEBUG org.apache.zookeeper.ClientCnxn: Got
>>>>>>>>>>>>  ping response for sessionid:0x24dfd90c810002 after 4548ms
>>>>>>>>>>>>  3. 2009-11-11 04:27:43,960 DEBUG org.apache.zookeeper.ClientCnxn: Got
>>>>>>>>>>>>  ping response for sessionid:0x424dfd908c50009 after 9030ms
>>>>>>>>>>>>  4. 2009-11-11 04:27:43,960 DEBUG org.apache.zookeeper.
