Return-Path: X-Original-To: apmail-hbase-user-archive@www.apache.org Delivered-To: apmail-hbase-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B5F19F1FC for ; Wed, 3 Apr 2013 18:22:55 +0000 (UTC) Received: (qmail 93698 invoked by uid 500); 3 Apr 2013 18:22:53 -0000 Delivered-To: apmail-hbase-user-archive@hbase.apache.org Received: (qmail 93579 invoked by uid 500); 3 Apr 2013 18:22:53 -0000 Mailing-List: contact user-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hbase.apache.org Delivered-To: mailing list user@hbase.apache.org Received: (qmail 93569 invoked by uid 99); 3 Apr 2013 18:22:53 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 03 Apr 2013 18:22:53 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of pablo@psafe.com designates 187.0.212.22 as permitted sender) Received: from [187.0.212.22] (HELO aherelay01.exch.emailtotal.com.br) (187.0.212.22) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 03 Apr 2013 18:22:44 +0000 Received: from exchange.emailtotal.com.br (unknown [187.0.212.6]) by aherelay01.exch.emailtotal.com.br (Postfix) with ESMTP id 38CA1183D6 for ; Wed, 3 Apr 2013 15:22:18 -0300 (BRT) Received: from [192.168.1.133] (177.135.131.66) by exchange.emailtotal.com.br (187.0.212.17) with Microsoft SMTP Server (TLS) id 8.3.83.0; Wed, 3 Apr 2013 15:21:42 -0300 Message-ID: <515C7336.9060505@psafe.com> Date: Wed, 3 Apr 2013 15:21:42 -0300 From: Pablo Musa User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130308 Thunderbird/17.0.4 MIME-Version: 1.0 To: "user@hbase.apache.org" Subject: Re: RegionServers Crashing every hour in production env References: <513A0758.3090507@psafe.com> <513A34D0.80507@psafe.com> <513CD815.9070005@psafe.com> <513D092F.1020106@psafe.com> <513F4D08.3040403@psafe.com> In-Reply-To: <513F4D08.3040403@psafe.com> Content-Type: multipart/alternative; boundary="------------050908020206050708080309" X-Virus-Checked: Checked by ClamAV on apache.org --------------050908020206050708080309 Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Hello guys, I stopped my research on HBase ZK timeout for while due to other issues I had to do, but I am back. A very weird behavior that I would like your comments is that my RegionServers perform better (less crashes) under heavy load instead of light load. There is, if I let HBase alone with 50 requestsPerSecond along the whole day the crashes are higher than if I run a mapred Job every hour. Another weird thing is the following: RS startTime = Mon Apr 01 13:22:35 BRT 2013 [...]$ grep slept /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.log 2013-04-01 20:09:21,135 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 45491ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired 2013-04-01 22:45:59,407 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 101271ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired [...]$ egrep 'real=[1-9][0-9][0-9][0-9]*\.[0-9][0-9]' /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.out * the below report is the above command for each time range. 0.0 - 0.1 secs GCs = 5084 0.1 - 0.5 secs GCs = 9 0.5 - 1.0 secs GCs = 3 1.0 - 010 secs GCs = 0 010 - 100 secs GCs = 0 100 - 1000 secs GCs = 0 So, my script for getting every gc time ("real=... secs") says that there is no gc that took longer than 1 second. However the RS log says twice that the RS slept more than 40 seconds instead of 3. "this is likely due to a long garbage collecting pause", yes this is likely but I dont think it is the case. The machine is a huge machine with 70GB RAM, 32 procs, light load, no swap or iowait. Any ideas? Thanks, Pablo On 03/12/2013 12:43 PM, Pablo Musa wrote: > Guys, > thank you very much for the help. > > Yesterday I spent 14 hours trying to tune the whole cluster. > The cluster is not ready yet needs a lot of tunning, but at least is > working. > > My first big problem was namenode + datanode GC. They were not using > CMS and thus were taking "incremental" time to run. Ii started in 0.01 > ms and > in 20 minutes was taking 150 secs. > After setting CMSGC this time is much smaller taking a maximum of 70 secs, > which is VERY HIGH, but for now does not stop HBase. > > With this issue solved, it was clear that the RS was doing a long pause GC, > taking up to 220 secs. Zookeeper expired the RS and it shutdown. > I tried a lot of different flags configuration (MORE than 20), and could not > get small gcs. Eventually it would take more than 150 secs (zookeeper > timeout) > and shutdown. > > Finally I tried a config that so far, 12 hours, is working with a maximum GC > time of 90 secs. Which of course is a terrible problem since HBase is a > database, but at least the cluster is stable while I can tune it a > little more. > > In my opinion, my biggest problem is to have a few "monster" machines in the > cluster instead of a bunch of commodities machines. I don't know if > there are > a lot companies using this kind of machines inside a hadoop cluster, but > a fast search on google could not find a lot of tunes for big heap GCs. > > I guess my next step will be search for big heap gc tuning. > > Back to some questions ;) > > > You have ganglia or tsdb running? > > I use zabbix for now, and no there is nothing going on when the big > pause happens. > > > When you see the big pause above, can you see anything going on on the > > machine? (swap, iowait, concurrent fat mapreduce job?) > > what are you doing during long GC happened? read or write? if > reading, what > > the block cache size? > > The cpu for the RS process goes to 100% and the logs "pause", until it > gets out. > Ex: [NewPar > > IO and SWAP are normal. There is no MR running, just normal database > load, which is > very low. I am probably doing reads AND writes to the database with > default block > cache size. > One problem in this moment might be the big number of regions (1252) > since I am > using only one RS to be able to track the problem. > > The links and ideas were very helpful. Thank you very much guys. > > I will post my future researches as I find a solution ;) > > If you have more ideas or info (links, flag suggestions, etc.), please > post it :) > > Abs, > Pablo > > On 03/10/2013 11:24 PM, Andrew Purtell wrote: >> Be careful with GC tuning, throwing changes at an application without >> analysis of what is going on with the heap is shooting in the dark. One >> particular good treatment of the subject is here: >> http://java.dzone.com/articles/how-tame-java-gc-pauses >> >> If you have made custom changes to blockcache or memstore configurations, >> back them out until you're sure everything else is ok. >> >> Watch carefully for swapping. Set the vm.swappiness sysctl to 0. Monitor >> for spikes in page scanning or any swap activity. Nothing brings on >> "Juliette" pauses better than a JVM partially swapped out. The Java GC >> starts collection by examining the oldest pages, and those are the first >> pages the OS swaps out... >> >> >> >> On Mon, Mar 11, 2013 at 10:13 AM, Azuryy Yu wrote: >> >>> Hi Pablo, >>> It'a terrible for a long minor GC. I don't think there are swaping from >>> your vmstat log. >>> but I just suggest you >>> 1) add following JVM options: >>> -XX:+DisableExplicitGC -XX:+UseCompressedOops -XX:GCTimeRatio=19 >>> -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=2 >>> -XX:MaxTenuringThreshold=3 -XX:+UseFastAccessorMethods >>> >>> 2) -Xmn is two small, your total Mem is 74GB, just make -Xmn2g >>> 3) what are you doing during long GC happened? read or write? if reading, >>> what the block cache size? >>> >>> >>> >>> >>> On Mon, Mar 11, 2013 at 6:41 AM, Stack wrote: >>> >>>> You could increase your zookeeper session timeout to 5 minutes while you >>>> are figuring why these long pauses. >>>> http://hbase.apache.org/book.html#zookeeper.session.timeout >>>> >>>> Above, there is an outage for almost 5 minutes: >>>> >>>>>> We slept 225100ms instead of 3000ms, this is likely due to a long >>>> You have ganglia or tsdb running? When you see the big pause above, can >>>> you see anything going on on the machine? (swap, iowait, concurrent fat >>>> mapreduce job?) >>>> >>>> St.Ack >>>> >>>> >>>> >>>> On Sun, Mar 10, 2013 at 3:29 PM, Pablo Musa wrote: >>>> >>>>> Hi Sreepathi, >>>>> they say in the book (or the site), we could try it to see if it is >>>> really >>>>> a timeout error >>>>> or there is something more. But it is not recomended for production >>>>> environments. >>>>> >>>>> I could give it a try if five minutes will ensure to us that the >>> problem >>>>> is the GC or >>>>> elsewhere!! Anyway, I think it is hard to beleive a GC is taking 2:30 >>>>> minutes. >>>>> >>>>> Abs, >>>>> Pablo >>>>> >>>>> >>>>> On 03/10/2013 04:06 PM, Sreepathi wrote: >>>>> >>>>>> Hi Stack/Ted/Pablo, >>>>>> >>>>>> Should we increase the hbase.rpc.timeout property to 5 minutes ( >>> 300000 >>>> ms >>>>>> ) ? >>>>>> >>>>>> Regards, >>>>>> - Sreepathi >>>>>> >>>>>> On Sun, Mar 10, 2013 at 11:59 AM, Pablo Musa wrote: >>>>>> >>>>>> That combo should be fine. >>>>>>> Great!! >>>>>>> >>>>>>> >>>>>>> If JVM is full GC'ing, the application is stopped. >>>>>>>> The below does not look like a full GC but that is a long pause in >>>>>>>> system >>>>>>>> time, enough to kill your zk session. >>>>>>>> >>>>>>> Exactly. This pause is really making the zk expire the RS which >>>> shutsdown >>>>>>> (logs >>>>>>> in the end of the email). >>>>>>> But the question is: what is causing this pause??!! >>>>>>> >>>>>>> You swapping? >>>>>>> I don't think so (stats below). >>>>>>> >>>>>>> Hardware is good? >>>>>>> Yes, it is a 16 processor machine with 74GB of RAM and plenty disk >>>> space. >>>>>>> Below are some metrics I have heard about. Hope it helps. >>>>>>> >>>>>>> >>>>>>> ** I am having some problems with the datanodes[1] which are having >>>>>>> trouble to >>>>>>> write. I really think the issues are related, but cannot solve any of >>>>>>> them >>>>>>> :( >>>>>>> >>>>>>> Thanks again, >>>>>>> Pablo >>>>>>> >>>>>>> [1] http://mail-archives.apache. >>> ****org/mod_mbox/hadoop-hdfs-user/**** >>> 201303.mbox/%3CCAJzooYfS-F1KS+******jGOPUt15PwFjcCSzigE0APeM9FXaCr**** >>>>>>> QfVbyQ@mail.gmail.com%3E>>>>>> mod_mbox/hadoop-hdfs-user/**201303.mbox/%3CCAJzooYfS-F1KS+** >>>>>>> jGOPUt15PwFjcCSzigE0APeM9FXaCr**QfVbyQ@mail.gmail.com%3E< >>> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201303.mbox/%3CCAJzooYfS-F1KS+jGOPUt15PwFjcCSzigE0APeM9FXaCrQfVbyQ@mail.gmail.com%3E >>>>>>> top - 15:38:04 up 297 days, 21:03, 2 users, load average: 4.34, >>> 2.55, >>>>>>> 1.28 >>>>>>> Tasks: 528 total, 1 running, 527 sleeping, 0 stopped, 0 zombie >>>>>>> Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, >>>>>>> 0.0%st >>>>>>> Mem: 74187256k total, 29493992k used, 44693264k free, 5836576k >>>> buffers >>>>>>> Swap: 51609592k total, 128312k used, 51481280k free, 1353400k >>> cached >>>>>>> ]$ vmstat -w >>>>>>> procs -------------------memory-----****------------- ---swap-- >>>>>>> -----io---- >>>>>>> --system-- -----cpu------- >>>>>>> r b swpd free buff cache si so bi bo >>>>>>> in >>>>>>> cs us sy id wa st >>>>>>> 2 0 128312 32416928 5838288 5043560 0 0 202 53 >>>>>>> 0 >>>>>>> 0 2 1 96 1 0 >>>>>>> >>>>>>> ]$ sar >>>>>>> 02:20:01 PM all 26.18 0.00 2.90 0.63 0.00 >>>>>>> 70.29 >>>>>>> 02:30:01 PM all 1.66 0.00 1.25 1.05 0.00 >>>>>>> 96.04 >>>>>>> 02:40:01 PM all 10.01 0.00 2.14 0.75 0.00 >>>>>>> 87.11 >>>>>>> 02:50:01 PM all 0.76 0.00 0.80 1.03 0.00 >>>>>>> 97.40 >>>>>>> 03:00:01 PM all 0.23 0.00 0.30 0.71 0.00 >>>>>>> 98.76 >>>>>>> 03:10:01 PM all 0.22 0.00 0.30 0.66 0.00 >>>>>>> 98.82 >>>>>>> 03:20:01 PM all 0.22 0.00 0.31 0.76 0.00 >>>>>>> 98.71 >>>>>>> 03:30:01 PM all 0.24 0.00 0.31 0.64 0.00 >>>>>>> 98.81 >>>>>>> 03:40:01 PM all 1.13 0.00 2.97 1.18 0.00 >>>>>>> 94.73 >>>>>>> Average: all 3.86 0.00 1.38 0.88 0.00 >>>>>>> 93.87 >>>>>>> >>>>>>> ]$ iostat >>>>>>> Linux 2.6.32-220.7.1.el6.x86_64 (PSLBHDN002) 03/10/2013 _x86_64_ >>>>>>> (16 CPU) >>>>>>> >>>>>>> avg-cpu: %user %nice %system %iowait %steal %idle >>>>>>> 1.86 0.00 0.96 0.78 0.00 96.41 >>>>>>> >>>>>>> Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn >>>>>>> sda 1.23 20.26 23.53 521533196 605566924 >>>>>>> sdb 6.51 921.55 241.90 23717850730 >>> 6225863488 >>>>>>> sdc 6.22 921.83 236.41 23725181162 >>> 6084471192 >>>>>>> sdd 6.25 925.13 237.26 23810004970 >>> 6106357880 >>>>>>> sde 6.19 913.90 235.60 23521108818 >>> 6063722504 >>>>>>> sdh 6.26 933.08 237.77 24014594546 >>> 6119511376 >>>>>>> sdg 6.18 914.36 235.31 23532747378 >>> 6056257016 >>>>>>> sdf 6.24 923.66 235.33 23772251810 >>> 6056604008 >>>>>>> Some more logging which reinforce that the RS crash is happening >>>> because >>>>>>> of >>>>>>> timeout. However this time the GC log is not accusing a big time. >>>>>>> >>>>>>> #####RS LOG##### >>>>>>> 2013-03-10 15:37:46,712 INFO org.apache.zookeeper.****ClientCnxn: >>>> Client >>>>>>> session timed out, have not heard from server in 257739ms for >>> sessionid >>>>>>> 0x13d3c4bcba6014a, closing socket connection and attempting reconnect >>>>>>> 2013-03-10 15:37:46,712 INFO org.apache.zookeeper.****ClientCnxn: >>>> Client >>>>>>> session timed out, have not heard from server in 226785ms for >>> sessionid >>>>>>> 0x13d3c4bcba60149, closing socket connection and attempting reconnect >>>>>>> 2013-03-10 15:37:46,712 DEBUG org.apache.hadoop.hbase.io.**** >>>>>>> hfile.LruBlockCache: >>>>>>> Stats: total=61.91 MB, free=1.94 GB, max=2 GB, blocks=1254, >>>>>>> accesses=60087, >>>>>>> hits=58811, hitRatio=97.87%, , cachingAccesses=60069, >>>> cachingHits=58811, >>>>>>> cachingHitsRatio=97.90%, , evictions=0, evicted=0, evictedPerRun=NaN >>>>>>> 2013-03-10 15:37:46,712 WARN >>> org.apache.hadoop.hbase.util.****Sleeper: >>>>>>> We >>>>>>> slept 225100ms instead of 3000ms, this is likely due to a long >>> garbage >>>>>>> collecting pause and it's usually bad, see >>>>>>> http://hbase.apache.org/book.**** >>>>>>> html#trouble.rs.runtime.****zkexpired>>>>>> org/book.html#trouble.rs.**runtime.zkexpired< >>>> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired> >>>>>>> 2013-03-10 15:37:46,714 WARN org.apache.hadoop.hdfs.****DFSClient: >>>>>>> DFSOutputStream ResponseProcessor exception for block >>>>>>> BP-43236042-172.17.2.10-****1362490844340:blk_-** >>>>>>> 6834190810033122569_25150229 >>>>>>> >>>>>>> java.io.EOFException: Premature EOF: no length prefix available >>>>>>> at org.apache.hadoop.hdfs.****protocol.HdfsProtoUtil.** >>>>>>> vintPrefixed(HdfsProtoUtil.****java:171) >>>>>>> at org.apache.hadoop.hdfs.****protocol.datatransfer.** >>>>>>> PipelineAck.readFields(****PipelineAck.java:114) >>>>>>> at >>>> org.apache.hadoop.hdfs.****DFSOutputStream$DataStreamer$**** >>>>>>> ResponseProcessor.run(****DFSOutputStream.java:670) >>>>>>> 2013-03-10 15:37:46,716 ERROR org.apache.hadoop.hbase.** >>>>>>> regionserver.HRegionServer: >>>>>>> org.apache.hadoop.hbase.ipc.****CallerDisconnectedException: Aborting >>>>>>> call >>>>>>> get([B@7caf69ed, {"timeRange":[0,****9223372036854775807],"** >>>>>>> totalColumns":1,"cacheBlocks":****true,"families":{"details":[**"** >>>>>>> ALL"]},"maxVersions":1,"row":"****\\x00\\x00\\x00\\x00\\x00\\*** >>>>>>> *x12\\x93@"}), >>>>>>> rpc version=1, client version=29, methodsFingerPrint=1891768260 from >>>>>>> 172.17.1.71:51294 after 0 ms, since caller disconnected >>>>>>> >>>>>>> >>>>>>> #####GC LOG##### >>>>>>> 2716.635: [GC 2716.635: [ParNew: 57570K->746K(59008K), 0.0046530 >>> secs] >>>>>>> 354857K->300891K(1152704K), 0.0047370 secs] [Times: user=0.03 >>> sys=0.00, >>>>>>> real=0.00 secs] >>>>>>> 2789.478: [GC 2789.478: [ParNew: 53226K->1192K(59008K), 0.0041370 >>> secs] >>>>>>> 353371K->301337K(1152704K), 0.0042220 secs] [Times: user=0.04 >>> sys=0.00, >>>>>>> real=0.00 secs] >>>>>>> 2868.435: [GC 2868.435: [ParNew: 53672K->740K(59008K), 0.0041570 >>> secs] >>>>>>> 353817K->300886K(1152704K), 0.0042440 secs] [Times: user=0.03 >>> sys=0.00, >>>>>>> real=0.01 secs] >>>>>>> 2920.309: [GC 2920.309: [ParNew: 53220K->6202K(59008K), 0.0058440 >>> secs] >>>>>>> 353366K->310443K(1152704K), 0.0059410 secs] [Times: user=0.05 >>> sys=0.00, >>>>>>> real=0.01 secs] >>>>>>> 2984.239: [GC 2984.239: [ParNew: 58680K->5282K(59008K), 0.0048070 >>> secs] >>>>>>> 362921K->310977K(1152704K), 0.0049180 secs] [Times: user=0.03 >>> sys=0.00, >>>>>>> real=0.01 secs] >>>>>>> 3050.620: [GC 3050.620: [ParNew: 57524K->5292K(59008K), 0.0055510 >>> secs] >>>>>>> 363219K->313384K(1152704K), 0.0056520 secs] [Times: user=0.04 >>> sys=0.00, >>>>>>> real=0.01 secs] >>>>>>> 3111.331: [GC 3111.331: [ParNew: 57542K->5637K(59008K), 0.0056900 >>> secs] >>>>>>> 365634K->318079K(1152704K), 0.0057950 secs] [Times: user=0.03 >>> sys=0.00, >>>>>>> real=0.01 secs] >>>>>>> 3163.510: [GC 3163.510: [ParNew: 58117K->1138K(59008K), 0.0046540 >>> secs] >>>>>>> 370559K->316086K(1152704K), 0.0047470 secs] [Times: user=0.03 >>> sys=0.00, >>>>>>> real=0.01 secs] >>>>>>> 3217.642: [GC 3217.643: [ParNew: 53618K->1001K(59008K), 0.0038530 >>> secs] >>>>>>> 368566K->315949K(1152704K), 0.0039450 secs] [Times: user=0.03 >>> sys=0.00, >>>>>>> real=0.00 secs] >>>>>>> 3464.067: [GC 3464.067: [ParNew: 53481K->5348K(59008K), 0.0068370 >>> secs] >>>>>>> 368429K->322343K(1152704K), 0.0069170 secs] [Times: user=0.07 >>> sys=0.01, >>>>>>> real=0.00 secs] >>>>>>> 3465.648: [GC 3465.648: [ParNew: 57828K->1609K(59008K), 0.0111520 >>> secs] >>>>>>> 374823K->321496K(1152704K), 0.0112170 secs] [Times: user=0.07 >>> sys=0.00, >>>>>>> real=0.01 secs] >>>>>>> 3466.291: [GC 3466.291: [ParNew: 54089K->2186K(59008K), 0.0093190 >>> secs] >>>>>>> 373976K->322073K(1152704K), 0.0093800 secs] [Times: user=0.06 >>> sys=0.00, >>>>>>> real=0.01 secs] >>>>>>> Heap >>>>>>> par new generation total 59008K, used 15497K [0x00000005fae00000, >>>>>>> 0x00000005fee00000, 0x00000005fee00000) >>>>>>> eden space 52480K, 25% used [0x00000005fae00000, >>>> 0x00000005fbaff8c8, >>>>>>> 0x00000005fe140000) >>>>>>> from space 6528K, 33% used [0x00000005fe140000, >>> 0x00000005fe362b70, >>>>>>> 0x00000005fe7a0000) >>>>>>> to space 6528K, 0% used [0x00000005fe7a0000, >>> 0x00000005fe7a0000, >>>>>>> 0x00000005fee00000) >>>>>>> concurrent mark-sweep generation total 1093696K, used 319887K >>>>>>> [0x00000005fee00000, 0x0000000641a10000, 0x00000007fae00000) >>>>>>> concurrent-mark-sweep perm gen total 30464K, used 30427K >>>>>>> [0x00000007fae00000, 0x00000007fcbc0000, 0x0000000800000000) >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 03/08/2013 07:02 PM, Stack wrote: >>>>>>> >>>>>>> On Fri, Mar 8, 2013 at 10:58 AM, Pablo Musa >>> wrote: >>>>>>>> 0.94 currently doesn't support hadoop 2.0 >>>>>>>> >>>>>>>>> Can you deploy hadoop 1.1.1 instead ? >>>>>>>>>> I am using cdh4.2.0 which uses this version as default >>>> installation. >>>>>>>>> I think it will be a problem for me to deploy 1.1.1 because I would >>>>>>>>> need >>>>>>>>> to >>>>>>>>> "upgrade" the whole cluster with 70TB of data (backup everything, >>> go >>>>>>>>> offline, etc.). >>>>>>>>> >>>>>>>>> Is there a problem to use cdh4.2.0? >>>>>>>>> I should send my email to cdh list? >>>>>>>>> >>>>>>>>> >>>>>>>>> That combo should be fine. >>>>>>>>> >>>>>>>> You Full GC'ing around this time? >>>>>>>> >>>>>>>>> The GC shows it took a long time. However it does not make any >>> sense >>>>>>>>> to be it, since the same ammount of data was cleaned before and >>> AFTER >>>>>>>>> in just 0.01 secs! >>>>>>>>> >>>>>>>>> >>>>>>>>> If JVM is full GC'ing, the application is stopped. >>>>>>>>> >>>>>>>> [Times: user=0.08 sys=137.62, real=137.62 secs] >>>>>>>> >>>>>>>>> Besides the whole time was used by system. That is what is bugging >>>> me. >>>>>>>>> The below does not look like a full GC but that is a long pause >>> in >>>>>>>> system >>>>>>>> time, enough to kill your zk session. >>>>>>>> >>>>>>>> You swapping? >>>>>>>> >>>>>>>> Hardware is good? >>>>>>>> >>>>>>>> St.Ack >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> ... >>>>>>>> >>>>>>>>> 1044.081: [GC 1044.081: [ParNew: 58970K->402K(59008K), 0.0040990 >>>> secs] >>>>>>>>> 275097K->216577K(1152704K), 0.0041820 secs] [Times: user=0.03 >>>> sys=0.00, >>>>>>>>> real=0.01 secs] >>>>>>>>> >>>>>>>>> 1087.319: [GC 1087.319: [ParNew: 52873K->6528K(59008K), 0.0055000 >>>> secs] >>>>>>>>> 269048K->223592K(1152704K), 0.0055930 secs] [Times: user=0.04 >>>> sys=0.01, >>>>>>>>> real=0.00 secs] >>>>>>>>> >>>>>>>>> 1087.834: [GC 1087.834: [ParNew: 59008K->6527K(59008K), 137.6353620 >>>>>>>>> secs] 276072K->235097K(1152704K), 137.6354700 secs] [Times: >>> user=0.08 >>>>>>>>> sys=137.62, real=137.62 secs] >>>>>>>>> >>>>>>>>> 1226.638: [GC 1226.638: [ParNew: 59007K->1897K(59008K), 0.0079960 >>>> secs] >>>>>>>>> 287577K->230937K(1152704K), 0.0080770 secs] [Times: user=0.05 >>>> sys=0.00, >>>>>>>>> real=0.01 secs] >>>>>>>>> >>>>>>>>> 1227.251: [GC 1227.251: [ParNew: 54377K->2379K(59008K), 0.0095650 >>>> secs] >>>>>>>>> 283417K->231420K(1152704K), 0.0096340 secs] [Times: user=0.06 >>>> sys=0.00, >>>>>>>>> real=0.01 secs] >>>>>>>>> >>>>>>>>> >>>>>>>>> I really appreciate you guys helping me to find out what is wrong. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Pablo >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 03/08/2013 02:11 PM, Stack wrote: >>>>>>>>> >>>>>>>>> What RAM says. >>>>>>>>> >>>>>>>>>> 2013-03-07 17:24:57,887 INFO >>>> org.apache.zookeeper.********ClientCnxn: >>>>>>>>>> Client >>>>>>>>>> >>>>>>>>>> session timed out, have not heard from server in 159348ms for >>>>>>>>>> sessionid >>>>>>>>>> 0x13d3c4bcba600a7, closing socket connection and attempting >>>> reconnect >>>>>>>>>> You Full GC'ing around this time? >>>>>>>>>> >>>>>>>>>> Put up your configs in a place where we can take a look? >>>>>>>>>> >>>>>>>>>> St.Ack >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Mar 8, 2013 at 8:32 AM, ramkrishna vasudevan < >>>>>>>>>> ramkrishna.s.vasudevan@gmail.******com >>>> >>>>>>>>> **** >>>>>>>>>> com >>> ramkrishna.s.vasudevan@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> I think it is with your GC config. What is your heap size? >>> What >>>>>>>>>> is >>>>>>>>>> the >>>>>>>>>> >>>>>>>>>> data that you pump in and how much is the block cache size? >>>>>>>>>>> Regards >>>>>>>>>>> Ram >>>>>>>>>>> >>>>>>>>>>> On Fri, Mar 8, 2013 at 9:31 PM, Ted Yu >>>> wrote: >>>>>>>>>>> 0.94 currently doesn't support hadoop 2.0 >>>>>>>>>>> >>>>>>>>>>> Can you deploy hadoop 1.1.1 instead ? >>>>>>>>>>>> Are you using 0.94.5 ? >>>>>>>>>>>> >>>>>>>>>>>> Thanks >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Mar 8, 2013 at 7:44 AM, Pablo Musa >>>> wrote: >>>>>>>>>>>> Hey guys, >>>>>>>>>>>> >>>>>>>>>>>> as I sent in an email a long time ago, the RSs in my cluster >>> did >>>>>>>>>>>>> not >>>>>>>>>>>>> >>>>>>>>>>>>> get >>>>>>>>>>>>> >>>>>>>>>>>> along >>>>>>>>>>>> >>>>>>>>>>>> and crashed 3 times a day. I tried a lot of options we >>> discussed >>>> in >>>>>>>>>>>>> the >>>>>>>>>>>>> emails, but it not solved the problem. As I used an old version >>>> of >>>>>>>>>>>>> hadoop I >>>>>>>>>>>>> >>>>>>>>>>>> thought this was the problem. >>>>>>>>>>>> >>>>>>>>>>>>> So, I upgraded from hadoop 0.20 - hbase 0.90 - zookeeper 3.3.5 >>> to >>>>>>>>>>>>> hadoop >>>>>>>>>>>>> >>>>>>>>>>>> 2.0.0 >>>>>>>>>>>> >>>>>>>>>>>> - hbase 0.94 - zookeeper 3.4.5. >>>>>>>>>>>>> Unfortunately the RSs did not stop crashing, and worst! Now >>> they >>>>>>>>>>>>> crash >>>>>>>>>>>>> every >>>>>>>>>>>>> hour and some times when the RS that holds the .ROOT. crashes >>> all >>>>>>>>>>>>> cluster >>>>>>>>>>>>> >>>>>>>>>>>> get >>>>>>>>>>>> >>>>>>>>>>>> stuck in transition and everything stops working. >>>>>>>>>>>>> In this case I need to clean zookeeper znodes, restart the >>> master >>>>>>>>>>>>> and >>>>>>>>>>>>> >>>>>>>>>>>>> the >>>>>>>>>>>>> >>>>>>>>>>>> RSs. >>>>>>>>>>>> >>>>>>>>>>>> To avoid this case I am running on production with only ONE RS >>>> and >>>>>>>>>>>>> a >>>>>>>>>>>>> monitoring >>>>>>>>>>>>> script that check every minute, if the RS is ok. If not, >>> restart >>>>>>>>>>>>> it. >>>>>>>>>>>>> * This case does not get the cluster stuck. >>>>>>>>>>>>> >>>>>>>>>>>>> This is driving me crazy, but I really cant find a solution for >>>> the >>>>>>>>>>>>> cluster. >>>>>>>>>>>>> I tracked all logs from the start time 16:49 from all >>> interesting >>>>>>>>>>>>> nodes >>>>>>>>>>>>> (zoo, >>>>>>>>>>>>> namenode, master, rs, dn2, dn9, dn10) and copied here what I >>>> think >>>>>>>>>>>>> is >>>>>>>>>>>>> usefull. >>>>>>>>>>>>> >>>>>>>>>>>>> There are some strange errors in the DATANODE2, as an error >>>>>>>>>>>>> copiyng a >>>>>>>>>>>>> >>>>>>>>>>>>> block >>>>>>>>>>>>> >>>>>>>>>>>> to itself. >>>>>>>>>>>> >>>>>>>>>>>>> The gc log points to GC timeout. However it is very weird that >>>> the >>>>>>>>>>>>> RS >>>>>>>>>>>>> >>>>>>>>>>>>> spend >>>>>>>>>>>>> >>>>>>>>>>>> so much time in GC while in the other cases it takes 0.001sec. >>>>>>>>>>>> >>>>>>>>>>>>> Besides, >>>>>>>>>>>>> the time >>>>>>>>>>>>> spent, is in sys which makes me think that might be a problem >>> in >>>>>>>>>>>>> another >>>>>>>>>>>>> >>>>>>>>>>>> place. >>>>>>>>>>>> >>>>>>>>>>>> I know that it is a bunch of logs, and that it is very >>> difficult >>>> to >>>>>>>>>>>>> find >>>>>>>>>>>>> >>>>>>>>>>>> the >>>>>>>>>>>> >>>>>>>>>>>> problem without much context. But I REALLY need some help. If >>> it >>>> is >>>>>>>>>>>>> not >>>>>>>>>>>>> >>>>>>>>>>>>> the >>>>>>>>>>>>> >>>>>>>>>>>> solution, at least what I should read, where I should look, or >>>>>>>>>>>> which >>>>>>>>>>>> >>>>>>>>>>>>> cases >>>>>>>>>>>>> >>>>>>>>>>>> I >>>>>>>>>>>> >>>>>>>>>>>>> should monitor. >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you very much, >>>>>>>>>>>>> Pablo Musa >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>> -- >>>>>> *Regards,* >>>>>> --- *Sreepathi * >>>>>> >> >> -- >> Best regards, >> >> - Andy >> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein >> (via Tom White) --------------050908020206050708080309--