Return-Path: X-Original-To: apmail-hbase-user-archive@www.apache.org Delivered-To: apmail-hbase-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id BCDB9F767 for ; Wed, 3 Apr 2013 21:41:21 +0000 (UTC) Received: (qmail 28932 invoked by uid 500); 3 Apr 2013 21:41:19 -0000 Delivered-To: apmail-hbase-user-archive@hbase.apache.org Received: (qmail 28871 invoked by uid 500); 3 Apr 2013 21:41:19 -0000 Mailing-List: contact user-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hbase.apache.org Delivered-To: mailing list user@hbase.apache.org Received: (qmail 28863 invoked by uid 99); 3 Apr 2013 21:41:19 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 03 Apr 2013 21:41:19 +0000 X-ASF-Spam-Status: No, hits=2.6 required=5.0 tests=HTML_MESSAGE,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_PASS,TRACKER_ID,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of yuzhihong@gmail.com designates 209.85.215.52 as permitted sender) Received: from [209.85.215.52] (HELO mail-la0-f52.google.com) (209.85.215.52) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 03 Apr 2013 21:41:14 +0000 Received: by mail-la0-f52.google.com with SMTP id fs12so1906071lab.11 for ; Wed, 03 Apr 2013 14:40:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:content-type; bh=l2W4KrfO3/zbrnYkBTzvz/96LlyOXh3tJQUsDXznovE=; b=DI8SnIGxMVd2U93xSxjY5fj/GxsC5rVmSUc5Ih+w01/ztGLcqK5MfCVqhpveMcLzzi tPjUN1A7LDLtQPw5kWw4RMe5Urc8s1EV3uefIF1VA69oCx1rEwoy1uuYdH672tCg9MN1 hBYnEzMR/DBT4Ne58Fs34TWvPIUXk6poEVWuGExNM+uZSgqMCDnf2pPLOytWFFKFuFhj mk0kQLI/rvBLc8h/RFSM0J1fAYS9Q+q1rIx88ByGn1fgULIdwfxgCook3kikuZ4YPj0/ bErfqMp7Y63zWoWgo6OOesyBxdwMapy4gW/2FgW131glUsu4COMWql8Qi7nK2KKGGppi Wm4Q== MIME-Version: 1.0 X-Received: by 10.112.20.106 with SMTP id m10mr1987664lbe.8.1365025252357; Wed, 03 Apr 2013 14:40:52 -0700 (PDT) Received: by 10.112.84.133 with HTTP; Wed, 3 Apr 2013 14:40:52 -0700 (PDT) In-Reply-To: <515C900D.8050602@psafe.com> References: <513A0758.3090507@psafe.com> <513A34D0.80507@psafe.com> <513CD815.9070005@psafe.com> <513D092F.1020106@psafe.com> <513F4D08.3040403@psafe.com> <515C7336.9060505@psafe.com> <515C900D.8050602@psafe.com> Date: Wed, 3 Apr 2013 14:40:52 -0700 Message-ID: Subject: Re: RegionServers Crashing every hour in production env From: Ted Yu To: "user@hbase.apache.org" Content-Type: multipart/alternative; boundary=14dae94734b346bd6a04d97bb51d X-Virus-Checked: Checked by ClamAV on apache.org --14dae94734b346bd6a04d97bb51d Content-Type: text/plain; charset=ISO-8859-1 I went over related emails in my Inbox. One previous case was that other task was running on the region server node which consumed good portion of CPU. In that case I observed a gap of activities in region server log. I can send that snippet, after anonymization since there were some IP addresses at my previous employer. FYI On Wed, Apr 3, 2013 at 1:24 PM, Pablo Musa wrote: > Have you looked at http://hbase.apache.org/book.**html#trouble.rs.runtime. >> **zkexpired, suggested below ? >> > > Yes I did, but GC is not the issue here. > > > I think zookeeper timeout should be more closely watched. >> > > What do you mean? > My ZK timeout today is 150 secs, which is very big. However, my > problem is finding where this timeout come from?! Any ideas? > > Thanks, > Pablo > > > On 04/03/2013 03:36 PM, Ted Yu wrote: > >> Thanks for the sharing. >> >> Have you looked at >> http://hbase.apache.org/book.**html#trouble.rs.runtime.**zkexpired, suggested >> below ? >> >> I think zookeeper timeout should be more closely watched. >> >> >> On Wed, Apr 3, 2013 at 11:21 AM, Pablo Musa wrote: >> >> Hello guys, >>> I stopped my research on HBase ZK timeout for while due to >>> other issues I had to do, but I am back. >>> >>> A very weird behavior that I would like your comments is that my >>> RegionServers perform better (less crashes) under heavy load instead >>> of light load. >>> There is, if I let HBase alone with 50 requestsPerSecond along the >>> whole day the crashes are higher than if I run a mapred Job every hour. >>> >>> >>> Another weird thing is the following: >>> >>> RS startTime = Mon Apr 01 13:22:35 BRT 2013 >>> >>> [...]$ grep slept /var/log/hbase/hbase-hbase-**** >>> regionserver-PSLBHDN00*.log >>> 2013-04-01 20:09:21,135 WARN org.apache.hadoop.hbase.util.****Sleeper: >>> We >>> >>> slept 45491ms instead of 3000ms, this is likely due to a long garbage >>> collecting pause and it's usually bad, see >>> http://hbase.apache.org/book.**** >>> html#trouble.rs.runtime.****zkexpired>> org/book.html#trouble.rs.**runtime.zkexpired >>> > >>> 2013-04-01 22:45:59,407 WARN org.apache.hadoop.hbase.util.****Sleeper: >>> We >>> >>> slept 101271ms instead of 3000ms, this is likely due to a long garbage >>> collecting pause and it's usually bad, see >>> http://hbase.apache.org/book.**** >>> html#trouble.rs.runtime.****zkexpired>> org/book.html#trouble.rs.**runtime.zkexpired >>> > >>> >>> [...]$ egrep 'real=[1-9][0-9][0-9][0-9]*\.[****0-9][0-9]' >>> /var/log/hbase/hbase-hbase-****regionserver-PSLBHDN00*.out >>> >>> * the below report is the above command for each time range. >>> 0.0 - 0.1 secs GCs = 5084 >>> 0.1 - 0.5 secs GCs = 9 >>> 0.5 - 1.0 secs GCs = 3 >>> 1.0 - 010 secs GCs = 0 >>> 010 - 100 secs GCs = 0 >>> 100 - 1000 secs GCs = 0 >>> >>> So, my script for getting every gc time ("real=... secs") says that >>> there is no gc that took longer than 1 second. >>> However the RS log says twice that the RS slept more than 40 seconds >>> instead of 3. >>> >>> "this is likely due to a long garbage collecting pause", yes >>> this is likely but I dont think it is the case. >>> >>> The machine is a huge machine with 70GB RAM, 32 procs, light load, >>> no swap or iowait. >>> >>> Any ideas? >>> >>> Thanks, >>> Pablo >>> >>> >>> On 03/12/2013 12:43 PM, Pablo Musa wrote: >>> >>> Guys, >>>> thank you very much for the help. >>>> >>>> Yesterday I spent 14 hours trying to tune the whole cluster. >>>> The cluster is not ready yet needs a lot of tunning, but at least is >>>> working. >>>> >>>> My first big problem was namenode + datanode GC. They were not using >>>> CMS and thus were taking "incremental" time to run. Ii started in 0.01 >>>> ms and >>>> in 20 minutes was taking 150 secs. >>>> After setting CMSGC this time is much smaller taking a maximum of 70 >>>> secs, >>>> which is VERY HIGH, but for now does not stop HBase. >>>> >>>> With this issue solved, it was clear that the RS was doing a long pause >>>> GC, >>>> taking up to 220 secs. Zookeeper expired the RS and it shutdown. >>>> I tried a lot of different flags configuration (MORE than 20), and could >>>> not >>>> get small gcs. Eventually it would take more than 150 secs (zookeeper >>>> timeout) >>>> and shutdown. >>>> >>>> Finally I tried a config that so far, 12 hours, is working with a >>>> maximum >>>> GC >>>> time of 90 secs. Which of course is a terrible problem since HBase is a >>>> database, but at least the cluster is stable while I can tune it a >>>> little more. >>>> >>>> In my opinion, my biggest problem is to have a few "monster" machines in >>>> the >>>> cluster instead of a bunch of commodities machines. I don't know if >>>> there are >>>> a lot companies using this kind of machines inside a hadoop cluster, but >>>> a fast search on google could not find a lot of tunes for big heap GCs. >>>> >>>> I guess my next step will be search for big heap gc tuning. >>>> >>>> Back to some questions ;) >>>> >>>> > You have ganglia or tsdb running? >>>> >>>> I use zabbix for now, and no there is nothing going on when the big >>>> pause happens. >>>> >>>> > When you see the big pause above, can you see anything going on on >>>> the >>>> > machine? (swap, iowait, concurrent fat mapreduce job?) >>>> > what are you doing during long GC happened? read or write? if >>>> reading, what >>>> > the block cache size? >>>> >>>> The cpu for the RS process goes to 100% and the logs "pause", until it >>>> gets out. >>>> Ex: [NewPar >>>> >>>> IO and SWAP are normal. There is no MR running, just normal database >>>> load, which is >>>> very low. I am probably doing reads AND writes to the database with >>>> default block >>>> cache size. >>>> One problem in this moment might be the big number of regions (1252) >>>> since I am >>>> using only one RS to be able to track the problem. >>>> >>>> The links and ideas were very helpful. Thank you very much guys. >>>> >>>> I will post my future researches as I find a solution ;) >>>> >>>> If you have more ideas or info (links, flag suggestions, etc.), please >>>> post it :) >>>> >>>> Abs, >>>> Pablo >>>> >>>> On 03/10/2013 11:24 PM, Andrew Purtell wrote: >>>> >>>> Be careful with GC tuning, throwing changes at an application without >>>>> analysis of what is going on with the heap is shooting in the dark. One >>>>> particular good treatment of the subject is here: >>>>> http://java.dzone.com/****articles/how-tame-java-gc-****pauses >>>>> >>>>> > >>>>> >>>>> >>>>> If you have made custom changes to blockcache or memstore >>>>> configurations, >>>>> back them out until you're sure everything else is ok. >>>>> >>>>> Watch carefully for swapping. Set the vm.swappiness sysctl to 0. >>>>> Monitor >>>>> for spikes in page scanning or any swap activity. Nothing brings on >>>>> "Juliette" pauses better than a JVM partially swapped out. The Java GC >>>>> starts collection by examining the oldest pages, and those are the >>>>> first >>>>> pages the OS swaps out... >>>>> >>>>> >>>>> >>>>> On Mon, Mar 11, 2013 at 10:13 AM, Azuryy Yu >>>>> wrote: >>>>> >>>>> Hi Pablo, >>>>> >>>>>> It'a terrible for a long minor GC. I don't think there are swaping >>>>>> from >>>>>> your vmstat log. >>>>>> but I just suggest you >>>>>> 1) add following JVM options: >>>>>> -XX:+DisableExplicitGC -XX:+UseCompressedOops -XX:GCTimeRatio=19 >>>>>> -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=2 >>>>>> -XX:MaxTenuringThreshold=3 -XX:+UseFastAccessorMethods >>>>>> >>>>>> 2) -Xmn is two small, your total Mem is 74GB, just make -Xmn2g >>>>>> 3) what are you doing during long GC happened? read or write? if >>>>>> reading, >>>>>> what the block cache size? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Mar 11, 2013 at 6:41 AM, Stack wrote: >>>>>> >>>>>> You could increase your zookeeper session timeout to 5 minutes while >>>>>> >>>>>>> you >>>>>>> are figuring why these long pauses. >>>>>>> http://hbase.apache.org/book.****html#zookeeper.session.**timeout >>>>>>> >>>>>>> > >>>>>>> >>>>>>> >>>>>>> Above, there is an outage for almost 5 minutes: >>>>>>> >>>>>>> We slept 225100ms instead of 3000ms, this is likely due to a long >>>>>>> >>>>>>>> You have ganglia or tsdb running? When you see the big pause above, >>>>>>>> >>>>>>> can >>>>>>> you see anything going on on the machine? (swap, iowait, concurrent >>>>>>> fat >>>>>>> mapreduce job?) >>>>>>> >>>>>>> St.Ack >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Sun, Mar 10, 2013 at 3:29 PM, Pablo Musa wrote: >>>>>>> >>>>>>> Hi Sreepathi, >>>>>>> >>>>>>>> they say in the book (or the site), we could try it to see if it is >>>>>>>> >>>>>>>> really >>>>>>> >>>>>>> a timeout error >>>>>>>> or there is something more. But it is not recomended for production >>>>>>>> environments. >>>>>>>> >>>>>>>> I could give it a try if five minutes will ensure to us that the >>>>>>>> >>>>>>>> problem >>>>>>> is the GC or >>>>>>> >>>>>>>> elsewhere!! Anyway, I think it is hard to beleive a GC is taking >>>>>>>> 2:30 >>>>>>>> minutes. >>>>>>>> >>>>>>>> Abs, >>>>>>>> Pablo >>>>>>>> >>>>>>>> >>>>>>>> On 03/10/2013 04:06 PM, Sreepathi wrote: >>>>>>>> >>>>>>>> Hi Stack/Ted/Pablo, >>>>>>>> >>>>>>>>> Should we increase the hbase.rpc.timeout property to 5 minutes ( >>>>>>>>> >>>>>>>>> 300000 >>>>>>>> >>>>>>> ms >>>>>>> >>>>>>> ) ? >>>>>>>> >>>>>>>>> Regards, >>>>>>>>> - Sreepathi >>>>>>>>> >>>>>>>>> On Sun, Mar 10, 2013 at 11:59 AM, Pablo Musa >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> That combo should be fine. >>>>>>>>> >>>>>>>>> Great!! >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> If JVM is full GC'ing, the application is stopped. >>>>>>>>>> >>>>>>>>>> The below does not look like a full GC but that is a long pause >>>>>>>>>>> in >>>>>>>>>>> system >>>>>>>>>>> time, enough to kill your zk session. >>>>>>>>>>> >>>>>>>>>>> Exactly. This pause is really making the zk expire the RS which >>>>>>>>>>> >>>>>>>>>> shutsdown >>>>>>>>> >>>>>>>> (logs >>>>>>>> >>>>>>>>> in the end of the email). >>>>>>>>>> But the question is: what is causing this pause??!! >>>>>>>>>> >>>>>>>>>> You swapping? >>>>>>>>>> I don't think so (stats below). >>>>>>>>>> >>>>>>>>>> Hardware is good? >>>>>>>>>> Yes, it is a 16 processor machine with 74GB of RAM and plenty disk >>>>>>>>>> >>>>>>>>>> space. >>>>>>>>> >>>>>>>> Below are some metrics I have heard about. Hope it helps. >>>>>>>> >>>>>>>>> >>>>>>>>>> ** I am having some problems with the datanodes[1] which are >>>>>>>>>> having >>>>>>>>>> trouble to >>>>>>>>>> write. I really think the issues are related, but cannot solve any >>>>>>>>>> of >>>>>>>>>> them >>>>>>>>>> :( >>>>>>>>>> >>>>>>>>>> Thanks again, >>>>>>>>>> Pablo >>>>>>>>>> >>>>>>>>>> [1] http://mail-archives.apache. >>>>>>>>>> >>>>>>>>>> ****org/mod_mbox/hadoop-hdfs-****user/**** >>>>>>>>> >>>>>>>> 201303.mbox/%3CCAJzooYfS-F1KS+************** >>>>>> jGOPUt15PwFjcCSzigE0APeM9FXaCr*** >>>>>> ***** >>>>>> >>>>>> QfVbyQ@mail.gmail.com%3E>>>>>> org/**** < >>>>>>> http://mail-archives.**apache.org/** >>>>>>> > >>>>>>> >>>>>>>> mod_mbox/hadoop-hdfs-user/******201303.mbox/%3CCAJzooYfS-F1KS+** >>>>>>>>>> **** >>>>>>>>>> jGOPUt15PwFjcCSzigE0APeM9FXaCr******QfVbyQ@mail.gmail.com%3E< >>>>>>>>>> >>>>>>>>>> http://mail-archives.apache.****org/mod_mbox/hadoop-hdfs-user/** >>>>>>>>> ** >>>>>>>>> >>>>>>>> 201303.mbox/%3CCAJzooYfS-F1KS+******jGOPUt15PwFjcCSzigE0APeM9FXaCr* >>>>>> *** >>>>>> QfVbyQ@mail.gmail.com%3E>>>>> mod_mbox/hadoop-hdfs-user/**201303.mbox/%3CCAJzooYfS-F1KS+** >>>>>> jGOPUt15PwFjcCSzigE0APeM9FXaCr**QfVbyQ@mail.gmail.com%3E >>>>>> > >>>>>> >>>>>> top - 15:38:04 up 297 days, 21:03, 2 users, load average: 4.34, >>>>>>> >>>>>>>> 2.55, >>>>>>>>> >>>>>>>> 1.28 >>>>>>> >>>>>>>> Tasks: 528 total, 1 running, 527 sleeping, 0 stopped, 0 zombie >>>>>>>>>> Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, >>>>>>>>>> 0.0%si, >>>>>>>>>> 0.0%st >>>>>>>>>> Mem: 74187256k total, 29493992k used, 44693264k free, 5836576k >>>>>>>>>> >>>>>>>>>> buffers >>>>>>>>> >>>>>>>> Swap: 51609592k total, 128312k used, 51481280k free, 1353400k >>>>>>>> >>>>>>>>> cached >>>>>>>>> >>>>>>>> ]$ vmstat -w >>>>>>> >>>>>>>> procs -------------------memory-----********------------- ---swap-- >>>>>>>>>> >>>>>>>>>> -----io---- >>>>>>>>>> --system-- -----cpu------- >>>>>>>>>> r b swpd free buff cache si so >>>>>>>>>> bi bo >>>>>>>>>> in >>>>>>>>>> cs us sy id wa st >>>>>>>>>> 2 0 128312 32416928 5838288 5043560 0 0 >>>>>>>>>> 202 53 >>>>>>>>>> 0 >>>>>>>>>> 0 2 1 96 1 0 >>>>>>>>>> >>>>>>>>>> ]$ sar >>>>>>>>>> 02:20:01 PM all 26.18 0.00 2.90 0.63 0.00 >>>>>>>>>> 70.29 >>>>>>>>>> 02:30:01 PM all 1.66 0.00 1.25 1.05 0.00 >>>>>>>>>> 96.04 >>>>>>>>>> 02:40:01 PM all 10.01 0.00 2.14 0.75 0.00 >>>>>>>>>> 87.11 >>>>>>>>>> 02:50:01 PM all 0.76 0.00 0.80 1.03 0.00 >>>>>>>>>> 97.40 >>>>>>>>>> 03:00:01 PM all 0.23 0.00 0.30 0.71 0.00 >>>>>>>>>> 98.76 >>>>>>>>>> 03:10:01 PM all 0.22 0.00 0.30 0.66 0.00 >>>>>>>>>> 98.82 >>>>>>>>>> 03:20:01 PM all 0.22 0.00 0.31 0.76 0.00 >>>>>>>>>> 98.71 >>>>>>>>>> 03:30:01 PM all 0.24 0.00 0.31 0.64 0.00 >>>>>>>>>> 98.81 >>>>>>>>>> 03:40:01 PM all 1.13 0.00 2.97 1.18 0.00 >>>>>>>>>> 94.73 >>>>>>>>>> Average: all 3.86 0.00 1.38 0.88 0.00 >>>>>>>>>> 93.87 >>>>>>>>>> >>>>>>>>>> ]$ iostat >>>>>>>>>> Linux 2.6.32-220.7.1.el6.x86_64 (PSLBHDN002) 03/10/2013 >>>>>>>>>> _x86_64_ >>>>>>>>>> (16 CPU) >>>>>>>>>> >>>>>>>>>> avg-cpu: %user %nice %system %iowait %steal %idle >>>>>>>>>> 1.86 0.00 0.96 0.78 0.00 96.41 >>>>>>>>>> >>>>>>>>>> Device: tps Blk_read/s Blk_wrtn/s Blk_read >>>>>>>>>> Blk_wrtn >>>>>>>>>> sda 1.23 20.26 23.53 521533196 >>>>>>>>>> 605566924 >>>>>>>>>> sdb 6.51 921.55 241.90 23717850730 >>>>>>>>>> >>>>>>>>>> 6225863488 >>>>>>>>> >>>>>>>> sdc 6.22 921.83 236.41 23725181162 >>>>>>> >>>>>>>> 6084471192 >>>>>>>>> >>>>>>>> sdd 6.25 925.13 237.26 23810004970 >>>>>>> >>>>>>>> 6106357880 >>>>>>>>> >>>>>>>> sde 6.19 913.90 235.60 23521108818 >>>>>>> >>>>>>>> 6063722504 >>>>>>>>> >>>>>>>> sdh 6.26 933.08 237.77 24014594546 >>>>>>> >>>>>>>> 6119511376 >>>>>>>>> >>>>>>>> sdg 6.18 914.36 235.31 23532747378 >>>>>>> >>>>>>>> 6056257016 >>>>>>>>> >>>>>>>> sdf 6.24 923.66 235.33 23772251810 >>>>>>> >>>>>>>> 6056604008 >>>>>>>>> >>>>>>>> Some more logging which reinforce that the RS crash is happening >>>>>>> >>>>>>>> because >>>>>>>>> >>>>>>>> of >>>>>>>> >>>>>>>>> timeout. However this time the GC log is not accusing a big time. >>>>>>>>>> >>>>>>>>>> #####RS LOG##### >>>>>>>>>> 2013-03-10 15:37:46,712 INFO org.apache.zookeeper.******** >>>>>>>>>> ClientCnxn: >>>>>>>>>> >>>>>>>>>> Client >>>>>>>>> >>>>>>>> session timed out, have not heard from server in 257739ms for >>>>>>>> >>>>>>>>> sessionid >>>>>>>>> >>>>>>>> 0x13d3c4bcba6014a, closing socket connection and attempting >>>>>>> reconnect >>>>>>> >>>>>>>> 2013-03-10 15:37:46,712 INFO org.apache.zookeeper.******** >>>>>>>>>> ClientCnxn: >>>>>>>>>> >>>>>>>>>> Client >>>>>>>>> >>>>>>>> session timed out, have not heard from server in 226785ms for >>>>>>>> >>>>>>>>> sessionid >>>>>>>>> >>>>>>>> 0x13d3c4bcba60149, closing socket connection and attempting >>>>>>> reconnect >>>>>>> >>>>>>>> 2013-03-10 15:37:46,712 DEBUG org.apache.hadoop.hbase.io.******** >>>>>>>>>> >>>>>>>>>> hfile.LruBlockCache: >>>>>>>>>> Stats: total=61.91 MB, free=1.94 GB, max=2 GB, blocks=1254, >>>>>>>>>> accesses=60087, >>>>>>>>>> hits=58811, hitRatio=97.87%, , cachingAccesses=60069, >>>>>>>>>> >>>>>>>>>> cachingHits=58811, >>>>>>>>> >>>>>>>> cachingHitsRatio=97.90%, , evictions=0, evicted=0, evictedPerRun=NaN >>>>>>>> >>>>>>>>> 2013-03-10 15:37:46,712 WARN >>>>>>>>>> >>>>>>>>>> org.apache.hadoop.hbase.util.********Sleeper: >>>>>>>>> >>>>>>>> We >>>>>>> >>>>>>>> slept 225100ms instead of 3000ms, this is likely due to a long >>>>>>>>>> >>>>>>>>>> garbage >>>>>>>>> >>>>>>>> collecting pause and it's usually bad, see >>>>>>> >>>>>>>> http://hbase.apache.org/book.******** >>>>>>>>>> >>>>>>>>>> >< >>>>>>>>>> http://hbase.apache.org/book.******< >>>>>>>>>> http://hbase.apache.org/book.**** >>>>>>>>>> > >>>>>>>>>> html#trouble.rs.runtime.********zkexpired>>>>>>>>> **** >>>>>>>>>> org/book.html#trouble.rs.******runtime.zkexpired< >>>>>>>>>> >>>>>>>>>> http://hbase.apache.org/book.****html#trouble.rs.runtime.**** >>>>>>>>> zkexpired >>>>>>>>> >>>>>>>> runtime.zkexpired >>>>>>>>> > >>>>>>>>> >>>>>>>> 2013-03-10 15:37:46,714 WARN org.apache.hadoop.hdfs.******** >>>>>>>> DFSClient: >>>>>>>> >>>>>>>>> DFSOutputStream ResponseProcessor exception for block >>>>>>>>>> BP-43236042-172.17.2.10-********1362490844340:blk_-** >>>>>>>>>> >>>>>>>>>> 6834190810033122569_25150229 >>>>>>>>>> >>>>>>>>>> java.io.EOFException: Premature EOF: no length prefix available >>>>>>>>>> at org.apache.hadoop.hdfs.****** >>>>>>>>>> protocol.HdfsProtoUtil.** >>>>>>>>>> vintPrefixed(HdfsProtoUtil.********java:171) >>>>>>>>>> at org.apache.hadoop.hdfs.******** >>>>>>>>>> protocol.datatransfer.** >>>>>>>>>> PipelineAck.readFields(********PipelineAck.java:114) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> org.apache.hadoop.hdfs.********DFSOutputStream$DataStreamer$*** >>>>>>>>> ***** >>>>>>>>> >>>>>>>> ResponseProcessor.run(********DFSOutputStream.java:670) >>>>>>>> >>>>>>>>> 2013-03-10 15:37:46,716 ERROR org.apache.hadoop.hbase.** >>>>>>>>>> regionserver.HRegionServer: >>>>>>>>>> org.apache.hadoop.hbase.ipc.********** >>>>>>>>>> CallerDisconnectedException: >>>>>>>>>> Aborting >>>>>>>>>> call >>>>>>>>>> get([B@7caf69ed, {"timeRange":[0,********9223372036854775807],"** >>>>>>>>>> totalColumns":1,"cacheBlocks":********true,"families":{"** >>>>>>>>>> details"** >>>>>>>>>> :[**"** >>>>>>>>>> ALL"]},"maxVersions":1,"row":"********\\x00\\x00\\x00\\x00\\** >>>>>>>>>> x00\** >>>>>>>>>> >>>>>>>>>> \*** >>>>>>>>>> *x12\\x93@"}), >>>>>>>>>> rpc version=1, client version=29, methodsFingerPrint=1891768260 >>>>>>>>>> from >>>>>>>>>> 172.17.1.71:51294 after 0 ms, since caller disconnected >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> #####GC LOG##### >>>>>>>>>> 2716.635: [GC 2716.635: [ParNew: 57570K->746K(59008K), 0.0046530 >>>>>>>>>> >>>>>>>>>> secs] >>>>>>>>> >>>>>>>> 354857K->300891K(1152704K), 0.0047370 secs] [Times: user=0.03 >>>>>>> >>>>>>>> sys=0.00, >>>>>>>>> >>>>>>>> real=0.00 secs] >>>>>>> >>>>>>>> 2789.478: [GC 2789.478: [ParNew: 53226K->1192K(59008K), 0.0041370 >>>>>>>>>> >>>>>>>>>> secs] >>>>>>>>> >>>>>>>> 353371K->301337K(1152704K), 0.0042220 secs] [Times: user=0.04 >>>>>>> >>>>>>>> sys=0.00, >>>>>>>>> >>>>>>>> real=0.00 secs] >>>>>>> >>>>>>>> 2868.435: [GC 2868.435: [ParNew: 53672K->740K(59008K), 0.0041570 >>>>>>>>>> >>>>>>>>>> secs] >>>>>>>>> >>>>>>>> 353817K->300886K(1152704K), 0.0042440 secs] [Times: user=0.03 >>>>>>> >>>>>>>> sys=0.00, >>>>>>>>> >>>>>>>> real=0.01 secs] >>>>>>> >>>>>>>> 2920.309: [GC 2920.309: [ParNew: 53220K->6202K(59008K), 0.0058440 >>>>>>>>>> >>>>>>>>>> secs] >>>>>>>>> >>>>>>>> 353366K->310443K(1152704K), 0.0059410 secs] [Times: user=0.05 >>>>>>> >>>>>>>> sys=0.00, >>>>>>>>> >>>>>>>> real=0.01 secs] >>>>>>> >>>>>>>> 2984.239: [GC 2984.239: [ParNew: 58680K->5282K(59008K), 0.0048070 >>>>>>>>>> >>>>>>>>>> secs] >>>>>>>>> >>>>>>>> 362921K->310977K(1152704K), 0.0049180 secs] [Times: user=0.03 >>>>>>> >>>>>>>> sys=0.00, >>>>>>>>> >>>>>>>> real=0.01 secs] >>>>>>> >>>>>>>> 3050.620: [GC 3050.620: [ParNew: 57524K->5292K(59008K), 0.0055510 >>>>>>>>>> >>>>>>>>>> secs] >>>>>>>>> >>>>>>>> 363219K->313384K(1152704K), 0.0056520 secs] [Times: user=0.04 >>>>>>> >>>>>>>> sys=0.00, >>>>>>>>> >>>>>>>> real=0.01 secs] >>>>>>> >>>>>>>> 3111.331: [GC 3111.331: [ParNew: 57542K->5637K(59008K), 0.0056900 >>>>>>>>>> >>>>>>>>>> secs] >>>>>>>>> >>>>>>>> 365634K->318079K(1152704K), 0.0057950 secs] [Times: user=0.03 >>>>>>> >>>>>>>> sys=0.00, >>>>>>>>> >>>>>>>> real=0.01 secs] >>>>>>> >>>>>>>> 3163.510: [GC 3163.510: [ParNew: 58117K->1138K(59008K), 0.0046540 >>>>>>>>>> >>>>>>>>>> secs] >>>>>>>>> >>>>>>>> 370559K->316086K(1152704K), 0.0047470 secs] [Times: user=0.03 >>>>>>> >>>>>>>> sys=0.00, >>>>>>>>> >>>>>>>> real=0.01 secs] >>>>>>> >>>>>>>> 3217.642: [GC 3217.643: [ParNew: 53618K->1001K(59008K), 0.0038530 >>>>>>>>>> >>>>>>>>>> secs] >>>>>>>>> >>>>>>>> 368566K->315949K(1152704K), 0.0039450 secs] [Times: user=0.03 >>>>>>> >>>>>>>> sys=0.00, >>>>>>>>> >>>>>>>> real=0.00 secs] >>>>>>> >>>>>>>> 3464.067: [GC 3464.067: [ParNew: 53481K->5348K(59008K), 0.0068370 >>>>>>>>>> >>>>>>>>>> secs] >>>>>>>>> >>>>>>>> 368429K->322343K(1152704K), 0.0069170 secs] [Times: user=0.07 >>>>>>> >>>>>>>> sys=0.01, >>>>>>>>> >>>>>>>> real=0.00 secs] >>>>>>> >>>>>>>> 3465.648: [GC 3465.648: [ParNew: 57828K->1609K(59008K), 0.0111520 >>>>>>>>>> >>>>>>>>>> secs] >>>>>>>>> >>>>>>>> 374823K->321496K(1152704K), 0.0112170 secs] [Times: user=0.07 >>>>>>> >>>>>>>> sys=0.00, >>>>>>>>> >>>>>>>> real=0.01 secs] >>>>>>> >>>>>>>> 3466.291: [GC 3466.291: [ParNew: 54089K->2186K(59008K), 0.0093190 >>>>>>>>>> >>>>>>>>>> secs] >>>>>>>>> >>>>>>>> 373976K->322073K(1152704K), 0.0093800 secs] [Times: user=0.06 >>>>>>> >>>>>>>> sys=0.00, >>>>>>>>> >>>>>>>> real=0.01 secs] >>>>>>> >>>>>>>> Heap >>>>>>>>>> par new generation total 59008K, used 15497K >>>>>>>>>> [0x00000005fae00000, >>>>>>>>>> 0x00000005fee00000, 0x00000005fee00000) >>>>>>>>>> eden space 52480K, 25% used [0x00000005fae00000, >>>>>>>>>> >>>>>>>>>> 0x00000005fbaff8c8, >>>>>>>>> >>>>>>>> 0x00000005fe140000) >>>>>>>> >>>>>>>>> from space 6528K, 33% used [0x00000005fe140000, >>>>>>>>>> >>>>>>>>>> 0x00000005fe362b70, >>>>>>>>> >>>>>>>> 0x00000005fe7a0000) >>>>>>> >>>>>>>> to space 6528K, 0% used [0x00000005fe7a0000, >>>>>>>>>> >>>>>>>>>> 0x00000005fe7a0000, >>>>>>>>> >>>>>>>> 0x00000005fee00000) >>>>>>> >>>>>>>> concurrent mark-sweep generation total 1093696K, used 319887K >>>>>>>>>> [0x00000005fee00000, 0x0000000641a10000, 0x00000007fae00000) >>>>>>>>>> concurrent-mark-sweep perm gen total 30464K, used 30427K >>>>>>>>>> [0x00000007fae00000, 0x00000007fcbc0000, 0x0000000800000000) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 03/08/2013 07:02 PM, Stack wrote: >>>>>>>>>> >>>>>>>>>> On Fri, Mar 8, 2013 at 10:58 AM, Pablo Musa >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>> >>>>>>>> 0.94 currently doesn't support hadoop 2.0 >>>>>>> >>>>>>>> Can you deploy hadoop 1.1.1 instead ? >>>>>>>>>>> >>>>>>>>>>>> I am using cdh4.2.0 which uses this version as default >>>>>>>>>>>>> >>>>>>>>>>>>> installation. >>>>>>>>>>>> >>>>>>>>>>> I think it will be a problem for me to deploy 1.1.1 because I >>>>>>>> would >>>>>>>> >>>>>>>>> need >>>>>>>>>>>> to >>>>>>>>>>>> "upgrade" the whole cluster with 70TB of data (backup >>>>>>>>>>>> everything, >>>>>>>>>>>> >>>>>>>>>>>> go >>>>>>>>>>> >>>>>>>>>> offline, etc.). >>>>>>> >>>>>>>> Is there a problem to use cdh4.2.0? >>>>>>>>>>>> I should send my email to cdh list? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> That combo should be fine. >>>>>>>>>>>> >>>>>>>>>>>> You Full GC'ing around this time? >>>>>>>>>>>> >>>>>>>>>>> The GC shows it took a long time. However it does not make any >>>>>>>>>>> sense >>>>>>>>>>> >>>>>>>>>> to be it, since the same ammount of data was cleaned before and >>>>>>> >>>>>>>> AFTER >>>>>>>>>>> >>>>>>>>>> in just 0.01 secs! >>>>>>> >>>>>>>> >>>>>>>>>>>> If JVM is full GC'ing, the application is stopped. >>>>>>>>>>>> >>>>>>>>>>>> [Times: user=0.08 sys=137.62, real=137.62 secs] >>>>>>>>>>>> >>>>>>>>>>> Besides the whole time was used by system. That is what is >>>>>>>>>>> bugging >>>>>>>>>>> me. >>>>>>>>>>> >>>>>>>>>> The below does not look like a full GC but that is a long >>>>>>>> pause >>>>>>>> >>>>>>>>> in >>>>>>>>>>> >>>>>>>>>> system >>>>>>> >>>>>>>> time, enough to kill your zk session. >>>>>>>>>>> >>>>>>>>>>> You swapping? >>>>>>>>>>> >>>>>>>>>>> Hardware is good? >>>>>>>>>>> >>>>>>>>>>> St.Ack >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ... >>>>>>>>>>> >>>>>>>>>>> 1044.081: [GC 1044.081: [ParNew: 58970K->402K(59008K), >>>>>>>>>>> 0.0040990 >>>>>>>>>>> secs] >>>>>>>>>>> >>>>>>>>>> 275097K->216577K(1152704K), 0.0041820 secs] [Times: user=0.03 >>>>>>>> >>>>>>>>> sys=0.00, >>>>>>>>>>> >>>>>>>>>> real=0.01 secs] >>>>>>>> >>>>>>>>> 1087.319: [GC 1087.319: [ParNew: 52873K->6528K(59008K), 0.0055000 >>>>>>>>>>>> >>>>>>>>>>>> secs] >>>>>>>>>>> >>>>>>>>>> 269048K->223592K(1152704K), 0.0055930 secs] [Times: user=0.04 >>>>>>>> >>>>>>>>> sys=0.01, >>>>>>>>>>> >>>>>>>>>> real=0.00 secs] >>>>>>>> >>>>>>>>> 1087.834: [GC 1087.834: [ParNew: 59008K->6527K(59008K), >>>>>>>>>>>> 137.6353620 >>>>>>>>>>>> secs] 276072K->235097K(1152704K), 137.6354700 secs] [Times: >>>>>>>>>>>> >>>>>>>>>>>> user=0.08 >>>>>>>>>>> >>>>>>>>>> sys=137.62, real=137.62 secs] >>>>>>> >>>>>>>> 1226.638: [GC 1226.638: [ParNew: 59007K->1897K(59008K), 0.0079960 >>>>>>>>>>>> >>>>>>>>>>>> secs] >>>>>>>>>>> >>>>>>>>>> 287577K->230937K(1152704K), 0.0080770 secs] [Times: user=0.05 >>>>>>>> >>>>>>>>> sys=0.00, >>>>>>>>>>> >>>>>>>>>> real=0.01 secs] >>>>>>>> >>>>>>>>> 1227.251: [GC 1227.251: [ParNew: 54377K->2379K(59008K), 0.0095650 >>>>>>>>>>>> >>>>>>>>>>>> secs] >>>>>>>>>>> >>>>>>>>>> 283417K->231420K(1152704K), 0.0096340 secs] [Times: user=0.06 >>>>>>>> >>>>>>>>> sys=0.00, >>>>>>>>>>> >>>>>>>>>> real=0.01 secs] >>>>>>>> >>>>>>>>> >>>>>>>>>>>> I really appreciate you guys helping me to find out what is >>>>>>>>>>>> wrong. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Pablo >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 03/08/2013 02:11 PM, Stack wrote: >>>>>>>>>>>> >>>>>>>>>>>> What RAM says. >>>>>>>>>>>> >>>>>>>>>>>> 2013-03-07 17:24:57,887 INFO >>>>>>>>>>>> org.apache.zookeeper.************ClientCnxn: >>>>>>>>>>>> >>>>>>>>>>> Client >>>>>>>> >>>>>>>>> session timed out, have not heard from server in 159348ms for >>>>>>>>>>>>> sessionid >>>>>>>>>>>>> 0x13d3c4bcba600a7, closing socket connection and attempting >>>>>>>>>>>>> >>>>>>>>>>>>> reconnect >>>>>>>>>>>> >>>>>>>>>>> You Full GC'ing around this time? >>>>>>>> >>>>>>>>> Put up your configs in a place where we can take a look? >>>>>>>>>>>>> >>>>>>>>>>>>> St.Ack >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Mar 8, 2013 at 8:32 AM, ramkrishna vasudevan < >>>>>>>>>>>>> ramkrishna.s.vasudevan@gmail.**********com >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> **** >>>>>>>> >>>>>>>>> com >>>>>>>>>>>> >>>>>>>>>>>>> ramkrishna.s.vasudevan@gmail.****com>>>>>>>>>>> *gmail.com > >>>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>> >>>>>>>>> I think it is with your GC config. What is your heap size? >>>>>>>>>>>>> >>>>>>>>>>>>> What >>>>>>>>>>>> >>>>>>>>>>> is >>>>>>> >>>>>>>> the >>>>>>>>>>>>> >>>>>>>>>>>>> data that you pump in and how much is the block cache size? >>>>>>>>>>>>> >>>>>>>>>>>>> Regards >>>>>>>>>>>>>> Ram >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Mar 8, 2013 at 9:31 PM, Ted Yu >>>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>> 0.94 currently doesn't support hadoop 2.0 >>>>>>>> >>>>>>>>> Can you deploy hadoop 1.1.1 instead ? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Are you using 0.94.5 ? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Mar 8, 2013 at 7:44 AM, Pablo Musa >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>> Hey guys, >>>>>>>> >>>>>>>>> as I sent in an email a long time ago, the RSs in my cluster >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> did >>>>>>>>>>>>>> >>>>>>>>>>>>> not >>>>>>> >>>>>>>> get >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> along >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> and crashed 3 times a day. I tried a lot of options we >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> discussed >>>>>>>>>>>>>> >>>>>>>>>>>>> in >>>>>>> >>>>>>> the >>>>>>>> >>>>>>>>> emails, but it not solved the problem. As I used an old >>>>>>>>>>>>>>>> version >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> >>>>>>>>>>>>>> hadoop I >>>>>>>> >>>>>>>>> thought this was the problem. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So, I upgraded from hadoop 0.20 - hbase 0.90 - zookeeper >>>>>>>>>>>>>>> 3.3.5 >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> >>>>>>>>>>>>>> hadoop >>>>>>> >>>>>>>> 2.0.0 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - hbase 0.94 - zookeeper 3.4.5. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Unfortunately the RSs did not stop crashing, and worst! Now >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> they >>>>>>>>>>>>>>> >>>>>>>>>>>>>> crash >>>>>>> >>>>>>>> every >>>>>>>>>>>>>>>> hour and some times when the RS that holds the .ROOT. >>>>>>>>>>>>>>>> crashes >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> all >>>>>>>>>>>>>>> >>>>>>>>>>>>>> cluster >>>>>>> >>>>>>>> get >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> stuck in transition and everything stops working. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In this case I need to clean zookeeper znodes, restart the >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> master >>>>>>>>>>>>>>> >>>>>>>>>>>>>> and >>>>>>> >>>>>>>> the >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> RSs. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> To avoid this case I am running on production with only >>>>>>>>>>>>>>> ONE >>>>>>>>>>>>>>> RS >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> and >>>>>>>>>>>>>> >>>>>>>>>>>>> a >>>>>>>> >>>>>>>>> monitoring >>>>>>>>>>>>>>>> script that check every minute, if the RS is ok. If not, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> restart >>>>>>>>>>>>>>> >>>>>>>>>>>>>> it. >>>>>>> >>>>>>>> * This case does not get the cluster stuck. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This is driving me crazy, but I really cant find a solution >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> >>>>>>>>>>>>>> cluster. >>>>>>>> >>>>>>>>> I tracked all logs from the start time 16:49 from all >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> interesting >>>>>>>>>>>>>>> >>>>>>>>>>>>>> nodes >>>>>>> >>>>>>>> (zoo, >>>>>>>>>>>>>>>> namenode, master, rs, dn2, dn9, dn10) and copied here what I >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> think >>>>>>>>>>>>>>> >>>>>>>>>>>>>> is >>>>>>>> >>>>>>>>> usefull. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> There are some strange errors in the DATANODE2, as an error >>>>>>>>>>>>>>>> copiyng a >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> block >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> to itself. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The gc log points to GC timeout. However it is very weird >>>>>>>>>>>>>>> that >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> >>>>>>>>>>>>>> RS >>>>>>>> >>>>>>>>> spend >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> so much time in GC while in the other cases it takes >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 0.001sec. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Besides, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> the time >>>>>>>>>>>>>>>> spent, is in sys which makes me think that might be a >>>>>>>>>>>>>>>> problem >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>> >>>>>>>>>>>>>> another >>>>>>> >>>>>>>> place. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I know that it is a bunch of logs, and that it is very >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> difficult >>>>>>>>>>>>>> >>>>>>>>>>>>> to >>>>>>> >>>>>>> find >>>>>>>> >>>>>>>>> the >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> problem without much context. But I REALLY need some >>>>>>>>>>>>>>> help. >>>>>>>>>>>>>>> If >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> it >>>>>>>>>>>>>> >>>>>>>>>>>>> is >>>>>>> >>>>>>> not >>>>>>>> >>>>>>>>> the >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> solution, at least what I should read, where I should >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> look, or >>>>>>>>>>>>>>> which >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> cases >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> should monitor. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you very much, >>>>>>>>>>>>>>>> Pablo Musa >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> *Regards,* >>>>>>>>> --- *Sreepathi * >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>> Best regards, >>>>> >>>>> - Andy >>>>> >>>>> Problems worthy of attack prove their worth by hitting back. - Piet >>>>> Hein >>>>> (via Tom White) >>>>> >>>>> > --14dae94734b346bd6a04d97bb51d--