Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of pablo@psafe.com designates
 187.0.212.22 as permitted sender)
Message-ID: <515C7336.9060505@psafe.com>
Date: Wed, 3 Apr 2013 15:21:42 -0300
From: Pablo Musa <pablo@psafe.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:17.0) Gecko/20130308 Thunderbird/17.0.4
MIME-Version: 1.0
To: "user@hbase.apache.org" <user@hbase.apache.org>
Subject: Re: RegionServers Crashing every hour in production env
References: <513A0758.3090507@psafe.com>
 <CALte62wk7h2Mg8hj2-pc-+6gnvXJx-A6o6S4EP3MVdjOgozqgg@mail.gmail.com>
 <CAAT7MkoTwPJt4=YerAMaCCTok9QZXTuHXyWKisMZDuNO-cydKg@mail.gmail.com>
 <CADcMMgER-mtEW9V00PaApneY-M+yZ6V-Fa5SNXr+CCTYCFqi7A@mail.gmail.com>
 <513A34D0.80507@psafe.com>
 <CADcMMgFrsdVCBwoXdKY-i21RVxwJpVeJpP_-hJciOwYJwDF5mA@mail.gmail.com>
 <513CD815.9070005@psafe.com>
 <CAPt-PJtrOL4a8vNaaPZX5vUE-LC1rbWxKt504d7bcdO4_nkk=A@mail.gmail.com>
 <513D092F.1020106@psafe.com>
 <CADcMMgEx=r793Gf3Sj9iw1ymbNkY7DKjdG=SYfCd5mtR1u+Wfg@mail.gmail.com>
 <CALr1C9pPbECYRON79Pk65j439RZK5GUWT1GBcfXhA=SuZitA1A@mail.gmail.com>
 <CA+RK=_Dq7kDG00JGW0tQ6jhjQ2pazX1WA3FCaaSk-57ZgrmzXg@mail.gmail.com>
 <513F4D08.3040403@psafe.com>
In-Reply-To: <513F4D08.3040403@psafe.com>
Content-Type: multipart/alternative;
	boundary="------------050908020206050708080309"

--------------050908020206050708080309
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit

Hello guys,
I stopped my research on HBase ZK timeout for while due to
other issues I had to do, but I am back.

A very weird behavior that I would like your comments is that my
RegionServers perform better (less crashes) under heavy load instead
of light load.
There is, if I let HBase alone with 50 requestsPerSecond along the
whole day the crashes are higher than if I run a mapred Job every hour.


Another weird thing is the following:

RS startTime = Mon Apr 01 13:22:35 BRT 2013

[...]$ grep slept /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.log
2013-04-01 20:09:21,135 WARN org.apache.hadoop.hbase.util.Sleeper: We 
slept 45491ms instead of 3000ms, this is likely due to a long garbage 
collecting pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-04-01 22:45:59,407 WARN org.apache.hadoop.hbase.util.Sleeper: We 
slept 101271ms instead of 3000ms, this is likely due to a long garbage 
collecting pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

[...]$ egrep 'real=[1-9][0-9][0-9][0-9]*\.[0-9][0-9]' 
/var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.out
* the below report is the above command for each time range.
0.0 - 0.1  secs GCs = 5084
0.1 - 0.5  secs GCs = 9
0.5 - 1.0  secs GCs = 3
1.0 - 010  secs GCs = 0
010 - 100  secs GCs = 0
100 - 1000 secs GCs = 0

So, my script for getting every gc time ("real=... secs") says that
there is no gc that took longer than 1 second.
However the RS log says twice that the RS slept more than 40 seconds
instead of 3.

"this is likely due to a long garbage collecting pause", yes
this is likely but I dont think it is the case.

The machine is a huge machine with 70GB RAM, 32 procs, light load,
no swap or iowait.

Any ideas?

Thanks,
Pablo

On 03/12/2013 12:43 PM, Pablo Musa wrote:
> Guys,
> thank you very much for the help.
>
> Yesterday I spent 14 hours trying to tune the whole cluster.
> The cluster is not ready yet needs a lot of tunning, but at least is
> working.
>
> My first big problem was namenode + datanode GC. They were not using
> CMS and thus were taking "incremental" time to run. Ii started in 0.01
> ms and
> in 20 minutes was taking 150 secs.
> After setting CMSGC this time is much smaller taking a maximum of 70 secs,
> which is VERY HIGH, but for now does not stop HBase.
>
> With this issue solved, it was clear that the RS was doing a long pause GC,
> taking up to 220 secs. Zookeeper expired the RS and it shutdown.
> I tried a lot of different flags configuration (MORE than 20), and could not
> get small gcs. Eventually it would take more than 150 secs (zookeeper
> timeout)
> and shutdown.
>
> Finally I tried a config that so far, 12 hours, is working with a maximum GC
> time of 90 secs. Which of course is a terrible problem since HBase is a
> database, but at least the cluster is stable while I can tune it a
> little more.
>
> In my opinion, my biggest problem is to have a few "monster" machines in the
> cluster instead of a bunch of commodities machines. I don't know if
> there are
> a lot companies using this kind of machines inside a hadoop cluster, but
> a fast search on google could not find a lot of tunes for big heap GCs.
>
> I guess my next step will be search for big heap gc tuning.
>
> Back to some questions ;)
>
>   > You have ganglia or tsdb running?
>
> I use zabbix for now, and no there is nothing going on when the big
> pause happens.
>
>   > When you see the big pause above, can you see anything going on on the
>   > machine? (swap, iowait, concurrent fat mapreduce job?)
>   > what are you doing during long GC happened? read or write? if
> reading, what
>   > the block cache size?
>
> The cpu for the RS process goes to 100% and the logs "pause", until it
> gets out.
> Ex: [NewPar
>
> IO and SWAP are normal. There is no MR running, just normal database
> load, which is
> very low. I am probably doing reads AND writes to the database with
> default block
> cache size.
> One problem in this moment might be the big number of regions (1252)
> since I am
> using only one RS to be able to track the problem.
>
> The links and ideas were very helpful. Thank you very much guys.
>
> I will post my future researches as I find a solution ;)
>
> If you have more ideas or info (links, flag suggestions, etc.), please
> post it :)
>
> Abs,
> Pablo
>
> On 03/10/2013 11:24 PM, Andrew Purtell wrote:
>> Be careful with GC tuning, throwing changes at an application without
>> analysis of what is going on with the heap is shooting in the dark. One
>> particular good treatment of the subject is here:
>> http://java.dzone.com/articles/how-tame-java-gc-pauses
>>
>> If you have made custom changes to blockcache or memstore configurations,
>> back them out until you're sure everything else is ok.
>>
>> Watch carefully for swapping. Set the vm.swappiness sysctl to 0. Monitor
>> for spikes in page scanning or any swap activity. Nothing brings on
>> "Juliette" pauses better than a JVM partially swapped out. The Java GC
>> starts collection by examining the oldest pages, and those are the first
>> pages the OS swaps out...
>>
>>
>>
>> On Mon, Mar 11, 2013 at 10:13 AM, Azuryy Yu <azuryyyu@gmail.com> wrote:
>>
>>> Hi Pablo,
>>> It'a terrible for a long minor GC. I don't think there are swaping from
>>> your vmstat log.
>>> but I just suggest you
>>> 1) add following JVM options:
>>> -XX:+DisableExplicitGC -XX:+UseCompressedOops -XX:GCTimeRatio=19
>>> -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=2
>>> -XX:MaxTenuringThreshold=3 -XX:+UseFastAccessorMethods
>>>
>>> 2) -Xmn is two small, your total Mem is 74GB, just make -Xmn2g
>>> 3) what are you doing during long GC happened? read or write? if reading,
>>> what the block cache size?
>>>
>>>
>>>
>>>
>>> On Mon, Mar 11, 2013 at 6:41 AM, Stack <stack@duboce.net> wrote:
>>>
>>>> You could increase your zookeeper session timeout to 5 minutes while you
>>>> are figuring why these long pauses.
>>>> http://hbase.apache.org/book.html#zookeeper.session.timeout
>>>>
>>>> Above, there is an outage for almost 5 minutes:
>>>>
>>>>>> We slept 225100ms instead of 3000ms, this is likely due to a long
>>>> You have ganglia or tsdb running?  When you see the big pause above, can
>>>> you see anything going on on the machine?  (swap, iowait, concurrent fat
>>>> mapreduce job?)
>>>>
>>>> St.Ack
>>>>
>>>>
>>>>
>>>> On Sun, Mar 10, 2013 at 3:29 PM, Pablo Musa <pablo@psafe.com> wrote:
>>>>
>>>>> Hi Sreepathi,
>>>>> they say in the book (or the site), we could try it to see if it is
>>>> really
>>>>> a timeout error
>>>>> or there is something more. But it is not recomended for production
>>>>> environments.
>>>>>
>>>>> I could give it a try if five minutes will ensure to us that the
>>> problem
>>>>> is the GC or
>>>>> elsewhere!! Anyway, I think it is hard to beleive a GC is taking 2:30
>>>>> minutes.
>>>>>
>>>>> Abs,
>>>>> Pablo
>>>>>
>>>>>
>>>>> On 03/10/2013 04:06 PM, Sreepathi wrote:
>>>>>
>>>>>> Hi Stack/Ted/Pablo,
>>>>>>
>>>>>> Should we increase the hbase.rpc.timeout property to 5 minutes (
>>> 300000
>>>> ms
>>>>>> )  ?
>>>>>>
>>>>>> Regards,
>>>>>> - Sreepathi
>>>>>>
>>>>>> On Sun, Mar 10, 2013 at 11:59 AM, Pablo Musa <pablo@psafe.com> wrote:
>>>>>>
>>>>>>    That combo should be fine.
>>>>>>> Great!!
>>>>>>>
>>>>>>>
>>>>>>>    If JVM is full GC'ing, the application is stopped.
>>>>>>>> The below does not look like a full GC but that is a long pause in
>>>>>>>> system
>>>>>>>> time, enough to kill your zk session.
>>>>>>>>
>>>>>>> Exactly. This pause is really making the zk expire the RS which
>>>> shutsdown
>>>>>>> (logs
>>>>>>> in the end of the email).
>>>>>>> But the question is: what is causing this pause??!!
>>>>>>>
>>>>>>>    You swapping?
>>>>>>> I don't think so (stats below).
>>>>>>>
>>>>>>>    Hardware is good?
>>>>>>> Yes, it is a 16 processor machine with 74GB of RAM and plenty disk
>>>> space.
>>>>>>> Below are some metrics I have heard about. Hope it helps.
>>>>>>>
>>>>>>>
>>>>>>> ** I am having some problems with the datanodes[1] which are having
>>>>>>> trouble to
>>>>>>> write. I really think the issues are related, but cannot solve any of
>>>>>>> them
>>>>>>> :(
>>>>>>>
>>>>>>> Thanks again,
>>>>>>> Pablo
>>>>>>>
>>>>>>> [1] http://mail-archives.apache.
>>> ****org/mod_mbox/hadoop-hdfs-user/****
>>> 201303.mbox/%3CCAJzooYfS-F1KS+******jGOPUt15PwFjcCSzigE0APeM9FXaCr****
>>>>>>> QfVbyQ@mail.gmail.com%3E<http:**//mail-archives.apache.org/**
>>>>>>> mod_mbox/hadoop-hdfs-user/**201303.mbox/%3CCAJzooYfS-F1KS+**
>>>>>>> jGOPUt15PwFjcCSzigE0APeM9FXaCr**QfVbyQ@mail.gmail.com%3E<
>>> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201303.mbox/%3CCAJzooYfS-F1KS+jGOPUt15PwFjcCSzigE0APeM9FXaCrQfVbyQ@mail.gmail.com%3E
>>>>>>> top - 15:38:04 up 297 days, 21:03,  2 users,  load average: 4.34,
>>> 2.55,
>>>>>>> 1.28
>>>>>>> Tasks: 528 total,   1 running, 527 sleeping,   0 stopped,   0 zombie
>>>>>>> Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi, 0.0%si,
>>>>>>>     0.0%st
>>>>>>> Mem:  74187256k total, 29493992k used, 44693264k free,  5836576k
>>>> buffers
>>>>>>> Swap: 51609592k total,   128312k used, 51481280k free,  1353400k
>>> cached
>>>>>>> ]$ vmstat -w
>>>>>>> procs -------------------memory-----****------------- ---swap--
>>>>>>> -----io----
>>>>>>> --system-- -----cpu-------
>>>>>>>     r  b       swpd       free       buff      cache   si   so    bi bo
>>>>>>> in
>>>>>>>      cs  us sy  id wa st
>>>>>>>     2  0     128312   32416928    5838288    5043560    0    0   202 53
>>>>>>>    0
>>>>>>>       0   2  1  96  1  0
>>>>>>>
>>>>>>> ]$ sar
>>>>>>> 02:20:01 PM     all     26.18      0.00      2.90      0.63 0.00
>>>>>>> 70.29
>>>>>>> 02:30:01 PM     all      1.66      0.00      1.25      1.05 0.00
>>>>>>> 96.04
>>>>>>> 02:40:01 PM     all     10.01      0.00      2.14      0.75 0.00
>>>>>>> 87.11
>>>>>>> 02:50:01 PM     all      0.76      0.00      0.80      1.03 0.00
>>>>>>> 97.40
>>>>>>> 03:00:01 PM     all      0.23      0.00      0.30      0.71 0.00
>>>>>>> 98.76
>>>>>>> 03:10:01 PM     all      0.22      0.00      0.30      0.66 0.00
>>>>>>> 98.82
>>>>>>> 03:20:01 PM     all      0.22      0.00      0.31      0.76 0.00
>>>>>>> 98.71
>>>>>>> 03:30:01 PM     all      0.24      0.00      0.31      0.64 0.00
>>>>>>> 98.81
>>>>>>> 03:40:01 PM     all      1.13      0.00      2.97      1.18 0.00
>>>>>>> 94.73
>>>>>>> Average:        all      3.86      0.00      1.38      0.88 0.00
>>>>>>> 93.87
>>>>>>>
>>>>>>> ]$ iostat
>>>>>>> Linux 2.6.32-220.7.1.el6.x86_64 (PSLBHDN002)     03/10/2013 _x86_64_
>>>>>>>     (16 CPU)
>>>>>>>
>>>>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>>>>               1.86    0.00    0.96    0.78    0.00   96.41
>>>>>>>
>>>>>>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read Blk_wrtn
>>>>>>> sda               1.23        20.26        23.53  521533196 605566924
>>>>>>> sdb               6.51       921.55       241.90 23717850730
>>> 6225863488
>>>>>>> sdc               6.22       921.83       236.41 23725181162
>>> 6084471192
>>>>>>> sdd               6.25       925.13       237.26 23810004970
>>> 6106357880
>>>>>>> sde               6.19       913.90       235.60 23521108818
>>> 6063722504
>>>>>>> sdh               6.26       933.08       237.77 24014594546
>>> 6119511376
>>>>>>> sdg               6.18       914.36       235.31 23532747378
>>> 6056257016
>>>>>>> sdf               6.24       923.66       235.33 23772251810
>>> 6056604008
>>>>>>> Some more logging which reinforce that the RS crash is happening
>>>> because
>>>>>>> of
>>>>>>> timeout. However this time the GC log is not accusing a big time.
>>>>>>>
>>>>>>> #####RS LOG#####
>>>>>>> 2013-03-10 15:37:46,712 INFO org.apache.zookeeper.****ClientCnxn:
>>>> Client
>>>>>>> session timed out, have not heard from server in 257739ms for
>>> sessionid
>>>>>>> 0x13d3c4bcba6014a, closing socket connection and attempting reconnect
>>>>>>> 2013-03-10 15:37:46,712 INFO org.apache.zookeeper.****ClientCnxn:
>>>> Client
>>>>>>> session timed out, have not heard from server in 226785ms for
>>> sessionid
>>>>>>> 0x13d3c4bcba60149, closing socket connection and attempting reconnect
>>>>>>> 2013-03-10 15:37:46,712 DEBUG org.apache.hadoop.hbase.io.****
>>>>>>> hfile.LruBlockCache:
>>>>>>> Stats: total=61.91 MB, free=1.94 GB, max=2 GB, blocks=1254,
>>>>>>> accesses=60087,
>>>>>>> hits=58811, hitRatio=97.87%, , cachingAccesses=60069,
>>>> cachingHits=58811,
>>>>>>> cachingHitsRatio=97.90%, , evictions=0, evicted=0, evictedPerRun=NaN
>>>>>>> 2013-03-10 15:37:46,712 WARN
>>> org.apache.hadoop.hbase.util.****Sleeper:
>>>>>>> We
>>>>>>> slept 225100ms instead of 3000ms, this is likely due to a long
>>> garbage
>>>>>>> collecting pause and it's usually bad, see
>>>>>>> http://hbase.apache.org/book.**** <http://hbase.apache.org/book.**>
>>>>>>> html#trouble.rs.runtime.****zkexpired<http://hbase.apache.**
>>>>>>> org/book.html#trouble.rs.**runtime.zkexpired<
>>>> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired>
>>>>>>> 2013-03-10 15:37:46,714 WARN org.apache.hadoop.hdfs.****DFSClient:
>>>>>>> DFSOutputStream ResponseProcessor exception  for block
>>>>>>> BP-43236042-172.17.2.10-****1362490844340:blk_-**
>>>>>>> 6834190810033122569_25150229
>>>>>>>
>>>>>>> java.io.EOFException: Premature EOF: no length prefix available
>>>>>>>            at org.apache.hadoop.hdfs.****protocol.HdfsProtoUtil.**
>>>>>>> vintPrefixed(HdfsProtoUtil.****java:171)
>>>>>>>            at org.apache.hadoop.hdfs.****protocol.datatransfer.**
>>>>>>> PipelineAck.readFields(****PipelineAck.java:114)
>>>>>>>            at
>>>> org.apache.hadoop.hdfs.****DFSOutputStream$DataStreamer$****
>>>>>>> ResponseProcessor.run(****DFSOutputStream.java:670)
>>>>>>> 2013-03-10 15:37:46,716 ERROR org.apache.hadoop.hbase.**
>>>>>>> regionserver.HRegionServer:
>>>>>>> org.apache.hadoop.hbase.ipc.****CallerDisconnectedException: Aborting
>>>>>>> call
>>>>>>> get([B@7caf69ed, {"timeRange":[0,****9223372036854775807],"**
>>>>>>> totalColumns":1,"cacheBlocks":****true,"families":{"details":[**"**
>>>>>>> ALL"]},"maxVersions":1,"row":"****\\x00\\x00\\x00\\x00\\x00\\***
>>>>>>> *x12\\x93@"}),
>>>>>>> rpc version=1, client version=29, methodsFingerPrint=1891768260 from
>>>>>>> 172.17.1.71:51294 after 0 ms, since caller disconnected
>>>>>>>
>>>>>>>
>>>>>>> #####GC LOG#####
>>>>>>> 2716.635: [GC 2716.635: [ParNew: 57570K->746K(59008K), 0.0046530
>>> secs]
>>>>>>> 354857K->300891K(1152704K), 0.0047370 secs] [Times: user=0.03
>>> sys=0.00,
>>>>>>> real=0.00 secs]
>>>>>>> 2789.478: [GC 2789.478: [ParNew: 53226K->1192K(59008K), 0.0041370
>>> secs]
>>>>>>> 353371K->301337K(1152704K), 0.0042220 secs] [Times: user=0.04
>>> sys=0.00,
>>>>>>> real=0.00 secs]
>>>>>>> 2868.435: [GC 2868.435: [ParNew: 53672K->740K(59008K), 0.0041570
>>> secs]
>>>>>>> 353817K->300886K(1152704K), 0.0042440 secs] [Times: user=0.03
>>> sys=0.00,
>>>>>>> real=0.01 secs]
>>>>>>> 2920.309: [GC 2920.309: [ParNew: 53220K->6202K(59008K), 0.0058440
>>> secs]
>>>>>>> 353366K->310443K(1152704K), 0.0059410 secs] [Times: user=0.05
>>> sys=0.00,
>>>>>>> real=0.01 secs]
>>>>>>> 2984.239: [GC 2984.239: [ParNew: 58680K->5282K(59008K), 0.0048070
>>> secs]
>>>>>>> 362921K->310977K(1152704K), 0.0049180 secs] [Times: user=0.03
>>> sys=0.00,
>>>>>>> real=0.01 secs]
>>>>>>> 3050.620: [GC 3050.620: [ParNew: 57524K->5292K(59008K), 0.0055510
>>> secs]
>>>>>>> 363219K->313384K(1152704K), 0.0056520 secs] [Times: user=0.04
>>> sys=0.00,
>>>>>>> real=0.01 secs]
>>>>>>> 3111.331: [GC 3111.331: [ParNew: 57542K->5637K(59008K), 0.0056900
>>> secs]
>>>>>>> 365634K->318079K(1152704K), 0.0057950 secs] [Times: user=0.03
>>> sys=0.00,
>>>>>>> real=0.01 secs]
>>>>>>> 3163.510: [GC 3163.510: [ParNew: 58117K->1138K(59008K), 0.0046540
>>> secs]
>>>>>>> 370559K->316086K(1152704K), 0.0047470 secs] [Times: user=0.03
>>> sys=0.00,
>>>>>>> real=0.01 secs]
>>>>>>> 3217.642: [GC 3217.643: [ParNew: 53618K->1001K(59008K), 0.0038530
>>> secs]
>>>>>>> 368566K->315949K(1152704K), 0.0039450 secs] [Times: user=0.03
>>> sys=0.00,
>>>>>>> real=0.00 secs]
>>>>>>> 3464.067: [GC 3464.067: [ParNew: 53481K->5348K(59008K), 0.0068370
>>> secs]
>>>>>>> 368429K->322343K(1152704K), 0.0069170 secs] [Times: user=0.07
>>> sys=0.01,
>>>>>>> real=0.00 secs]
>>>>>>> 3465.648: [GC 3465.648: [ParNew: 57828K->1609K(59008K), 0.0111520
>>> secs]
>>>>>>> 374823K->321496K(1152704K), 0.0112170 secs] [Times: user=0.07
>>> sys=0.00,
>>>>>>> real=0.01 secs]
>>>>>>> 3466.291: [GC 3466.291: [ParNew: 54089K->2186K(59008K), 0.0093190
>>> secs]
>>>>>>> 373976K->322073K(1152704K), 0.0093800 secs] [Times: user=0.06
>>> sys=0.00,
>>>>>>> real=0.01 secs]
>>>>>>> Heap
>>>>>>>     par new generation   total 59008K, used 15497K [0x00000005fae00000,
>>>>>>> 0x00000005fee00000, 0x00000005fee00000)
>>>>>>>      eden space 52480K,  25% used [0x00000005fae00000,
>>>> 0x00000005fbaff8c8,
>>>>>>> 0x00000005fe140000)
>>>>>>>      from space 6528K,  33% used [0x00000005fe140000,
>>> 0x00000005fe362b70,
>>>>>>> 0x00000005fe7a0000)
>>>>>>>      to   space 6528K,   0% used [0x00000005fe7a0000,
>>> 0x00000005fe7a0000,
>>>>>>> 0x00000005fee00000)
>>>>>>>     concurrent mark-sweep generation total 1093696K, used 319887K
>>>>>>> [0x00000005fee00000, 0x0000000641a10000, 0x00000007fae00000)
>>>>>>>     concurrent-mark-sweep perm gen total 30464K, used 30427K
>>>>>>> [0x00000007fae00000, 0x00000007fcbc0000, 0x0000000800000000)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 03/08/2013 07:02 PM, Stack wrote:
>>>>>>>
>>>>>>>    On Fri, Mar 8, 2013 at 10:58 AM, Pablo Musa <pablo@psafe.com>
>>> wrote:
>>>>>>>>     0.94 currently doesn't support hadoop 2.0
>>>>>>>>
>>>>>>>>> Can you deploy hadoop 1.1.1 instead ?
>>>>>>>>>>     I am using cdh4.2.0 which uses this version as default
>>>> installation.
>>>>>>>>> I think it will be a problem for me to deploy 1.1.1 because I would
>>>>>>>>> need
>>>>>>>>> to
>>>>>>>>> "upgrade" the whole cluster with 70TB of data (backup everything,
>>> go
>>>>>>>>> offline, etc.).
>>>>>>>>>
>>>>>>>>> Is there a problem to use cdh4.2.0?
>>>>>>>>> I should send my email to cdh list?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     That combo should be fine.
>>>>>>>>>
>>>>>>>>       You Full GC'ing around this time?
>>>>>>>>
>>>>>>>>> The GC shows it took a long time. However it does not make any
>>> sense
>>>>>>>>> to be it, since the same ammount of data was cleaned before and
>>> AFTER
>>>>>>>>> in just 0.01 secs!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     If JVM is full GC'ing, the application is stopped.
>>>>>>>>>
>>>>>>>>     [Times: user=0.08 sys=137.62, real=137.62 secs]
>>>>>>>>
>>>>>>>>> Besides the whole time was used by system. That is what is bugging
>>>> me.
>>>>>>>>>     The below does not look like a full GC but that is a long pause
>>> in
>>>>>>>> system
>>>>>>>> time, enough to kill your zk session.
>>>>>>>>
>>>>>>>> You swapping?
>>>>>>>>
>>>>>>>> Hardware is good?
>>>>>>>>
>>>>>>>> St.Ack
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>       ...
>>>>>>>>
>>>>>>>>> 1044.081: [GC 1044.081: [ParNew: 58970K->402K(59008K), 0.0040990
>>>> secs]
>>>>>>>>> 275097K->216577K(1152704K), 0.0041820 secs] [Times: user=0.03
>>>> sys=0.00,
>>>>>>>>> real=0.01 secs]
>>>>>>>>>
>>>>>>>>> 1087.319: [GC 1087.319: [ParNew: 52873K->6528K(59008K), 0.0055000
>>>> secs]
>>>>>>>>> 269048K->223592K(1152704K), 0.0055930 secs] [Times: user=0.04
>>>> sys=0.01,
>>>>>>>>> real=0.00 secs]
>>>>>>>>>
>>>>>>>>> 1087.834: [GC 1087.834: [ParNew: 59008K->6527K(59008K), 137.6353620
>>>>>>>>> secs] 276072K->235097K(1152704K), 137.6354700 secs] [Times:
>>> user=0.08
>>>>>>>>> sys=137.62, real=137.62 secs]
>>>>>>>>>
>>>>>>>>> 1226.638: [GC 1226.638: [ParNew: 59007K->1897K(59008K), 0.0079960
>>>> secs]
>>>>>>>>> 287577K->230937K(1152704K), 0.0080770 secs] [Times: user=0.05
>>>> sys=0.00,
>>>>>>>>> real=0.01 secs]
>>>>>>>>>
>>>>>>>>> 1227.251: [GC 1227.251: [ParNew: 54377K->2379K(59008K), 0.0095650
>>>> secs]
>>>>>>>>> 283417K->231420K(1152704K), 0.0096340 secs] [Times: user=0.06
>>>> sys=0.00,
>>>>>>>>> real=0.01 secs]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I really appreciate you guys helping me to find out what is wrong.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Pablo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/08/2013 02:11 PM, Stack wrote:
>>>>>>>>>
>>>>>>>>>     What RAM says.
>>>>>>>>>
>>>>>>>>>> 2013-03-07 17:24:57,887 INFO
>>>> org.apache.zookeeper.********ClientCnxn:
>>>>>>>>>> Client
>>>>>>>>>>
>>>>>>>>>> session timed out, have not heard from server in 159348ms for
>>>>>>>>>> sessionid
>>>>>>>>>> 0x13d3c4bcba600a7, closing socket connection and attempting
>>>> reconnect
>>>>>>>>>> You Full GC'ing around this time?
>>>>>>>>>>
>>>>>>>>>> Put up your configs in a place where we can take a look?
>>>>>>>>>>
>>>>>>>>>> St.Ack
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 8, 2013 at 8:32 AM, ramkrishna vasudevan <
>>>>>>>>>> ramkrishna.s.vasudevan@gmail.******com
>>>> <ramkrishna.s.vasudevan@gmail.
>>>>>>>>>> ****
>>>>>>>>>> com <ramkrishna.s.vasudevan@gmail.**com<
>>>> ramkrishna.s.vasudevan@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>      I think it is with your GC config.  What is your heap size?
>>>    What
>>>>>>>>>> is
>>>>>>>>>> the
>>>>>>>>>>
>>>>>>>>>>    data that you pump in and how much is the block cache size?
>>>>>>>>>>> Regards
>>>>>>>>>>> Ram
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 8, 2013 at 9:31 PM, Ted Yu <yuzhihong@gmail.com>
>>>> wrote:
>>>>>>>>>>>      0.94 currently doesn't support hadoop 2.0
>>>>>>>>>>>
>>>>>>>>>>>    Can you deploy hadoop 1.1.1 instead ?
>>>>>>>>>>>> Are you using 0.94.5 ?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Mar 8, 2013 at 7:44 AM, Pablo Musa <pablo@psafe.com>
>>>> wrote:
>>>>>>>>>>>>      Hey guys,
>>>>>>>>>>>>
>>>>>>>>>>>>    as I sent in an email a long time ago, the RSs in my cluster
>>> did
>>>>>>>>>>>>> not
>>>>>>>>>>>>>
>>>>>>>>>>>>>     get
>>>>>>>>>>>>>
>>>>>>>>>>>> along
>>>>>>>>>>>>
>>>>>>>>>>>>    and crashed 3 times a day. I tried a lot of options we
>>> discussed
>>>> in
>>>>>>>>>>>>> the
>>>>>>>>>>>>> emails, but it not solved the problem. As I used an old version
>>>> of
>>>>>>>>>>>>>     hadoop I
>>>>>>>>>>>>>
>>>>>>>>>>>>     thought this was the problem.
>>>>>>>>>>>>
>>>>>>>>>>>>> So, I upgraded from hadoop 0.20 - hbase 0.90 - zookeeper 3.3.5
>>> to
>>>>>>>>>>>>>     hadoop
>>>>>>>>>>>>>
>>>>>>>>>>>> 2.0.0
>>>>>>>>>>>>
>>>>>>>>>>>>    - hbase 0.94 - zookeeper 3.4.5.
>>>>>>>>>>>>> Unfortunately the RSs did not stop crashing, and worst! Now
>>> they
>>>>>>>>>>>>> crash
>>>>>>>>>>>>> every
>>>>>>>>>>>>> hour and some times when the RS that holds the .ROOT. crashes
>>> all
>>>>>>>>>>>>>     cluster
>>>>>>>>>>>>>
>>>>>>>>>>>> get
>>>>>>>>>>>>
>>>>>>>>>>>>    stuck in transition and everything stops working.
>>>>>>>>>>>>> In this case I need to clean zookeeper znodes, restart the
>>> master
>>>>>>>>>>>>> and
>>>>>>>>>>>>>
>>>>>>>>>>>>>     the
>>>>>>>>>>>>>
>>>>>>>>>>>> RSs.
>>>>>>>>>>>>
>>>>>>>>>>>>    To avoid this case I am running on production with only ONE RS
>>>> and
>>>>>>>>>>>>> a
>>>>>>>>>>>>> monitoring
>>>>>>>>>>>>> script that check every minute, if the RS is ok. If not,
>>> restart
>>>>>>>>>>>>> it.
>>>>>>>>>>>>> * This case does not get the cluster stuck.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is driving me crazy, but I really cant find a solution for
>>>> the
>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>> I tracked all logs from the start time 16:49 from all
>>> interesting
>>>>>>>>>>>>> nodes
>>>>>>>>>>>>> (zoo,
>>>>>>>>>>>>> namenode, master, rs, dn2, dn9, dn10) and copied here what I
>>>> think
>>>>>>>>>>>>> is
>>>>>>>>>>>>> usefull.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are some strange errors in the DATANODE2, as an error
>>>>>>>>>>>>> copiyng a
>>>>>>>>>>>>>
>>>>>>>>>>>>>     block
>>>>>>>>>>>>>
>>>>>>>>>>>>     to itself.
>>>>>>>>>>>>
>>>>>>>>>>>>> The gc log points to GC timeout. However it is very weird that
>>>> the
>>>>>>>>>>>>> RS
>>>>>>>>>>>>>
>>>>>>>>>>>>>     spend
>>>>>>>>>>>>>
>>>>>>>>>>>>     so much time in GC while in the other cases it takes 0.001sec.
>>>>>>>>>>>>
>>>>>>>>>>>>> Besides,
>>>>>>>>>>>>> the time
>>>>>>>>>>>>> spent, is in sys which makes me think that might be a problem
>>> in
>>>>>>>>>>>>>     another
>>>>>>>>>>>>>
>>>>>>>>>>>> place.
>>>>>>>>>>>>
>>>>>>>>>>>>    I know that it is a bunch of logs, and that it is very
>>> difficult
>>>> to
>>>>>>>>>>>>>     find
>>>>>>>>>>>>>
>>>>>>>>>>>> the
>>>>>>>>>>>>
>>>>>>>>>>>>    problem without much context. But I REALLY need some help. If
>>> it
>>>> is
>>>>>>>>>>>>> not
>>>>>>>>>>>>>
>>>>>>>>>>>>>     the
>>>>>>>>>>>>>
>>>>>>>>>>>>     solution, at least what I should read, where I should look, or
>>>>>>>>>>>> which
>>>>>>>>>>>>
>>>>>>>>>>>>>     cases
>>>>>>>>>>>>>
>>>>>>>>>>>>     I
>>>>>>>>>>>>
>>>>>>>>>>>>> should monitor.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you very much,
>>>>>>>>>>>>> Pablo Musa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>> --
>>>>>> *Regards,*
>>>>>> --- *Sreepathi *
>>>>>>
>>
>> --
>> Best regards,
>>
>>      - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)


--------------050908020206050708080309--