From: Seraph Imalia
To: hbase-user@hadoop.apache.org
Date: Wed, 20 Jan 2010 21:33:26 +0200
Subject: Re: Hbase pausing problems

> From: stack
> Date: Wed, 20 Jan 2010 10:55:52 -0800
> Subject: Re: Hbase pausing problems
>
> On Wed, Jan 20, 2010 at 9:37 AM, Seraph Imalia wrote:
>
>> Does this mean that when 1 regionserver does a memstore flush, the other
>> two regionservers are also unavailable for writes? I have watched the
>> logs carefully to make sure that not all the regionservers are flushing
>> at the same time. Most of the time, only 1 server flushes at a time and
>> in rare cases, I have seen two at a time.
>
> No.
>
> Flush is a background process. Reads and writes go ahead while flushing
> is happening.

This is very nice to know: it is what I expected, but it also means that
this problem is solvable :)

>>> It also looks like you have little RAM space given over to hbase, just
>>> 1G? If your traffic is bursty, giving hbase more RAM might help it get
>>> over these write humps.
>>
>> I have it at 1G on purpose. When we first had the problem, I immediately
>> thought the problem was resource related, so I increased the hBase RAM
>> to 3G (each server has 8G - I was careful to watch for swapping). This
>> made the problem worse because each memstore flush took longer, which
>> stopped writing for longer, and people started noticing that our system
>> was down during those periods.
>
> See above, flushing doesn't block reads/writes. Maybe this was something
> else?
> A GC pause that ran longer because heap is bigger? You said you had
> gc logging enabled. Did you see any long pauses? (Our ZooKeeper brothers
> suggest https://gchisto.dev.java.net/ as a help reading GC logs.)

Thanks, this tool will be useful - I'll run through the GC logs and see if
anything jumps out at me.

> Let me look at your logs to see if I see anything else up there.
>
>>> Clients will be blocked writing regions carried by the affected
>>> regionserver only. Your HW is not appropriate to the load as currently
>>> setup. You might also consider adding more machines to your cluster.
>>
>> Hmm... How does hBase decide which region to write to? Is it possible
>> that hBase is deciding to write all our current records to one specific
>> region that happens to be on the server that is busy doing a memstore
>> flush?
>
> Check out the region list in the master UI. See how they are defined by
> their start and end key. Clients write rows to the region hosting the
> pertinent row-span.

I have attached the region list of the AdDelivery table. Please let me know
if this is something that you need me to upload to a server somewhere?

> It's quite possible all writes are going to a single region on a single
> server -- which is often an issue -- if your key scheme has something
> like current time for a prefix.

We are using UUID.randomUUID() as the row key - it has a pretty random
prefix.

>> We are currently inserting about 6 million rows per day.
>
> 6M rows is low, even for a cluster as small as yours (though, maybe your
> inserts are fat? Big cells, many at a time?).

Inserts contain a maximum of 30 cells - most of the cells are strings
containing integers. About 3 are strings containing no more than 10
characters, and about 4 are strings containing a decimal value. Not all 30
cells will exist; most often, some are left out because the data was not
necessary for that specific row. Most rows will contain 25 cells.
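A minimal sketch of what one of these puts amounts to with the 0.20-style
Java client API (the column family and qualifier names below are
placeholders, not the real AdDelivery schema):

import java.util.UUID;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AdDeliveryInsert {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "AdDelivery");

    // Random UUID as the row key, so the key prefix spreads writes across regions.
    Put put = new Put(Bytes.toBytes(UUID.randomUUID().toString()));

    // Up to ~30 small string cells per row; cells with no data are simply omitted.
    put.add(Bytes.toBytes("d"), Bytes.toBytes("impressions"), Bytes.toBytes("42"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("campaign"), Bytes.toBytes("summer01"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("cost"), Bytes.toBytes("0.05"));

    table.put(put);  // autoFlush is on by default, so the put is sent immediately
  }
}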
>> SQL Server (which I am so happy to no longer be using for this) was able
>> to write (and replicate to a slave) 9 million records (using the same
>> spec'ed server). I would like to see hBase cope with the 3 we have given
>> it at least when inserting 6 million. Do you think this is possible or
>> is our only answer to throw on more servers?
>
> 3 servers should be well able. Tell me more about your schema -- though,
> nevermind, i can find it in your master log.
> St.Ack

Cool :)

Seraph

>>> St.Ack
>>>
>>>> Thank you for your assistance thus far; please let me know if you need
>>>> or discover anything else?
>>>>
>>>> Regards,
>>>> Seraph
>>>>
>>>>> From: Jean-Daniel Cryans
>>>>> Date: Mon, 18 Jan 2010 09:49:16 -0800
>>>>> Subject: Re: Hbase pausing problems
>>>>>
>>>>> The next step would be to take a look at your region server's log
>>>>> around the time of the insert and clients who don't resume after the
>>>>> loss of a region server. If you are able to gzip them and put them on
>>>>> a public server, it would be awesome.
>>>>>
>>>>> Thx,
>>>>>
>>>>> J-D
>>>>>
>>>>> On Mon, Jan 18, 2010 at 1:03 AM, Seraph Imalia wrote:
>>>>>> Answers below...
>>>>>>
>>>>>> Regards,
>>>>>> Seraph
>>>>>>
>>>>>>> From: stack
>>>>>>> Date: Fri, 15 Jan 2010 10:10:39 -0800
>>>>>>> Subject: Re: Hbase pausing problems
>>>>>>>
>>>>>>> How many CPUs?
>>>>>>
>>>>>> 1x Quad Xeon in each server
>>>>>>
>>>>>>> You are using default JVM settings (see HBASE_OPTS in hbase-env.sh).
>>>>>>> You might want to enable GC logging. See the line after HBASE_OPTS
>>>>>>> in hbase-env.sh. Enable it. GC logging might tell you about the
>>>>>>> pauses you are seeing.
>>>>>>
>>>>>> I will enable GC logging tonight during our slow time because
>>>>>> restarting the regionservers causes the clients to pause indefinitely.
>>>>>>
>>>>>>> Can you get a fourth server for your cluster and run the master, zk,
>>>>>>> and namenode on it and leave the other three servers for
>>>>>>> regionserver and datanode (with perhaps replication == 2 as per J-D
>>>>>>> to lighten load on a small cluster).
>>>>>>
>>>>>> We plan to double the number of servers in the next few weeks and I
>>>>>> will take your advice to put the master, zk and namenode on it (we
>>>>>> will need to have a second one on standby should this one crash).
>>>>>> The servers will be ordered shortly and will be here in a week or two.
>>>>>>
>>>>>> That said, I have been monitoring CPU usage and none of them seem
>>>>>> particularly busy. The regionserver on each one hovers around 30% all
>>>>>> the time and the datanode sits at about 10% most of the time. If we
>>>>>> do have a resource issue, it definitely does not seem to be CPU.
>>>>>>
>>>>>> Increasing RAM did not seem to work either - it just made hBase use a
>>>>>> bigger memstore and then it took longer to do a flush.
>>>>>>
>>>>>>> More notes inline below.
>>>>>>>
>>>>>>> On Fri, Jan 15, 2010 at 1:33 AM, Seraph Imalia wrote:
>>>>>>>
>>>>>>>> Approximately every 10 minutes, our entire coldfusion system pauses
>>>>>>>> at the point of inserting into hBase for between 30 and 60 seconds
>>>>>>>> and then continues.
>>>>>>>
>>>>>>> Yeah, enable GC logging. See if you can make a correlation between
>>>>>>> the pause the client is seeing and a GC pause.
>>>>>>>
>>>>>>>> Investigation...
>>>>>>>>
>>>>>>>> Watching the logs of the regionserver, the pausing of the
>>>>>>>> coldfusion system happens as soon as one of the regionservers
>>>>>>>> starts flushing the memstore and recovers again as soon as it is
>>>>>>>> finished flushing (recovers as soon as it starts compacting).
>>>>>>>
>>>>>>> ...though, this would seem to point to an issue with your hardware.
>>>>>>> How many disks? Are they misconfigured such that they hold up the
>>>>>>> system when they are being heavily written to?
>>>>>>>
>>>>>>> A regionserver log at DEBUG from around this time so we could look
>>>>>>> at it would be helpful.
>>>>>>>
>>>>>>>> I can recreate the error just by stopping 1 of the regionservers;
>>>>>>>> but then starting the regionserver again does not make coldfusion
>>>>>>>> recover until I restart the coldfusion servers. It is important to
>>>>>>>> note that if I keep the built-in hBase shell running, it is happily
>>>>>>>> able to put and get data to and from hBase whilst coldfusion is
>>>>>>>> busy pausing/failing.
>>>>>>>
>>>>>>> This seems odd. Enable DEBUG for the client-side. Do you see the
>>>>>>> shell recalibrating, finding new locations for regions, after you
>>>>>>> shut down the single regionserver - something that your coldfusion
>>>>>>> is not doing? Or, maybe, the shell is putting to a regionserver that
>>>>>>> has not been disturbed by your start/stop?
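For reference, a minimal sketch of one way to turn on client-side DEBUG
from code, assuming the client JVM logs through the log4j 1.2 setup that
HBase 0.20 ships with; setting log4j.logger.org.apache.hadoop.hbase=DEBUG
in the client's log4j.properties does the same thing:

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class EnableHBaseClientDebug {
  public static void main(String[] args) {
    // Raise the HBase client and RPC packages to DEBUG at runtime so that
    // region relocation after a regionserver shutdown shows up in the logs.
    Logger.getLogger("org.apache.hadoop.hbase.client").setLevel(Level.DEBUG);
    Logger.getLogger("org.apache.hadoop.hbase.ipc").setLevel(Level.DEBUG);
  }
}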
Do you see the >>>> shell >>>>>>> recalibrating finding new locations for regions after you shutdown >> the >>>>>>> single regionserver, something that your coldfusion is not doing? >> Or, >>>>>>> maybe, the shell is putting a regionserver that has not been >> disturbed >>>> by >>>>>>> your start/stop? >>>>>>>=20 >>>>>>>=20 >>>>>>>>=20 >>>>>>>> I have tried increasing the regionserver=B9s RAM to 3 Gigs and this >> just >>>> made >>>>>>>> the problem worse because it took longer for the regionservers to >>>> flush the >>>>>>>> memory store. >>>>>>>=20 >>>>>>>=20 >>>>>>> Again, if flushing is holding up the machine, if you can't write a >> file >>>> in >>>>>>> background without it freezing your machine, then your machines are >>>> anemic >>>>>>> or misconfigured? >>>>>>>=20 >>>>>>>=20 >>>>>>>> One of the links I found on your site mentioned increasing >>>>>>>> the default value for hbase.regionserver.handler.count to 100 =AD th= is >>>> did >>>>>>>> not >>>>>>>> seem to make any difference. >>>>>>>=20 >>>>>>>=20 >>>>>>> Leave this configuration in place I'd say. >>>>>>>=20 >>>>>>> Are you seeing 'blocking' messages in the regionserver logs? >>>> Regionserver >>>>>>> will stop taking on writes if it thinks its being overrun to preven= t >>>> itself >>>>>>> OOME'ing. Grep the 'multiplier' configuration in hbase-default.xml= . >>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>>> I have double checked that the memory flush >>>>>>>> very rarely happens on more than 1 regionserver at a time =AD in fac= t >> in >>>> my >>>>>>>> many hours of staring at tails of logs, it only happened once wher= e >>>> two >>>>>>>> regionservers flushed at the same time. >>>>>>>>=20 >>>>>>>> You've enabled DEBUG? >>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>>> My investigations point strongly towards a coding problem on our >> side >>>>>>>> rather >>>>>>>> than a problem with the server setup or hBase itself. >>>>>>>=20 >>>>>>>=20 >>>>>>> If things were slow from client-perspective, that might be a >>>> client-side >>>>>>> coding problem but these pauses, unless you have a fly-by deadlock = in >>>> your >>>>>>> client-code, its probably an hbase issue. >>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>>> I say this because >>>>>>>> whilst I understand why a regionserver would go offline during a >>>> memory >>>>>>>> flush, I would expect the other two regionservers to pick up the >> load >>>> =AD >>>>>>>> especially since the built-in hbase shell has no problem accessing >>>> hBase >>>>>>>> whilst a regionserver is busy doing a memstore flush. >>>>>>>>=20 >>>>>>>> HBase does not go offline during memory flush. It continues to be >>>>>>> available for reads and writes during this time. And see J-D >> response >>>> for >>>>>>> incorrect understanding of how loading of regions is done in an hba= se >>>>>>> cluster. >>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>> ... >>>>>>>=20 >>>>>>>=20 >>>>>>> I think either I am leaving out code that is required to determine >>>> which >>>>>>>> RegionServers are available OR I am keeping too many hBase objects >> in >>>> RAM >>>>>>>> instead of calling their constructors each time (my purpose >> obviously >>>> was >>>>>>>> to >>>>>>>> improve performance). >>>>>>>>=20 >>>>>>>>=20 >>>>>>> For sure keep single instance of HBaseConfiguration at least and us= e >>>> this >>>>>>> constructing all HTable and HBaseAdmin instances. >>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>>> Currently the live system is inserting over 7 Million records per >> day >>>>>>>> (mostly between 8AM and 10PM) which is not a ridiculously high loa= d. 
>>>>>>>
>>>>>>>> Currently the live system is inserting over 7 Million records per
>>>>>>>> day (mostly between 8AM and 10PM), which is not a ridiculously high
>>>>>>>> load.
>>>>>>>
>>>>>>> What size are the records? What is your table schema? How many
>>>>>>> regions do you currently have in your table?
>>>>>>>
>>>>>>> St.Ack