From: Seraph Imalia
To: hbase-user@hadoop.apache.org
Date: Wed, 20 Jan 2010 21:33:26 +0200
Subject: Re: Hbase pausing problems

> From: stack
> Date: Wed, 20 Jan 2010 10:55:52 -0800
> Subject: Re: Hbase pausing problems
>
> On Wed, Jan 20, 2010 at 9:37 AM, Seraph Imalia wrote:
>
>> Does this mean that when 1 regionserver does a memstore flush, the other
>> two regionservers are also unavailable for writes? I have watched the
>> logs carefully to make sure that not all the regionservers are flushing
>> at the same time. Most of the time, only 1 server flushes at a time and
>> in rare cases, I have seen two at a time.
>
> No.
>
> Flush is a background process. Reads and writes go ahead while flushing
> is happening.

This is very nice to know: it is what I expected, but it also means that
this problem is solvable :)

>>> It also looks like you have little RAM space given over to hbase, just
>>> 1G? If your traffic is bursty, giving hbase more RAM might help it get
>>> over these write humps.
>>
>> I have it at 1G on purpose. When we first had the problem, I immediately
>> thought the problem was resource related, so I increased the hBase RAM
>> to 3G (each server has 8G - I was careful to watch for swapping). This
>> made the problem worse because each memstore flush took longer, which
>> stopped writing for longer, and people started noticing that our system
>> was down during those periods.
>
> See above, flushing doesn't block reads/writes. Maybe this was something
> else?
> A GC pause that ran longer because heap is bigger? You said you had
> gc logging enabled. Did you see any long pauses? (Our ZooKeeper brothers
> suggest https://gchisto.dev.java.net/ as a help reading GC logs.)

Thanks, this tool will be useful - I'll run through the GC logs and see if
anything jumps out at me.

> Let me look at your logs to see if I see anything else up there.
>
>>> Clients will be blocked writing regions carried by the affected
>>> regionserver only. Your HW is not appropriate to the load as currently
>>> setup. You might also consider adding more machines to your cluster.
>>
>> Hmm... How does hBase decide which region to write to? Is it possible
>> that hBase is deciding to write all our current records to one specific
>> region that happens to be on the server that is busy doing a memstore
>> flush?
>
> Check out the region list in the master UI. See how they are defined by
> their start and end key. Clients write rows to the region hosting the
> pertinent row-span.

I have attached the region list of the AdDelivery table. Please let me know
if this is something that you need me to upload to a server somewhere?

> It's quite possible all writes are going to a single region on a single
> server -- which is often an issue -- if your key scheme has something
> like current time for a prefix.

We are using UUID.randomUUID() as the row key - it has a pretty random
prefix.

>> We are currently inserting about 6 million rows per day.
>
> 6M rows is low, even for a cluster as small as yours (though, maybe your
> inserts are fat? Big cells, many at a time?).

Inserts contain a maximum of 30 cells - most of the cells are strings
containing integers. About 3 are strings containing no more than 10
characters, and about 4 are strings containing a decimal value. Not all 30
cells will exist; most often, some are left out because the data was not
necessary for that specific row. Most rows will contain 25 cells.
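A minimal sketch of what one of these puts amounts to with the 0.20-style
Java client API (the column family and qualifier names below are
placeholders, not the real AdDelivery schema):

import java.util.UUID;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AdDeliveryInsert {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "AdDelivery");

    // Random UUID as the row key, so the key prefix spreads writes across regions.
    Put put = new Put(Bytes.toBytes(UUID.randomUUID().toString()));

    // Up to ~30 small string cells per row; cells with no data are simply omitted.
    put.add(Bytes.toBytes("d"), Bytes.toBytes("impressions"), Bytes.toBytes("42"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("campaign"), Bytes.toBytes("summer01"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("cost"), Bytes.toBytes("0.05"));

    table.put(put);  // autoFlush is on by default, so the put is sent immediately
  }
}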
>> SQL Server (which I am so happy to no longer be using for this) was able
>> to write (and replicate to a slave) 9 million records (using the same
>> spec'ed server). I would like to see hBase cope with the 3 we have given
>> it at least when inserting 6 million. Do you think this is possible or
>> is our only answer to throw on more servers?
>
> 3 servers should be well able. Tell me more about your schema -- though,
> nevermind, i can find it in your master log.
> St.Ack

Cool :)

Seraph

>>> St.Ack
>>>
>>>> Thank you for your assistance thus far; please let me know if you need
>>>> or discover anything else?
>>>>
>>>> Regards,
>>>> Seraph
>>>>
>>>>> From: Jean-Daniel Cryans
>>>>> Date: Mon, 18 Jan 2010 09:49:16 -0800
>>>>> Subject: Re: Hbase pausing problems
>>>>>
>>>>> The next step would be to take a look at your region server's log
>>>>> around the time of the insert and clients who don't resume after the
>>>>> loss of a region server. If you are able to gzip them and put them on
>>>>> a public server, it would be awesome.
>>>>>
>>>>> Thx,
>>>>>
>>>>> J-D
>>>>>
>>>>> On Mon, Jan 18, 2010 at 1:03 AM, Seraph Imalia wrote:
>>>>>> Answers below...
>>>>>>
>>>>>> Regards,
>>>>>> Seraph
>>>>>>
>>>>>>> From: stack
>>>>>>> Date: Fri, 15 Jan 2010 10:10:39 -0800
>>>>>>> Subject: Re: Hbase pausing problems
>>>>>>>
>>>>>>> How many CPUs?
>>>>>>
>>>>>> 1x Quad Xeon in each server
>>>>>>
>>>>>>> You are using default JVM settings (see HBASE_OPTS in hbase-env.sh).
>>>>>>> You might want to enable GC logging. See the line after HBASE_OPTS
>>>>>>> in hbase-env.sh. Enable it. GC logging might tell you about the
>>>>>>> pauses you are seeing.
>>>>>>
>>>>>> I will enable GC logging tonight during our slow time because
>>>>>> restarting the regionservers causes the clients to pause indefinitely.
>>>>>>
>>>>>>> Can you get a fourth server for your cluster and run the master, zk,
>>>>>>> and namenode on it and leave the other three servers for
>>>>>>> regionserver and datanode (with perhaps replication == 2 as per J-D
>>>>>>> to lighten load on a small cluster).
>>>>>>
>>>>>> We plan to double the number of servers in the next few weeks and I
>>>>>> will take your advice to put the master, zk and namenode on it (we
>>>>>> will need to have a second one on standby should this one crash).
>>>>>> The servers will be ordered shortly and will be here in a week or two.
>>>>>>
>>>>>> That said, I have been monitoring CPU usage and none of them seem
>>>>>> particularly busy. The regionserver on each one hovers around 30% all
>>>>>> the time and the datanode sits at about 10% most of the time. If we
>>>>>> do have a resource issue, it definitely does not seem to be CPU.
>>>>>>
>>>>>> Increasing RAM did not seem to work either - it just made hBase use a
>>>>>> bigger memstore and then it took longer to do a flush.
>>>>>>
>>>>>>> More notes inline below.
>>>>>>>
>>>>>>> On Fri, Jan 15, 2010 at 1:33 AM, Seraph Imalia wrote:
>>>>>>>
>>>>>>>> Approximately every 10 minutes, our entire coldfusion system pauses
>>>>>>>> at the point of inserting into hBase for between 30 and 60 seconds
>>>>>>>> and then continues.
>>>>>>>
>>>>>>> Yeah, enable GC logging. See if you can make a correlation between
>>>>>>> the pause the client is seeing and a GC pause.
>>>>>>>
>>>>>>>> Investigation...
>>>>>>>>
>>>>>>>> Watching the logs of the regionserver, the pausing of the
>>>>>>>> coldfusion system happens as soon as one of the regionservers
>>>>>>>> starts flushing the memstore and recovers again as soon as it is
>>>>>>>> finished flushing (recovers as soon as it starts compacting).
>>>>>>>
>>>>>>> ...though, this would seem to point to an issue with your hardware.
>>>>>>> How many disks? Are they misconfigured such that they hold up the
>>>>>>> system when they are being heavily written to?
>>>>>>>
>>>>>>> A regionserver log at DEBUG from around this time so we could look
>>>>>>> at it would be helpful.
>>>>>>>
>>>>>>>> I can recreate the error just by stopping 1 of the regionservers;
>>>>>>>> but then starting the regionserver again does not make coldfusion
>>>>>>>> recover until I restart the coldfusion servers. It is important to
>>>>>>>> note that if I keep the built-in hBase shell running, it is happily
>>>>>>>> able to put and get data to and from hBase whilst coldfusion is
>>>>>>>> busy pausing/failing.
>>>>>>>
>>>>>>> This seems odd. Enable DEBUG for the client-side. Do you see the
>>>>>>> shell recalibrating, finding new locations for regions, after you
>>>>>>> shut down the single regionserver - something that your coldfusion
>>>>>>> is not doing? Or, maybe, the shell is putting to a regionserver that
>>>>>>> has not been disturbed by your start/stop?
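For reference, a minimal sketch of one way to turn on client-side DEBUG
from code, assuming the client JVM logs through the log4j 1.2 setup that
HBase 0.20 ships with; setting log4j.logger.org.apache.hadoop.hbase=DEBUG
in the client's log4j.properties does the same thing:

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class EnableHBaseClientDebug {
  public static void main(String[] args) {
    // Raise the HBase client and RPC packages to DEBUG at runtime so that
    // region relocation after a regionserver shutdown shows up in the logs.
    Logger.getLogger("org.apache.hadoop.hbase.client").setLevel(Level.DEBUG);
    Logger.getLogger("org.apache.hadoop.hbase.ipc").setLevel(Level.DEBUG);
  }
}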
Do you see the >>>> shell >>>>>>> recalibrating finding new locations for regions after you shutdown >> the >>>>>>> single regionserver, something that your coldfusion is not doing? >> Or, >>>>>>> maybe, the shell is putting a regionserver that has not been >> disturbed >>>> by >>>>>>> your start/stop? >>>>>>>=20 >>>>>>>=20 >>>>>>>>=20 >>>>>>>> I have tried increasing the regionserver=B9s RAM to 3 Gigs and this >> just >>>> made >>>>>>>> the problem worse because it took longer for the regionservers to >>>> flush the >>>>>>>> memory store. >>>>>>>=20 >>>>>>>=20 >>>>>>> Again, if flushing is holding up the machine, if you can't write a >> file >>>> in >>>>>>> background without it freezing your machine, then your machines are >>>> anemic >>>>>>> or misconfigured? >>>>>>>=20 >>>>>>>=20 >>>>>>>> One of the links I found on your site mentioned increasing >>>>>>>> the default value for hbase.regionserver.handler.count to 100 =AD th= is >>>> did >>>>>>>> not >>>>>>>> seem to make any difference. >>>>>>>=20 >>>>>>>=20 >>>>>>> Leave this configuration in place I'd say. >>>>>>>=20 >>>>>>> Are you seeing 'blocking' messages in the regionserver logs? >>>> Regionserver >>>>>>> will stop taking on writes if it thinks its being overrun to preven= t >>>> itself >>>>>>> OOME'ing. Grep the 'multiplier' configuration in hbase-default.xml= . >>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>>> I have double checked that the memory flush >>>>>>>> very rarely happens on more than 1 regionserver at a time =AD in fac= t >> in >>>> my >>>>>>>> many hours of staring at tails of logs, it only happened once wher= e >>>> two >>>>>>>> regionservers flushed at the same time. >>>>>>>>=20 >>>>>>>> You've enabled DEBUG? >>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>>> My investigations point strongly towards a coding problem on our >> side >>>>>>>> rather >>>>>>>> than a problem with the server setup or hBase itself. >>>>>>>=20 >>>>>>>=20 >>>>>>> If things were slow from client-perspective, that might be a >>>> client-side >>>>>>> coding problem but these pauses, unless you have a fly-by deadlock = in >>>> your >>>>>>> client-code, its probably an hbase issue. >>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>>> I say this because >>>>>>>> whilst I understand why a regionserver would go offline during a >>>> memory >>>>>>>> flush, I would expect the other two regionservers to pick up the >> load >>>> =AD >>>>>>>> especially since the built-in hbase shell has no problem accessing >>>> hBase >>>>>>>> whilst a regionserver is busy doing a memstore flush. >>>>>>>>=20 >>>>>>>> HBase does not go offline during memory flush. It continues to be >>>>>>> available for reads and writes during this time. And see J-D >> response >>>> for >>>>>>> incorrect understanding of how loading of regions is done in an hba= se >>>>>>> cluster. >>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>> ... >>>>>>>=20 >>>>>>>=20 >>>>>>> I think either I am leaving out code that is required to determine >>>> which >>>>>>>> RegionServers are available OR I am keeping too many hBase objects >> in >>>> RAM >>>>>>>> instead of calling their constructors each time (my purpose >> obviously >>>> was >>>>>>>> to >>>>>>>> improve performance). >>>>>>>>=20 >>>>>>>>=20 >>>>>>> For sure keep single instance of HBaseConfiguration at least and us= e >>>> this >>>>>>> constructing all HTable and HBaseAdmin instances. >>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>>> Currently the live system is inserting over 7 Million records per >> day >>>>>>>> (mostly between 8AM and 10PM) which is not a ridiculously high loa= d. 
>>>>>>>
>>>>>>>> Currently the live system is inserting over 7 Million records per
>>>>>>>> day (mostly between 8AM and 10PM), which is not a ridiculously high
>>>>>>>> load.
>>>>>>>
>>>>>>> What size are the records? What is your table schema? How many
>>>>>>> regions do you currently have in your table?
>>>>>>>
>>>>>>> St.Ack