hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Hbase tuning for heavy write cluster
Date Mon, 27 Jan 2014 18:37:57 GMT
On Sun, Jan 26, 2014 at 4:13 PM, Rohit Dev <rohitdevel14@gmail.com> wrote:

> Hi Lars,
> I changed java heap to 31GB and also reduced memstore flush size to 256MB
> (down from 512MB). All of the servers are running quiet, except for 1.
> - This 1 particular server is doing ~100 Memstore flushes in every 5 Mins,
> that is about 55% of total Memstore flushes in the cluster.

Hotspotting on regions hosted on this server?  If more than one hot region,
move the hot regions around the cluster?
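Moving a region by hand can be done from the hbase shell; a minimal sketch (the encoded region name and server name are placeholders -- `move` wants the encoded name, i.e. the hash suffix of the region name, and a full `host,port,startcode` server name; disabling the balancer first keeps it from moving the region straight back):

```
hbase(main)> balance_switch false
hbase(main)> move '30d495d3fb5cdfcdac8073d02a05df90', 'otherhost.example.com,60020,1390000000000'
hbase(main)> balance_switch true
```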

Could you pastebin the tail of the log from this particular RS, if that is ok to do?

> - CPU in this server is running ~100% and system load is also very high
> (50). This is 24core machine.

Any complaints in dmesg?  (Disks?)
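For example, something like the following on the affected host will surface recent kernel complaints about controllers or disks (exact messages vary by kernel and driver):

```shell
dmesg | tail -n 100
dmesg | grep -i -E 'error|fail|sd[a-z]'
```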

> - jstack dump from this region-server is available at
> http://pastebin.com/an0XvZRc , seems most of the threads are in blocked
> state.

Took a quick look; the handlers are hanging out waiting on responses from HDFS.

Short-circuit reads enabled?

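For reference, Hadoop 2.x-style short-circuit reads are enabled with the properties below in hdfs-site.xml (and visible to the RS, e.g. mirrored into hbase-site.xml); the socket path shown is just a common example location:

```xml
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```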
> - io %utilization is under 15%

Beating all disks equally or focused on single disk?
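Per-device utilization is easy to eyeball with iostat, e.g.:

```shell
# extended per-device stats, 2s interval, 5 samples; compare %util across devices
iostat -x 2 5
```

One device pegged while the rest idle would point at a single hot file or a failing disk rather than overall IO saturation.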

> - Compaction queue size has been building up in this server, gone up from
> 50 to 280 in last 4 hrs.

Compactions are not completing?  This server is hosting the 'big' region
with many store files?
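If compactions genuinely cannot keep up, one knob is the number of compaction threads; a sketch for hbase-site.xml (property names as in 0.92+-era HBase; the values are illustrative, not recommendations):

```xml
<!-- illustrative values only -->
<property>
  <name>hbase.regionserver.thread.compaction.small</name>
  <value>3</value>
</property>
<property>
  <name>hbase.regionserver.thread.compaction.large</name>
  <value>2</value>
</property>
```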

> - I noticed requestsPerSecond (from the HBase Master web UI) goes up to 350k on
> this particular server, whereas other servers are doing < 30k.

> Any suggestion what could be causing high load on this one server ?

> Also, I'm seeing messages like this on multiple servers (about 25% in the
> server that has high load):
> INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs:
> logs=91, maxlogs=90; forcing flush of 7 regions(s):
> 30d495d3fb5cdfcdac8073d02a05df90, 3ccf33b6da357f0e2d76588895c9f2ab,
> 499b5ea7c51493995ab942dc5f00a8b5, 7b98e852476ee8432f3d795cd0b4b92b,
> 7baaf5a2bd916e12a69f390971dd5bb8, 81716547748c93a90767eff50cd2e6bf,
> 99f4e9d306a5570622ab18ac6d142db9
> Could this be an issue?

(The above needs doc'ing in the refguide)

The above message comes when you are writing at a rate the server is having
trouble keeping up with.  The RS is carrying more than the configured number
of WALs, so it complains and acts to clean up the excess.  It is not a
'problem' if you are not too far beyond your configured maximum (90 in this
case, which is a lot -- more WALs mean there is more to replay when a server
crashes, so getting regions back online takes longer).
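The knob in question lives in hbase-site.xml; a sketch (the value shown is the usual default, not a recommendation -- raising it trades longer crash recovery for fewer forced flushes):

```xml
<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>32</value>
</property>
```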

