Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
MIME-Version: 1.0
Date: Sun, 1 Nov 2015 10:24:27 -0800
Message-ID: 
 <CAJ87iKP3Pt6w3NeAcwEbhaxYB=-4voDP75tOEKZCYFOO_vAtvg@mail.gmail.com>
Subject: Slow reads coinciding with higher compaction time avg time
From: Girish Joshi <gjoshi@groupon.com.INVALID>
To: user@hbase.apache.org
Content-Type: multipart/alternative; boundary=001a114128ac58867505237ec596

--001a114128ac58867505237ec596
Content-Type: text/plain; charset=UTF-8

Hello

In my HBase cluster, I have consistently observed the following over
several days:

- The compaction time avg time metric spikes. At the same time, swap bytes
in and swap bytes out are also higher.
- Around the same time, the FS PRead and FS Read latencies increase, as do
the latencies seen by clients doing random reads.

My HBase cluster consists of 16 nodes and replicates to another cluster of
16 nodes. It has the following workload:

- Around 4 tables have a lot of write activity (around 500k writes per
second on the m1/m15 moving averages). 2 of these tables have atomic
counter columns that keep track of some analytics data and are incremented
with every write.

- 2 tables receive bulk-uploaded data periodically (around once a day).

- We expect reads at around 100k per second, mainly from the tables which
have bulk-uploaded data and the one which has counter columns. The read
latencies (p99) spike to around 1000-5000 ms when the compaction time avg
time metric described above increases. At other times, they are below
100 ms.

I have set hbase.hregion.majorcompaction to 0 on the region servers; I plan
to set it to 0 on the master nodes too, so that I can rule out
time-triggered major compactions as the problem. But I suspect that there
are a lot of minor compactions, and that these are leading to major
compactions at the time of the spikes.
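
For reference, the setting mentioned above corresponds to a property like
the following in hbase-site.xml. This is only a sketch of that one
property, not the full configuration; the comments reflect the documented
meaning of the value, where 0 turns off the time-based trigger only.

```xml
<!-- hbase-site.xml (fragment, illustrative) -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <!-- Interval between time-based major compactions, in milliseconds.
       0 disables the periodic trigger; user-requested and size-based
       major compactions can still run. -->
  <value>0</value>
</property>
```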

*Any suggestions on how to avoid these read latency spikes and get better
read performance?*

Thanks,

Girish.

--001a114128ac58867505237ec596--