hbase-user mailing list archives

From yeshwanth kumar <yeshwant...@gmail.com>
Subject HBase Region Size of 2.5 TB
Date Fri, 26 Aug 2016 22:23:01 GMT
Hi, we are using CDH 5.7 with HBase 1.2.

We are performance testing HBase under a regular load on a cluster that
has 4 Region Servers.

The input data is compressed binary files, around 2 TB in total, which we
process and write as key-value pairs to HBase.
The output data size in HBase is almost 4 times larger, around 8 TB, because
we are writing it as text.
This process is a MapReduce job.

While doing the load, we observed a lot of GC happening on the Region
Servers, so we changed a couple of parameters to decrease the GC time.

We increased the flush size from 128 MB to 1 GB, compactionThreshold to 50,
and regionserver.maxlogs to 42.
The following are the settings we changed from their defaults:


hbase.hregion.memstore.flush.size = 1 GB
hbase.hstore.max.filesize=10GB
hbase.hregion.preclose.flush.size= 50 MB

hbase.hstore.compactionThreshold=50
hbase.regionserver.maxlogs=42

After the load, we observed that the HBase table has only 4 regions, each
around 2.5 TB in size.

I am trying to understand which configuration parameter caused this issue.

I was going through this article:
http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/

The region split policy in our HBase is
org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy
According to this policy, the Region Server should split a region once its
size exceeds the region size limit.
Can someone explain the root cause?
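
For reference, here is a rough Python sketch of the size check I understand
this policy to apply. It is an assumption based on my reading of the policy,
not code copied from HBase: the split threshold is taken to be the smaller of
the maximum file size and (initial size x regions-on-this-server cubed), where
the initial size defaults to twice the memstore flush size. The function name
split_size_to_check and its parameters are hypothetical, and the 10 GB maximum
assumes our max file size setting actually took effect.

```python
# Sketch of the split-size check for IncreasingToUpperBoundRegionSplitPolicy.
# ASSUMPTION: formula reflects my understanding of HBase 1.x behaviour;
# names here are illustrative and are not HBase APIs.

GB = 1024 ** 3


def split_size_to_check(regions_on_server, flush_size, max_file_size,
                        initial_size=None):
    """Return the store size in bytes above which a region would be split.

    regions_on_server: regions of this table hosted on the same Region Server.
    flush_size:        memstore flush size in bytes.
    max_file_size:     configured maximum region/store file size in bytes.
    initial_size:      optional override; defaults to 2 * flush_size.
    """
    if initial_size is None or initial_size <= 0:
        initial_size = 2 * flush_size
    # With no regions, or many regions, fall back to the plain maximum.
    if regions_on_server == 0 or regions_on_server > 100:
        return max_file_size
    return min(max_file_size, initial_size * regions_on_server ** 3)


if __name__ == "__main__":
    flush_size = 1 * GB       # hbase.hregion.memstore.flush.size from above
    max_file_size = 10 * GB   # assumed effective maximum file size
    for n in (1, 2, 3):
        size = split_size_to_check(n, flush_size, max_file_size)
        print(n, "region(s) on server ->", size / GB, "GB")
```

If this reading is right, a region should have been split long before
reaching 2.5 TB, which is why the result confuses me.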


Thanks,
Yeshwanth
