hbase-user mailing list archives

From Gautam Borah <gbo...@appdynamics.com>
Subject Re: impact of using higher Hbase.hregion.memstore.flush.size=512MB
Date Thu, 28 May 2015 01:00:54 GMT
Hi Esteban,

Thanks for your response. hbase.rs.cacheblocksonwrite would be very useful
for us.
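
For the record, if we enable it, it would look something like this in our
hbase-site.xml (just a sketch on our side, not tested yet):

    <!-- cache HFile blocks in the block cache as they are written,
         so freshly flushed data stays hot for the periodic scans -->
    <property>
      <name>hbase.rs.cacheblocksonwrite</name>
      <value>true</value>
    </property>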

We have set hbase.regionserver.maxlogs high enough to avoid flushes being
forced across memstores by log rolling. We have also set
hbase.regionserver.optionalcacheflushinterval to 0 to disable periodic
flushing; we do not write anything bypassing the WAL.
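
For reference, the relevant bits of our hbase-site.xml look roughly like this
(the maxlogs value below is only an example, not our exact setting):

    <!-- keep enough WAL files so that log rolling does not force
         memstore flushes; 64 is an example value only -->
    <property>
      <name>hbase.regionserver.maxlogs</name>
      <value>64</value>
    </property>
    <!-- 0 disables the periodic flush of idle memstores -->
    <property>
      <name>hbase.regionserver.optionalcacheflushinterval</name>
      <value>0</value>
    </property>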

We are running the cluster with conservative limits, so that if a region
server crashes, others can take the extra load without hitting the memstore
flushing limits.

We are now running the cluster with an 800 MB flush size, and the initial job
runs are fine. We will run it for a couple of days and check the status.
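
In hbase-site.xml the change is just the flush size; the comments are our
back-of-the-envelope numbers for the 20 GB heap:

    <!-- 800 MB per-region flush threshold: 800 * 1024 * 1024 = 838860800 bytes -->
    <property>
      <name>hbase.hregion.memstore.flush.size</name>
      <value>838860800</value>
    </property>
    <!-- with a 20 GB heap, lowerLimit=0.45 and upperLimit=0.55, aggregate
         memstore usage starts forcing flushes at ~9 GB and blocks writes
         at ~11 GB, so the per-region threshold only applies while the
         aggregate stays below those marks -->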

Thanks again.

Gautam




On Wed, May 27, 2015 at 2:15 PM, Esteban Gutierrez <esteban@cloudera.com>
wrote:

> Gautam,
>
> Yes, you can increase the size of the memstore to values larger than 128MB,
> but usually you go by increasing hbase.hregion.memstore.block.multiplier
> only. Depending on the version of HBase you are running, many things can
> happen: multiple memstores can be flushed at once, a memstore will be
> flushed if it holds too many rows in memory (30 million) or if the store
> hasn't been flushed in an hour, the rate of the flushes can be tuned, and
> hitting the max number of HLogs can also trigger a flush. One problem with
> running large memstores is mostly how many regions you will have per RS,
> and whether an encoding and/or compression codec is in use, which might
> make the flush take longer, use more CPU resources, or push back clients
> because you haven't flushed some regions to disk.
>
> Based on the behavior that you have described on the heap utilization, it
> sounds like you are not fully utilizing the memstores and you are below the
> lower limit, so depending on the version of HBase and available resources
> you might want to use hbase.rs.cacheblocksonwrite instead to keep some of
> the hot data in the block cache.
>
> cheers,
> esteban.
>
>
>
>
> --
> Cloudera, Inc.
>
>
> On Wed, May 27, 2015 at 1:58 PM, Gautam Borah <gborah@appdynamics.com>
> wrote:
>
> > Hi all,
> >
> > The default value of hbase.hregion.memstore.flush.size is defined as 128 MB.
> > Could anyone kindly explain what the impact would be if we increase this
> > to a higher value such as 512 MB or 800 MB or more?
> >
> > We have a very write heavy cluster. We also run periodic endpoint
> > coprocessor based jobs every 10 minutes that operate on the data written
> > in the last 10-15 mins. We are trying to manage the memstore flush
> > operations such that the hot data remains in the memstore for at least
> > 30-40 mins or longer, so that the job only hits disk every 3rd or 4th
> > time it tries to operate on the hot data (it does a scan).
> >
> > We have a region server heap size of 20 GB and have set:
> >
> > hbase.regionserver.global.memstore.lowerLimit = .45
> >
> > hbase.regionserver.global.memstore.upperLimit = .55
> >
> > We observed that if we set hbase.hregion.memstore.flush.size=128MB, only
> > 10% of the heap is utilized by the memstores before they flush.
> >
> > At hbase.hregion.memstore.flush.size=512MB, we are able to increase the
> > heap utilization by the memstores to 35%.
> >
> > It would be very helpful for us to understand the implications of a higher
> > hbase.hregion.memstore.flush.size for a long running cluster.
> >
> > Thanks,
> >
> > Gautam
> >
>
