hbase-user mailing list archives

From Pere Kyle <p...@whisper.sh>
Subject Re: Hbase Unusable after auto split to 1024 regions
Date Thu, 06 Nov 2014 20:27:40 GMT
Bryan,

Thanks so much for the in-depth details. The workload I am trying to account
for is write-heavy/scan-heavy. This is an analytics cluster that will need
about 500-5000 w/s and handle MapReduce jobs. I see your recommendation of the
i2.2xlarge; would you recommend this over the m2.4xlarge? They seem to be the
same except for price.

The behavior I am seeing on my current cluster is basically a deadlock, where
no writes/reads get through for a period. Then a small burst of writes/reads
happens and the cluster freezes yet again. If I could just get this thing to
even accept a few writes, all the systems would be fine, but for now all the
write queues on our app are filling up and eventually OOMing because of this.

Our writes are like so:

API queues a batch of 100 events -> write to one of 17 Thrift server
connections (persistent) -> on failure, back off and retry

Also, the weirdest behavior I have noticed about this lag/outage is that the
master HBase daemon is eating all the CPU, whereas before it barely had more
than a 1.0 load. Is it possible the master is in some way broken and slowing
everything down?

-Pere

On Nov 6, 2014, at 11:58 AM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:

> The problem is regions don't get uniform writes, but HLogs are at the
> regionserver level (at least in HBase 0.94), so that shouldn't be a
> problem.  I would keep it how it is.
> 
> Also, this may not fix everything, but it will reduce compaction load.  You
> can also raise hbase.hstore.compaction.max to compact more files at once
> when a compaction does run.
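A sketch of how that might look in hbase-site.xml (the value 15 here is illustrative, not a recommendation from this thread; the 0.94 default is 10):

```xml
<!-- Illustrative only: allow up to 15 StoreFiles per minor compaction.
     hbase.hstore.compaction.max defaults to 10 in 0.94. -->
<property>
  <name>hbase.hstore.compaction.max</name>
  <value>15</value>
</property>
```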
> 
> Also, just noticed you are using m2.2xlarge.  I wouldn't recommend this.
> It only has 1 ephemeral store, so maybe you are using EBS?  We've had
> performance issues with EBS in the past.  An EBS outage will also possibly
> bring down your whole cluster. If you are just using 1 disk, that will not
> be great either in terms of r/s.  It also caps at 500Mbps network and has
> only 13 ECUs.
> 
> Looks like you are only at 3TB of data currently.  I think you could get a
> substantial boost from using 5 or 6 i2.2xlarge instead of 15 m2.2xlarge,
> for about the same price and have room to grow data-size wise.  You might
> also try c1.xlarge which are more on-par price wise with more disk, but
> I've found the amount of memory on those restricting.  At HubSpot we use
> i2.4xlarge, but we also have about 250 of them.  I'd just recommend trying
> different setups, even the c3 level would be great if you can shrink your
> disk size at all (compression and data block encodings).
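A hedged sketch of enabling compression and a data block encoding on an existing 0.94 table from the hbase shell (the table and family names come from the logs in this thread but are used illustratively; SNAPPY requires the native codec on every node, and existing files are only rewritten on the next major compaction):

```
hbase> disable 'weaver_events'
hbase> alter 'weaver_events', {NAME => 'd', COMPRESSION => 'SNAPPY',
  DATA_BLOCK_ENCODING => 'FAST_DIFF'}
hbase> enable 'weaver_events'
hbase> major_compact 'weaver_events'
```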
> 
> On Thu, Nov 6, 2014 at 2:31 PM, Pere Kyle <pere@whisper.sh> wrote:
> 
>> So set this property?
>> 
>> <property>
>> <name>hbase.regionserver.optionalcacheflushinterval</name>
>> <value>43200000</value>
>> <source>hbase-default.xml</source>
>> </property>
>> 
>> 
>> 
>> Do I need to set this as well?
>> 
>> <property>
>> <name>hbase.regionserver.logroll.period</name>
>> <value>3600000</value>
>> <source>hbase-default.xml</source>
>> </property>
>> 
>> Thanks,
>> Pere
>> 
>> 
>> On Nov 6, 2014, at 11:23 AM, Bryan Beaudreault <bbeaudreault@hubspot.com>
>> wrote:
>> 
>>> The default periodic flush is 1 hour. If you have a lot of regions and your
>>> write distribution is not strictly uniform, this can cause a lot of small
>>> flushes, as you are seeing.  I tuned this up to 12 hours in my cluster, and
>>> may tune it up further.  It made a big impact on the number of minor
>>> compactions running throughout the day.
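Rough arithmetic (all input numbers are assumed for illustration, not measurements from this thread) for why a heavily split table produces many tiny periodic flushes: with writes spread over 1024 regions, each memstore grows far too slowly to ever hit the 128 MB flush size, so the hourly periodicFlusher ends up doing all the flushing:

```python
# All inputs are assumed/illustrative.
writes_per_sec = 500            # cluster-wide write rate
bytes_per_write = 200           # average event size
regions = 1024                  # regions after the runaway splits
flush_interval_sec = 3600       # default hbase.regionserver.optionalcacheflushinterval
flush_size = 128 * 1024 * 1024  # hbase.hregion.memstore.flush.size

# Memstore bytes accumulated per region between periodic flushes,
# assuming a uniform write distribution.
per_region = writes_per_sec * bytes_per_write * flush_interval_sec / regions

print(f"memstore per region after 1 hour: {per_region / 1024:.0f} KB")
print(f"fraction of the flush size: {per_region / flush_size:.2%}")
```

With these assumed numbers each region flushes only a few hundred KB per hour, several orders of magnitude under flush.size, which is consistent in spirit with the tens-of-KB flushes in the logs below.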
>>> 
>>> On Thu, Nov 6, 2014 at 2:14 PM, Pere Kyle <pere@whisper.sh> wrote:
>>> 
>>>> Thanks again for your help!
>>>> 
>>>> I do not see a single entry in my logs for memstore pressure/global heap.
>>>> I do see tons of logs from the periodicFlusher:
>>>> http://pastebin.com/8ZyVz8AH
>>>> 
>>>> This seems odd to me. Today alone there are 1829 flushes from
>>>> periodicFlusher. Is there some other log4j property I need to set?
>>>> 
>>>> Here are some logs from memstore flushes:
>>>> 2014-11-06 19:11:42,000 INFO org.apache.hadoop.hbase.regionserver.StoreFile (regionserver60020.cacheFlusher): NO General Bloom and NO DeleteFamily was added to HFile (hdfs://10.227.42.38:9000/hbase/weaver_events/4bafc4f16d984b2cca905e149584df8e/.tmp/c42aacd7e6c047229bb12291510bff50)
>>>> 2014-11-06 19:11:42,000 INFO org.apache.hadoop.hbase.regionserver.Store (regionserver60020.cacheFlusher): Flushed , sequenceid=67387584, memsize=29.5 K, into tmp file hdfs://10.227.42.38:9000/hbase/weaver_events/4bafc4f16d984b2cca905e149584df8e/.tmp/c42aacd7e6c047229bb12291510bff50
>>>> 2014-11-06 19:11:44,683 INFO org.apache.hadoop.hbase.regionserver.Store (regionserver60020.cacheFlusher): Added hdfs://10.227.42.38:9000/hbase/weaver_events/4bafc4f16d984b2cca905e149584df8e/d/c42aacd7e6c047229bb12291510bff50, entries=150, sequenceid=67387584, filesize=3.2 K
>>>> 2014-11-06 19:11:44,685 INFO org.apache.hadoop.hbase.regionserver.HRegion (regionserver60020.cacheFlusher): Finished memstore flush of ~29.5 K/30176, currentsize=0/0 for region weaver_events,21476b2c-7257-4787-9309-aaeab1e85392,1415157492044.4bafc4f16d984b2cca905e149584df8e. in 3880ms, sequenceid=67387584, compaction requested=false
>>>> 2014-11-06 19:11:44,714 INFO org.apache.hadoop.hbase.regionserver.StoreFile (regionserver60020.cacheFlusher): Delete Family Bloom filter type for hdfs://10.227.42.38:9000/hbase/weaver_events/9b4c4b73035749a9865103366c9a5a87/.tmp/f10e53628784487290f788802808777a: CompoundBloomFilterWriter
>>>> 2014-11-06 19:11:44,729 INFO org.apache.hadoop.hbase.regionserver.StoreFile (regionserver60020.cacheFlusher): NO General Bloom and NO DeleteFamily was added to HFile (hdfs://10.227.42.38:9000/hbase/weaver_events/9b4c4b73035749a9865103366c9a5a87/.tmp/f10e53628784487290f788802808777a)
>>>> 2014-11-06 19:11:44,729 INFO org.apache.hadoop.hbase.regionserver.Store (regionserver60020.cacheFlusher): Flushed , sequenceid=67387656, memsize=41.2 K, into tmp file hdfs://10.227.42.38:9000/hbase/weaver_events/9b4c4b73035749a9865103366c9a5a87/.tmp/f10e53628784487290f788802808777a
>>>> 2014-11-06 19:11:44,806 INFO org.apache.hadoop.hbase.regionserver.Store (regionserver60020.cacheFlusher): Added hdfs://10.227.42.38:9000/hbase/weaver_events/9b4c4b73035749a9865103366c9a5a87/d/f10e53628784487290f788802808777a, entries=210, sequenceid=67387656, filesize=4.1 K
>>>> 2014-11-06 19:11:44,807 INFO org.apache.hadoop.hbase.regionserver.HRegion (regionserver60020.cacheFlusher): Finished memstore flush of ~41.2 K/42232, currentsize=17.7 K/18080 for region weaver_events,30f6c923-8a37-4324-a404-377decd3ae06,1415154978597.9b4c4b73035749a9865103366c9a5a87. in 99ms, sequenceid=67387656, compaction requested=false
>>>> 
>>>> Thanks!
>>>> -Pere
>>>> On Nov 6, 2014, at 10:27 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>> 
>>>>> bq. Do I need to restart master for the memstore to take effect?
>>>>> No. memstore is used by region server.
>>>>> 
>>>>> Looks like debug logging was not turned on (judging from your previous
>>>>> pastebin).
>>>>> Some of the flush-related logs are at INFO level, e.g. do you see any of
>>>>> the following log?
>>>>> 
>>>>>    LOG.info("Flush of region " + regionToFlush + " due to global heap
>>>>> pressure");
>>>>> 
>>>>> Take a look at
>>>>> ./src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
>>>>> and you will find all the logs.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> On Thu, Nov 6, 2014 at 10:05 AM, Pere Kyle <pere@whisper.sh> wrote:
>>>>> 
>>>>>> So I have set the heap to 12GB and the memstore limits to upperLimit .5
>>>>>> and lowerLimit .45. I am not seeing any changes in behavior from the
>>>>>> cluster so far; I have restarted 4/17 region servers. Do I need to
>>>>>> restart the master for the memstore settings to take effect? Also, how
>>>>>> do I enable logging to show why a region is being flushed? I don't ever
>>>>>> see the region flushes in my logs.
>>>>>> 
>>>>>> Thanks,
>>>>>> Pere
>>>>>> On Nov 6, 2014, at 7:12 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>> 
>>>>>>> bq. to increase heap and increase the memstore limit?
>>>>>>> 
>>>>>>> Yes. That would be an action that bears fruit.
>>>>>>> Long term, you should merge the small regions.
>>>>>>> 
>>>>>>> Cheers
>>>>>>> 
>>>>>>> On Wed, Nov 5, 2014 at 11:20 PM, Pere Kyle <pere@whisper.sh> wrote:
>>>>>>> 
>>>>>>>> Watching closely a region server in action, it seems that the
>>>>>>>> memstores are being flushed at around 2MB on the regions. This would
>>>>>>>> seem to indicate that there is not enough heap for the memstore and I
>>>>>>>> am hitting the upper bound of the limit (default). Would this be a
>>>>>>>> fair assumption? Should I look to increase the heap and the memstore
>>>>>>>> limit?
>>>>>>>> 
>>>>>>>> Thanks!
>>>>>>>> -Pere
>>>>>>>> 
>>>>>>>> On Nov 5, 2014, at 10:26 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> You can use ConstantSizeRegionSplitPolicy.
>>>>>>>>> Split policy can be specified per table. See the following example
>>>>>>>>> in create.rb:
>>>>>>>>> 
>>>>>>>>> hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO =>
>>>>>>>>> 'HexStringSplit'}
>>>>>>>>> 
>>>>>>>>> In 0.94.18 there isn't online merge, so you have to use another
>>>>>>>>> method to merge the small regions.
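One offline route in 0.94 is the bundled Merge utility (a sketch only: the region names below are placeholders, the two regions must be adjacent, and HBase must be fully shut down before running it):

```
$ bin/stop-hbase.sh
$ bin/hbase org.apache.hadoop.hbase.util.Merge \
    weaver_events \
    'weaver_events,<start-key-1>,<timestamp>.<encoded-name-1>.' \
    'weaver_events,<start-key-2>,<timestamp>.<encoded-name-2>.'
$ bin/start-hbase.sh
```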
>>>>>>>>> 
>>>>>>>>> Cheers
>>>>>>>>> 
>>>>>>>>> On Wed, Nov 5, 2014 at 10:14 PM, Pere Kyle <pere@whisper.sh> wrote:
>>>>>>>>> 
>>>>>>>>>> Ted,
>>>>>>>>>> 
>>>>>>>>>> Thanks so much for that information. I now see why this split so
>>>>>>>>>> often, but what I am not sure of is how to fix this without blowing
>>>>>>>>>> away the cluster. Add more heap?
>>>>>>>>>> 
>>>>>>>>>> Another symptom I have noticed is that the load on the Master
>>>>>>>>>> instance's hbase daemon has been pretty high (load average 4.0,
>>>>>>>>>> whereas it used to be 1.0).
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Pere
>>>>>>>>>> 
>>>>>>>>>> On Nov 5, 2014, at 9:56 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> IncreasingToUpperBoundRegionSplitPolicy is the default split
>>>>>>>>>>> policy.
>>>>>>>>>>> 
>>>>>>>>>>> You can read the javadoc of this class to see how it works.
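As a sketch of what that javadoc describes (this is the 0.94-era formula as I understand it, min(R² × memstore flush size, max file size), where R is the count of this table's regions on that regionserver; verify against your exact release before relying on it):

```python
def split_threshold(region_count,
                    flush_size=134217728,       # hbase.hregion.memstore.flush.size
                    max_file_size=5073741824):  # hbase.hregion.max.filesize
    """Approximate IncreasingToUpperBoundRegionSplitPolicy threshold (0.94-era)."""
    return min(region_count ** 2 * flush_size, max_file_size)

# Young tables split long before max.filesize is reached, which is how a
# table can fan out to many regions quickly under write load.
for r in (1, 2, 3, 4):
    print(f"R={r}: split at {split_threshold(r) / (1024 ** 2):.0f} MB")
```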
>>>>>>>>>>> 
>>>>>>>>>>> Cheers
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Nov 5, 2014 at 9:39 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Can you provide a bit more information (such as HBase release)?
>>>>>>>>>>>> 
>>>>>>>>>>>> If you pastebin one of the region servers' logs, that would help
>>>>>>>>>>>> us determine the cause.
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Nov 5, 2014 at 9:29 PM, Pere Kyle <pere@whisper.sh> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Recently our cluster, which had been running fine for 2 weeks,
>>>>>>>>>>>>> split to 1024 regions at 1GB per region; after this split the
>>>>>>>>>>>>> cluster is unusable. Using the performance benchmark I was
>>>>>>>>>>>>> getting a little better than 100 w/s, whereas before it was
>>>>>>>>>>>>> 5000 w/s. There are 15 nodes of m2.2xlarge with 8GB heap
>>>>>>>>>>>>> reserved for HBase.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Any ideas? I am stumped.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Pere
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here is the current hbase-site.xml:
>>>>>>>>>>>>> <?xml version="1.0"?>
>>>>>>>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>>>>>>> <configuration>
>>>>>>>>>>>>> <property><name>hbase.snapshot.enabled</name><value>true</value></property>
>>>>>>>>>>>>> <property><name>fs.hdfs.impl</name><value>emr.hbase.fs.BlockableFileSystem</value></property>
>>>>>>>>>>>>> <property><name>hbase.regionserver.handler.count</name><value>50</value></property>
>>>>>>>>>>>>> <property><name>hbase.cluster.distributed</name><value>true</value></property>
>>>>>>>>>>>>> <property><name>hbase.tmp.dir</name><value>/mnt/var/lib/hbase/tmp-data</value></property>
>>>>>>>>>>>>> <property><name>hbase.master.wait.for.log.splitting</name><value>true</value></property>
>>>>>>>>>>>>> <property><name>hbase.hregion.memstore.flush.size</name><value>134217728</value></property>
>>>>>>>>>>>>> <property><name>hbase.hregion.max.filesize</name><value>5073741824</value></property>
>>>>>>>>>>>>> <property><name>zookeeper.session.timeout</name><value>60000</value></property>
>>>>>>>>>>>>> <property><name>hbase.thrift.maxQueuedRequests</name><value>0</value></property>
>>>>>>>>>>>>> <property><name>hbase.client.scanner.caching</name><value>1000</value></property>
>>>>>>>>>>>>> <property><name>hbase.hregion.memstore.block.multiplier</name><value>4</value></property>
>>>>>>>>>>>>> </configuration>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> hbase-env.sh:
>>>>>>>>>>>>> # The maximum amount of heap to use, in MB. Default is 1000.
>>>>>>>>>>>>> export HBASE_HEAPSIZE=8000
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # Extra Java runtime options.
>>>>>>>>>>>>> # Below are what we set by default.  May only work with SUN JVM.
>>>>>>>>>>>>> # For more on why as well as other possible settings,
>>>>>>>>>>>>> # see http://wiki.apache.org/hadoop/PerformanceTuning
>>>>>>>>>>>>> export HBASE_OPTS="-XX:+UseConcMarkSweepGC"

