From: Allan Yang
Date: Wed, 22 Mar 2017 10:07:38 +0800
Subject: Re: how to optimize for heavy writes scenario
To: user@hbase.apache.org

hbase.regionserver.thread.compaction.small = 30

Am I seeing it right? You used 30 threads for small compactions. That's too
much: for a heavy-writes scenario, you are spending too many resources on
compactions. We also have OpenTSDB running on HBase in our company. IMHO,
the configuration should look like this:

hbase.regionserver.thread.compaction.small = 1 or 2
hbase.regionserver.thread.compaction.large = 1
hbase.hstore.compaction.max = 20
hbase.hstore.compaction.min (hbase.hstore.compactionThreshold in your config) = 8 or 10
hbase.hregion.memstore.flush.size = 256MB or bigger, depending on the memory
size; for writers like OpenTSDB, the data after encoding and compression is
very small (by the way, have you set any encoding or compression algorithm on
your table? If not, better do it now)
hbase.regionserver.thread.compaction.throttle = 512MB

These settings should decrease the frequency of compactions, and also the
resources (threads) that compactions use. Maybe you can give it a try.
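Allan's question about encoding and compression translates into a one-time
schema change. Below is a minimal sketch using the HBase 1.x Java Admin API;
the table name "tsdb", the family "t", and the choice of SNAPPY plus FAST_DIFF
are illustrative assumptions, not details taken from this thread (the same
change can also be made with an alter from the HBase shell).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableEncodingAndCompression {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName table = TableName.valueOf("tsdb");   // placeholder table name
            HColumnDescriptor cf = admin.getTableDescriptor(table)
                                        .getFamily(Bytes.toBytes("t"));  // placeholder family
            // Block compression shrinks store files on disk and therefore the
            // amount of data flushes and compactions have to move.
            cf.setCompressionType(Compression.Algorithm.SNAPPY);
            // FAST_DIFF encoding works well for long, repetitive rowkeys such
            // as OpenTSDB-style keys that share a metric/hour prefix.
            cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
            // Online schema change: new flushes pick up the settings right
            // away, existing data is rewritten as regions compact.
            admin.modifyColumn(table, cf);
        }
    }
}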
2017-03-21 23:48 GMT+08:00 Dejan Menges:

> Regarding du -sk, take a look here:
> https://issues.apache.org/jira/browse/HADOOP-9884
>
> Also hardly waiting for this one to be fixed.
>
> On Tue, Mar 21, 2017 at 4:09 PM Hef wrote:
>
> > There were several curious things we have observed.
> > On the region servers, there were abnormally many more reads than writes:
> >
> > Device:   tps      kB_read/s  kB_wrtn/s   kB_read   kB_wrtn
> > sda        608.00    6552.00       0.00      6552         0
> > sdb        345.00    2692.00   78868.00      2692     78868
> > sdc        406.00   14548.00   63960.00     14548     63960
> > sdd          2.00       0.00      32.00         0        32
> > sde         62.00    8764.00       0.00      8764         0
> > sdf        498.00   11100.00      32.00     11100        32
> > sdg       2080.00   11712.00       0.00     11712         0
> > sdh        109.00    5072.00       0.00      5072         0
> > sdi        158.00       4.00   32228.00         4     32228
> > sdj         43.00    5648.00      32.00      5648        32
> > sdk        255.00    3784.00       0.00      3784         0
> > sdl         86.00    1412.00    9176.00      1412      9176
> >
> > In the CDH region server dashboard, the average disk IOPS for writes was
> > stable at 735/s, while reads rose from 900/s to 5000/s every 5 minutes.
> >
> > iotop showed the following processes eating the most IO:
> > 6447 be/4 hdfs 2.70 M/s    0.00 B/s 0.00 % 94.54 % du -sk /data/12/dfs/dn/curre~632-10.1.1.100-1457937043486
> > 6023 be/4 hdfs 2.54 M/s    0.00 B/s 0.00 % 92.14 % du -sk /data/9/dfs/dn/curren~632-10.1.1.100-1457937043486
> > 6186 be/4 hdfs 1379.58 K/s 0.00 B/s 0.00 % 90.78 % du -sk /data/11/dfs/dn/curre~632-10.1.1.100-1457937043486
> >
> > What was all this reading for? And what are those du -sk processes? Could
> > this be a reason the write throughput is slowed down?
> >
> > On Tue, Mar 21, 2017 at 7:48 PM, Hef wrote:
> >
> > > Hi guys,
> > > Thanks for all your hints.
> > > Let me summarize the tuning I have done these days.
> > > Initially, before tuning, the HBase cluster worked at an average write
> > > tps of 400k (600k tps at max). The total network TX throughput from the
> > > clients (aggregated over multiple servers) to the RegionServers showed
> > > around 300Mb/s on average.
> > >
> > > I adopted the following steps for tuning:
> > >
> > > 1. Optimized the HBase schema for our table, reducing the cell size by 40%.
> > > Result: failed, tps not noticeably increased.
> > >
> > > 2. Recreated the table with a more evenly distributed pre-split keyspace.
> > > Result: failed, tps not noticeably increased.
> > >
> > > 3. Adjusted the RS GC strategy.
> > > Before:
> > > -XX:+UseParNewGC
> > > -XX:+UseConcMarkSweepGC
> > > -XX:CMSInitiatingOccupancyFraction=70
> > > -XX:+CMSParallelRemarkEnabled
> > > -Xmx100g
> > > -Xms100g
> > > -Xmn20g
> > >
> > > After:
> > > -XX:+UseG1GC
> > > -XX:+UnlockExperimentalVMOptions
> > > -XX:MaxGCPauseMillis=50
> > > -XX:-OmitStackTraceInFastThrow
> > > -XX:ParallelGCThreads=18
> > > -XX:+ParallelRefProcEnabled
> > > -XX:+PerfDisableSharedMem
> > > -XX:-ResizePLAB
> > > -XX:G1NewSizePercent=8
> > > -Xms100G -Xmx100G
> > > -XX:MaxTenuringThreshold=1
> > > -XX:G1HeapWastePercent=10
> > > -XX:G1MixedGCCountTarget=16
> > > -XX:G1HeapRegionSize=32M
> > >
> > > Result: success. GC pause time reduced, tps increased by at least 10%.
> > >
> > > 4. Upgraded to CDH 5.9.1 (HBase 1.2) and also updated the client lib to
> > > HBase 1.2.
> > > Result: success.
> > >   1. Total client TX throughput rose to 700Mb/s.
> > >   2. HBase write tps rose to 600k/s on average and 800k/s at max.
> > >
> > > 5. Other configurations:
> > > hbase.hstore.compactionThreshold = 10
> > > hbase.hstore.blockingStoreFiles = 300
> > > hbase.hstore.compaction.max = 20
> > > hbase.regionserver.thread.compaction.small = 30
> > >
> > > hbase.hregion.memstore.flush.size = 128
> > > hbase.regionserver.global.memstore.lowerLimit = 0.3
> > > hbase.regionserver.global.memstore.upperLimit = 0.7
> > >
> > > hbase.regionserver.maxlogs = 100
> > > hbase.wal.regiongrouping.numgroups = 5
> > > hbase.wal.provider = Multiple HDFS WAL
> > >
> > > Summary:
> > > 1. HBase 1.2 does have better performance than 1.0.
> > > 2. 300k/s tps per RegionServer still does not look satisfying, as I can
> > > see the CPU/network/IO/memory still have a lot of idle resources.
> > > Per RS:
> > >   1. CPU 50% used (not sure why CPU is so high for only 300K write requests)
> > >   2. JVM heap 40% used
> > >   3. Total disk throughput over 12 HDDs: 91MB/s on writes and 40MB/s on reads
> > >   4. Network in/out 560Mb/s on a 1G NIC
> > >
> > > Further questions:
> > > Has anyone confronted a similar heavy-write scenario like this?
> > > How many concurrent writes can a RegionServer handle? Can anyone share
> > > how much tps your RS can reach at max?
> > >
> > > Thanks,
> > > Hef
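Hef's question about how much write load a RegionServer can take is easiest to
approach empirically from the client side. Below is a minimal probe sketched
with the HBase 1.x BufferedMutator API that Hef describes using in his original
question (quoted further down); the table name "tsdb", the family "t", the
rowkey layout and the value size are placeholders, not the actual schema from
this thread.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteThroughputProbe {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("tsdb"))
                .writeBufferSize(20L * 1024 * 1024);            // 20MB client buffer, as in the thread
        try (Connection conn = ConnectionFactory.createConnection(conf);
             BufferedMutator mutator = conn.getBufferedMutator(params)) {
            final int total = 1_000_000;
            final byte[] cf = Bytes.toBytes("t");
            final byte[] value = new byte[50];                   // roughly 70-byte cells once key overhead is added
            List<Put> batch = new ArrayList<>(100_000);
            long start = System.currentTimeMillis();
            for (int i = 0; i < total; i++) {
                // NOTE: sequential keys like this hotspot a single region; a
                // realistic probe should spread keys over the pre-split keyspace.
                Put put = new Put(Bytes.toBytes(String.format("%010d", i)));
                put.addColumn(cf, Bytes.toBytes(i % 3600), value);
                batch.add(put);
                if (batch.size() == 100_000) {                   // 100000/batch, as Hef describes
                    mutator.mutate(batch);
                    batch.clear();
                }
            }
            mutator.mutate(batch);
            mutator.flush();
            long ms = System.currentTimeMillis() - start;
            System.out.printf("wrote %d cells in %d ms (%.0f puts/s)%n",
                    total, ms, total * 1000.0 / ms);
        }
    }
}

Running several of these in parallel, while watching the RegionServer request
metrics, gives a per-server ceiling to compare against the 300k/s figure above.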
> > > On Sat, Mar 18, 2017 at 1:11 PM, Yu Li wrote:
> > >
> > >> First, please try out Stack's suggestions, all good ones.
> > >>
> > >> And some supplements: since all disks in use are HDDs with ordinary IO
> > >> capability, it's important to control big-IO operations like flushes
> > >> and compactions. Try the features below:
> > >> 1. HBASE-8329: Limit compaction speed (available in 1.1.0+)
> > >> 2. HBASE-14969: Add throughput controller for flush (available in 1.3.0)
> > >> 3. HBASE-10201: Per column family flush (available in 1.1.0+)
> > >>    * HBASE-14906: Improvements on FlushLargeStoresPolicy (only available
> > >>      in 2.0, not released yet)
> > >>
> > >> Also try out multiple WALs; we observed a ~20% write perf boost in prod.
> > >> See more details in the doc attached to the JIRA below:
> > >> - HBASE-14457: Umbrella: Improve Multiple WAL for production usage
> > >>
> > >> And please note that if you decide to pick up a branch-1.1 release, make
> > >> sure to use 1.1.3+, or you may hit a perf regression on writes; see
> > >> HBASE-14460 for more details.
> > >>
> > >> Hope this information helps.
> > >>
> > >> Best Regards,
> > >> Yu
> > >>
> > >> On 18 March 2017 at 05:51, Vladimir Rodionov wrote:
> > >>
> > >> > >> In my opinion, 1M/s input data will result in only 70MByte/s write
> > >> >
> > >> > Times 3 (the default HDFS replication factor). Plus ...
> > >> >
> > >> > Do not forget about compaction read/write amplification. If you flush
> > >> > 10 MB and your max region size is 10 GB, with the default min files to
> > >> > compact (3) your amplification is 6-7. That gives us 70 x 3 x 6 = 1260
> > >> > MB/s of read/write, or 210 MB/s of reads and writes (210 MB/s reads
> > >> > and 210 MB/s writes) per RS.
> > >> >
> > >> > This IO load is way above sustainable.
> > >> >
> > >> > -Vlad
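Vladimir's arithmetic is worth spelling out, since it explains why the disks
look far busier than the raw ingest rate alone would suggest. The sketch below
simply restates that estimate; the input rate, replication factor, amplification
factor and server count are the figures quoted above.

public class IoAmplificationEstimate {
    public static void main(String[] args) {
        double ingestMBps = 70.0;    // ~1M cells/s of ~70-byte cells, per the thread
        int hdfsReplication = 3;     // default HDFS replication factor
        double compactionAmp = 6.0;  // Vladimir's rough compaction read/write amplification (6-7)
        int regionServers = 6;

        double clusterMBps = ingestMBps * hdfsReplication * compactionAmp;
        double perRsMBps = clusterMBps / regionServers;

        // Prints roughly 1260 MB/s cluster-wide and 210 MB/s per RegionServer,
        // matching the numbers Vladimir quotes.
        System.out.printf("cluster IO: %.0f MB/s, per RegionServer: %.0f MB/s%n",
                clusterMBps, perRsMBps);
    }
}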
> > >> > On Fri, Mar 17, 2017 at 2:14 PM, Kevin O'Dell wrote:
> > >> > >
> > >> > > Hey Hef,
> > >> > >
> > >> > > What is the memstore size setting (how much heap is it allowed) on
> > >> > > that cluster? What is your region count per node? Are you writing
> > >> > > evenly across all those regions, or are only a few regions active
> > >> > > per region server at a time? Can you paste the GC settings you are
> > >> > > currently using?
> > >> > >
> > >> > > On Fri, Mar 17, 2017 at 3:30 PM, Stack wrote:
> > >> > >
> > >> > > > On Fri, Mar 17, 2017 at 9:31 AM, Hef wrote:
> > >> > > >
> > >> > > > > Hi group,
> > >> > > > > I'm using HBase to store a large amount of time series data; the
> > >> > > > > use case is heavier on writes than reads. My application stops
> > >> > > > > at writing 600k requests per second and I can't tune it up for
> > >> > > > > better tps.
> > >> > > > >
> > >> > > > > Hardware:
> > >> > > > > I have 6 Region Servers, each with 128G memory, 12 HDDs, and 2
> > >> > > > > CPUs with 24 threads.
> > >> > > > >
> > >> > > > > Schema:
> > >> > > > > The schema for these time series data is similar to OpenTSDB:
> > >> > > > > the data points of the same metric within an hour are stored in
> > >> > > > > one row, and there can be a maximum of 3600 columns per row.
> > >> > > > > Each cell is about 70 bytes in size, including the rowkey,
> > >> > > > > column qualifier, column family and value.
> > >> > > > >
> > >> > > > > HBase config:
> > >> > > > > CDH 5.6, HBase 1.0.0
> > >> > > >
> > >> > > > Can you upgrade? There's a big diff between 1.2 and 1.0.
> > >> > > >
> > >> > > > > 100G memory for each RegionServer
> > >> > > > > hbase.hstore.compactionThreshold = 50
> > >> > > > > hbase.hstore.blockingStoreFiles = 100
> > >> > > > > hbase.hregion.majorcompaction disabled
> > >> > > > > hbase.client.write.buffer = 20MB
> > >> > > > > hbase.regionserver.handler.count = 100
> > >> > > >
> > >> > > > Could try halving the handler count.
> > >> > > >
> > >> > > > > hbase.hregion.memstore.flush.size = 128MB
> > >> > > >
> > >> > > > Why are you flushing? If it is because you are hitting this flush
> > >> > > > limit, can you try upping it?
> > >> > > >
> > >> > > > > HBase Client:
> > >> > > > > write in BufferedMutator with 100000/batch
> > >> > > > >
> > >> > > > > Input volumes:
> > >> > > > > The input data throughput is more than 2 million/sec from Kafka.
> > >> > > >
> > >> > > > How is the distribution? Evenly over the keyspace?
> > >> > > >
> > >> > > > > My writer applications are distributed; however much I scaled
> > >> > > > > them up, the total write throughput won't get larger than
> > >> > > > > 600K/sec.
> > >> > > >
> > >> > > > Tell us more about this scaling up. How many writers?
> > >> > > >
> > >> > > > > The servers have 20% CPU usage and 5.6 wa (iowait).
> > >> > > >
> > >> > > > 5.6 is high enough. Is the i/o spread over the disks?
> > >> > > >
> > >> > > > > GC doesn't look good though, it shows a lot of 10s+ pauses.
> > >> > > >
> > >> > > > What settings do you have?
> > >> > > >
> > >> > > > > In my opinion, 1M/s input data will result in only 70MByte/s of
> > >> > > > > write throughput to the cluster, which is quite a small amount
> > >> > > > > compared to the 6 region servers. The performance should not be
> > >> > > > > this bad.
> > >> > > > >
> > >> > > > > Does anybody have an idea why the performance stops at 600K/s?
> > >> > > > > Is there anything I have to tune to increase the HBase write
> > >> > > > > throughput?
> > >> > > >
> > >> > > > If you double the clients writing, do you see an increase in the
> > >> > > > throughput?
> > >> > > >
> > >> > > > If you thread dump the servers, can you tell where they are held
> > >> > > > up? Or whether they are doing any work at all, relatively?
> > >> > > >
> > >> > > > St.Ack
> > >> > >
> > >> > > --
> > >> > > Kevin O'Dell
> > >> > > Field Engineer
> > >> > > 850-496-1298 | Kevin@rocana.com
> > >> > > @kevinrodell
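Both Stack's question about whether writes are spread evenly over the keyspace
and Hef's earlier step of recreating the table with a more even pre-split come
down to choosing split points up front. Below is a minimal sketch with the
HBase 1.x Admin API; the table name "tsdb", the family "t" and the sixteen
hex-prefix regions are illustrative assumptions, not the layout actually used
in this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePresplitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("tsdb")); // placeholder name
            desc.addFamily(new HColumnDescriptor("t"));                              // placeholder family

            // 15 split points -> 16 regions, one per leading hex character,
            // assuming rowkeys start with a salt/hash prefix so that writers
            // hit all regions; adjust to whatever prefix the real keys use.
            String hex = "123456789abcdef";
            byte[][] splits = new byte[hex.length()][];
            for (int i = 0; i < splits.length; i++) {
                splits[i] = Bytes.toBytes(hex.substring(i, i + 1));
            }
            admin.createTable(desc, splits);
        }
    }
}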