Date: Sun, 24 May 2015 18:42:32 +0000 (UTC)
From: lars hofhansl
To: user@hbase.apache.org
Subject: Re: Optimizing compactions on super-low-cost HW

Yeah, all you can do is drive your write amplification down. As Stack said:

- Increase hbase.hstore.compactionThreshold and hbase.hstore.blockingStoreFiles. It'll hurt reads, but in your case reads are already significantly hurt when compactions happen.
- Absolutely set hbase.hregion.majorcompaction to 1 week (with a jitter of 1/2 week; that's the default in 0.98 and later). Minor compactions will still happen, based on the compactionThreshold setting. Right now you're rewriting _all_ your data _every_ day.
- Turning off WAL writing will save you I/O, but I doubt it'll help much.
I do not expect async WAL to help a lot, as the aggregate I/O is still the same.
- See if you can enable DATA_BLOCK_ENCODING on your column families (FAST_DIFF or PREFIX are good). You can also try SNAPPY compression. That would reduce your overall I/O. (Since your CPUs are also weak, you'd have to test the CPU/I/O tradeoff.)
- If you have RAM to spare, increase the memstore flush size (will lead to initially larger and fewer files).
- Or (again, if you have spare RAM) make your regions smaller, to curb write amplification.
- I assume only the 300G partitions are mirrored, right? (Not the entire 2T drive.)

I have some suggestions compiled here (if you don't mind the plug):
http://hadoop-hbase.blogspot.com/2015/05/my-hbasecon-talk-about-hbase.html

Other than that, I'll repeat what others said: you have 14 extremely weak machines; you can't expect the world from this. Your aggregate IOPS are less than 3000, your aggregate I/O bandwidth ~3GB/s. Can you add more machines?

-- Lars

________________________________
From: Serega Sheypak
To: user
Sent: Friday, May 22, 2015 3:45 AM
Subject: Re: Optimizing compactions on super-low-cost HW

We don't have money; these nodes are the cheapest. I totally agree that we need 4-6 HDDs, but unfortunately there is no chance to get them. Okay, I'll try to apply Stack's suggestions.

2015-05-22 13:00 GMT+03:00 Michael Segel :

> Look, to be blunt, you're screwed.
>
> If I read your cluster spec, it sounds like you have a single i7 (quad
> core) CPU. That's 4 cores or 8 threads.
>
> Mirroring the OS is common practice.
> Using the same drives for Hadoop... not so good, but once the server boots
> up... not so much I/O.
> It's not good, but you could live with it...
>
> Your best bet is to add a couple more spindles. Ideally you'd want to
> have 6 drives: the 2 OS drives mirrored and separate. (Use the extra space
> to stash / write logs.)
> Then have 4 drives / spindles in JBOD for Hadoop.
> This brings you to a 1:1 on physical cores. If your box can handle more
> spindles, then going to a total of 10 drives would improve performance
> further.
>
> However, you need to level-set your expectations... you can only go so far.
> If you have 4 drives spinning, you could start to saturate a 1GbE network,
> so that will hurt performance.
>
> That's pretty much your only option in terms of fixing the hardware, and
> then you have to start tuning.
>
> > On May 21, 2015, at 4:04 PM, Stack wrote:
> >
> > On Thu, May 21, 2015 at 1:04 AM, Serega Sheypak <serega.sheypak@gmail.com>
> > wrote:
> >
> >>> Do you have the system sharing
> >> There are 2 HDD 7200, 2TB each. There is a 300GB OS partition on each drive
> >> with mirroring enabled. I can't persuade devops that mirroring could cause
> >> IO issues. What arguments can I bring? They use OS partition mirroring: when a
> >> disk fails, we can use the other partition to boot the OS and continue to
> >> work...
> >>
> > You are already compromised i/o-wise having two disks only. I have not the
> > experience to say for sure, but basic physics would seem to dictate that
> > having your two disks (partially) mirrored compromises your i/o even more.
> >
> > You are in a bit of a hard place. Your operators want the machine to boot
> > even after it loses 50% of its disk.
> >
> >>> Do you have to compact? In other words, do you have read SLAs?
> >> Unfortunately, I have a mixed workload from web applications. I need to
> >> write and read, and the SLA is < 50ms.
> >>
> > Ok. You get the bit that seeks are about 10ms each, so with two disks you
> > can do 2x100 seeks a second, presuming no one else is using the disk.
> >
> >>> How are your read times currently?
> >> Cloudera Manager says it's 4K reads per second and 500 writes per second.
> >
> >>> Does your working dataset fit in RAM or do
> >> reads have to go to disk?
> >> I have several tables of 500GB each and many small tables of 10-20 GB. Small
> >> tables are loaded hourly/daily using bulkload (prepare HFiles using MR and
> >> move them to HBase using the utility). Big tables are used by webapps; they
> >> read and write them.
> >>
> > These hfiles are created on the same cluster with MR? (i.e. they are using up
> > i/os)
> >
> >>> It looks like you are running at about three storefiles per column family
> >> is it hbase.hstore.compactionThreshold=3?
> >>
> >>> What if you upped the threshold at which minors run?
> >> you mean bump hbase.hstore.compactionThreshold to 8 or 10?
> >>
> > Yes.
> >
> > Downside is that your reads may require more seeks to find a keyvalue.
> >
> > Can you cache more?
> >
> > Can you make it so files are bigger before you flush?
> >
> >>> Do you have a downtime during which you could schedule compactions?
> >> Unfortunately no. It should work 24/7, and sometimes it doesn't.
> >>
> > So, it is running at full bore 24/7? There is no 'downtime'... a time when
> > the traffic is not so heavy?
> >
> >>> Are you managing the major compactions yourself or are you having hbase do
> >> it for you?
> >> HBase, once a day: hbase.hregion.majorcompaction=1day
> >>
> > Have you studied your compactions? You realize that a major compaction
> > will do a full rewrite of your dataset? When they run, how many storefiles
> > are there?
> >
> > Do you have to run once a day? Can you not run once a week? Can you
> > manage the compactions yourself... and run them a region at a time in a
> > rolling manner across the cluster, rather than have them just run whenever
> > it suits them once a day?
> >
> >> I can disable WAL. It's ok to lose some data in case of RS failure. I'm
> >> not doing banking transactions.
> >> If I disable WAL, could it help?
> >>
> > It could, but don't. Enable deferred sync'ing first if you can 'lose' some
> > data.
> >
> > Work on your flushing and compactions before you mess w/ WAL.
> >
> > What version of hbase are you on? You say CDH, but the newer your hbase, the
> > better it does generally.
> >
> > St.Ack
> >
> >> 2015-05-20 18:04 GMT+03:00 Stack :
> >>
> >>> On Mon, May 18, 2015 at 4:26 PM, Serega Sheypak <serega.sheypak@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi, we are using extremely cheap HW:
> >>>> 2 HDD 7200
> >>>> 4*2 core (Hyperthreading)
> >>>> 32GB RAM
> >>>>
> >>>> We met serious IO performance issues.
> >>>> We have a more or less even distribution of read/write requests. The same
> >>>> for datasize.
> >>>>
> >>>> ServerName  Request Per Second  Read Request Count  Write Request Count
> >>>> node01.domain.com,60020,1430172017193  195  171871826  16761699
> >>>> node02.domain.com,60020,1426925053570  24   34314930   16006603
> >>>> node03.domain.com,60020,1430860939797  22   32054801   16913299
> >>>> node04.domain.com,60020,1431975656065  33   1765121    253405
> >>>> node05.domain.com,60020,1430484646409  27   42248883   16406280
> >>>> node07.domain.com,60020,1426776403757  27   36324492   16299432
> >>>> node08.domain.com,60020,1426775898757  26   38507165   13582109
> >>>> node09.domain.com,60020,1430440612531  27   34360873   15080194
> >>>> node11.domain.com,60020,1431989669340  28   44307      13466
> >>>> node12.domain.com,60020,1431927604238  30   5318096    2020855
> >>>> node13.domain.com,60020,1431372874221  29   31764957   15843688
> >>>> node14.domain.com,60020,1429640630771  41   36300097   13049801
> >>>>
> >>>> ServerName  Num. Stores  Num. Storefiles  Storefile Size  Uncompressed Storefile Size  Index Size  Bloom Size
> >>>> node01.domain.com,60020,1430172017193  82  186  1052080m  76496mb  641849k  310111k
> >>>> node02.domain.com,60020,1426925053570  82  179  1062730m  79713mb  649610k  318854k
> >>>> node03.domain.com,60020,1430860939797  82  179  1036597m  76199mb  627346k  307136k
> >>>> node04.domain.com,60020,1431975656065  82  400  1034624m  76405mb  655954k  289316k
> >>>> node05.domain.com,60020,1430484646409  82  185  1111807m  81474mb  688136k  334127k
> >>>> node07.domain.com,60020,1426776403757  82  164  1023217m  74830mb  631774k  296169k
> >>>> node08.domain.com,60020,1426775898757  81  171  1086446m  79933mb  681486k  312325k
> >>>> node09.domain.com,60020,1430440612531  81  160  1073852m  77874mb  658924k  309734k
> >>>> node11.domain.com,60020,1431989669340  81  166  1006322m  75652mb  664753k  264081k
> >>>> node12.domain.com,60020,1431927604238  82  188  1050229m  75140mb  652970k  304137k
> >>>> node13.domain.com,60020,1431372874221  82  178  937557m   70042mb  601684k  257607k
> >>>> node14.domain.com,60020,1429640630771  82  145  949090m   69749mb  592812k  266677k
> >>>>
> >>>> When compaction starts, a random node gets to 100% I/O, with io wait of
> >>>> seconds, even tens of seconds.
> >>>>
> >>>> What are the approaches to optimizing minor and major compactions when
> >>>> you are I/O bound..?
> >>>
> >>> Yeah, with two disks, you will be crimped. Do you have the system sharing
> >>> with hbase/hdfs, or is hdfs running on one disk only?
> >>>
> >>> Do you have to compact? In other words, do you have read SLAs? How are
> >>> your read times currently? Does your working dataset fit in RAM, or do
> >>> reads have to go to disk? It looks like you are running at about three
> >>> storefiles per column family. What if you upped the threshold at which
> >>> minors run?
> >>> Do you have a downtime during which you could schedule
> >>> compactions? Are you managing the major compactions yourself, or are you
> >>> having hbase do it for you?
> >>>
> >>> St.Ack
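
[Archive editor's note] The server-side knobs discussed in this thread are plain hbase-site.xml properties. A minimal sketch with the values suggested above; the exact numbers are illustrative, not universal recommendations (in particular, the blockingStoreFiles value of 30 and the 256MB flush size are assumptions, not figures from the thread), and defaults vary by HBase/CDH version:

```xml
<!-- hbase-site.xml fragment: compaction/flush tuning along the lines of
     the thread's advice. Values are illustrative; test on your workload. -->

<property>
  <!-- Minor compactions kick in once a store has this many files.
       Stack suggests raising it from the default of 3 to 8 or 10. -->
  <name>hbase.hstore.compactionThreshold</name>
  <value>8</value>
</property>

<property>
  <!-- Writes to a region block once a store reaches this many files;
       raise it along with the threshold (30 is an assumed value). -->
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>30</value>
</property>

<property>
  <!-- Major compaction every 7 days (in milliseconds) instead of daily. -->
  <name>hbase.hregion.majorcompaction</name>
  <value>604800000</value>
</property>

<property>
  <!-- +/- 50% jitter so all regions don't major-compact at once. -->
  <name>hbase.hregion.majorcompaction.jitter</name>
  <value>0.5</value>
</property>

<property>
  <!-- Larger memstore flushes mean initially larger and fewer files
       (256MB here, up from the 128MB default; only with RAM to spare). -->
  <name>hbase.hregion.memstore.flush.size</name>
  <value>268435456</value>
</property>
```

DATA_BLOCK_ENCODING (e.g. FAST_DIFF) and SNAPPY compression are per-column-family attributes, set with `alter` in the HBase shell rather than in hbase-site.xml. Setting hbase.hregion.majorcompaction to 0 and triggering `major_compact` from an external scheduler is the usual way to "manage the compactions yourself" in a rolling manner, as Stack suggests.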