Date: Sun, 24 May 2015 18:42:32 +0000 (UTC)
From: lars hofhansl
To: user@hbase.apache.org
Subject: Re: Optimizing compactions on super-low-cost HW

Yeah, all you can do is drive your write amplification down. As Stack said:

- Increase hbase.hstore.compactionThreshold and hbase.hstore.blockingStoreFiles. It'll hurt reads, but in your case reads are already significantly hurt when compactions happen.
- Absolutely set hbase.hregion.majorcompaction to 1 week (with a jitter of 1/2 week; that's the default in 0.98 and later). Minor compactions will still happen, based on the compactionThreshold setting. Right now you're rewriting _all_ your data _every_ day.
- Turning off WAL writing will save you I/O, but I doubt it'll help much.
I do not expect async WAL to help a lot, as the aggregate I/O is still the same.
- See if you can enable DATA_BLOCK_ENCODING on your column families (FAST_DIFF or PREFIX are good). You can also try SNAPPY compression. That would reduce your overall I/O. (Since your CPUs are also weak, you'd have to test the CPU/I/O tradeoff.)
- If you have RAM to spare, increase the memstore flush size (will lead to initially larger and fewer files).
- Or (again, if you have spare RAM) make your regions smaller, to curb write amplification.
- I assume only the 300G partitions are mirrored, right? (Not the entire 2T drive.)

I have some suggestions compiled here (if you don't mind the plug):
http://hadoop-hbase.blogspot.com/2015/05/my-hbasecon-talk-about-hbase.html

Other than that, I'll repeat what others said: you have 14 extremely weak machines; you can't expect the world from this. Your aggregate IOPS are less than 3000, your aggregate I/O bandwidth ~3GB/s. Can you add more machines?

-- Lars

________________________________
From: Serega Sheypak
To: user
Sent: Friday, May 22, 2015 3:45 AM
Subject: Re: Optimizing compactions on super-low-cost HW

We don't have money; these nodes are the cheapest. I totally agree that we need 4-6 HDDs, but unfortunately there is no chance to get them. Okay, I'll try to apply Stack's suggestions.

2015-05-22 13:00 GMT+03:00 Michael Segel :

> Look, to be blunt, you're screwed.
>
> If I read your cluster spec, it sounds like you have a single i7 (quad
> core) CPU. That's 4 cores or 8 threads.
>
> Mirroring the OS is common practice.
> Using the same drives for Hadoop... not so good, but once the server boots
> up... not so much I/O.
> It's not good, but you could live with it...
>
> Your best bet is to add a couple more spindles. Ideally you'd want to
> have 6 drives: the 2 OS drives mirrored and separate. (Use the extra space
> to stash / write logs.)
> Then have 4 drives / spindles in JBOD for Hadoop.
> This brings you to a 1:1 on physical cores. If your box can handle more
> spindles, then going to a total of 10 drives would improve performance
> further.
>
> However, you need to level-set your expectations... you can only go so far.
> If you have 4 drives spinning, you could start to saturate a 1GbE network,
> so that will hurt performance.
>
> That's pretty much your only option in terms of fixing the hardware, and
> then you have to start tuning.
>
> > On May 21, 2015, at 4:04 PM, Stack wrote:
> >
> > On Thu, May 21, 2015 at 1:04 AM, Serega Sheypak <serega.sheypak@gmail.com>
> > wrote:
> >
> >>> Do you have the system sharing
> >> There are 2 HDD 7200, 2TB each. There is a 300GB OS partition on each drive
> >> with mirroring enabled. I can't persuade devops that mirroring could cause
> >> IO issues. What arguments can I bring? They use OS partition mirroring: when a
> >> disk fails, we can use the other partition to boot the OS and continue to
> >> work...
> >>
> > You are already compromised i/o-wise having two disks only. I have not the
> > experience to say for sure, but basic physics would seem to dictate that
> > having your two disks (partially) mirrored compromises your i/o even more.
> >
> > You are in a bit of a hard place. Your operators want the machine to boot
> > even after it loses 50% of its disk.
> >
> >>> Do you have to compact? In other words, do you have read SLAs?
> >> Unfortunately, I have a mixed workload from web applications. I need to
> >> write and read, and the SLA is < 50ms.
> >>
> > Ok. You get the bit that seeks are about 10ms each, so with two disks you
> > can do 2x100 seeks a second, presuming no one else is using the disk.
> >
> >>> How are your read times currently?
> >> Cloudera Manager says it's 4K reads per second and 500 writes per second.
> >
> >>> Does your working dataset fit in RAM or do
> >> reads have to go to disk?
> >> I have several tables of 500GB each and many small tables of 10-20 GB. Small
> >> tables are loaded hourly/daily using bulkload (prepare HFiles using MR and
> >> move them to HBase using the utility). Big tables are used by webapps; they
> >> read and write them.
> >>
> > These hfiles are created on the same cluster with MR? (i.e. they are using up
> > i/os)
> >
> >>> It looks like you are running at about three storefiles per column family
> >> is it hbase.hstore.compactionThreshold=3?
> >>
> >>> What if you upped the threshold at which minors run?
> >> you mean bump hbase.hstore.compactionThreshold to 8 or 10?
> >>
> > Yes.
> >
> > Downside is that your reads may require more seeks to find a keyvalue.
> >
> > Can you cache more?
> >
> > Can you make it so files are bigger before you flush?
> >
> >>> Do you have a downtime during which you could schedule compactions?
> >> Unfortunately no. It should work 24/7, and sometimes it doesn't.
> >>
> > So, it is running at full bore 24/7? There is no 'downtime'... a time when
> > the traffic is not so heavy?
> >
> >>> Are you managing the major compactions yourself or are you having hbase do
> >> it for you?
> >> HBase, once a day: hbase.hregion.majorcompaction=1day
> >>
> > Have you studied your compactions? You realize that a major compaction
> > will do a full rewrite of your dataset? When they run, how many storefiles
> > are there?
> >
> > Do you have to run once a day? Can you not run once a week? Can you
> > manage the compactions yourself... and run them a region at a time in a
> > rolling manner across the cluster, rather than have them just run whenever
> > it suits them once a day?
> >
> >> I can disable WAL. It's ok to lose some data in case of RS failure. I'm
> >> not doing banking transactions.
> >> If I disable WAL, could it help?
> >>
> > It could, but don't. Enable deferred sync'ing first if you can 'lose' some
> > data.
> >
> > Work on your flushing and compactions before you mess w/ WAL.
> >
> > What version of hbase are you on? You say CDH, but the newer your hbase, the
> > better it does generally.
> >
> > St.Ack
> >
> >> 2015-05-20 18:04 GMT+03:00 Stack :
> >>
> >>> On Mon, May 18, 2015 at 4:26 PM, Serega Sheypak <serega.sheypak@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi, we are using extremely cheap HW:
> >>>> 2 HDD 7200
> >>>> 4*2 core (Hyperthreading)
> >>>> 32GB RAM
> >>>>
> >>>> We met serious IO performance issues.
> >>>> We have a more or less even distribution of read/write requests. The same
> >>>> for datasize.
> >>>>
> >>>> ServerName  Request Per Second  Read Request Count  Write Request Count
> >>>> node01.domain.com,60020,1430172017193  195  171871826  16761699
> >>>> node02.domain.com,60020,1426925053570  24   34314930   16006603
> >>>> node03.domain.com,60020,1430860939797  22   32054801   16913299
> >>>> node04.domain.com,60020,1431975656065  33   1765121    253405
> >>>> node05.domain.com,60020,1430484646409  27   42248883   16406280
> >>>> node07.domain.com,60020,1426776403757  27   36324492   16299432
> >>>> node08.domain.com,60020,1426775898757  26   38507165   13582109
> >>>> node09.domain.com,60020,1430440612531  27   34360873   15080194
> >>>> node11.domain.com,60020,1431989669340  28   44307      13466
> >>>> node12.domain.com,60020,1431927604238  30   5318096    2020855
> >>>> node13.domain.com,60020,1431372874221  29   31764957   15843688
> >>>> node14.domain.com,60020,1429640630771  41   36300097   13049801
> >>>>
> >>>> ServerName  Num. Stores  Num. Storefiles  Storefile Size  Uncompressed Storefile Size  Index Size  Bloom Size
> >>>> node01.domain.com,60020,1430172017193  82  186  1052080m  76496mb  641849k  310111k
> >>>> node02.domain.com,60020,1426925053570  82  179  1062730m  79713mb  649610k  318854k
> >>>> node03.domain.com,60020,1430860939797  82  179  1036597m  76199mb  627346k  307136k
> >>>> node04.domain.com,60020,1431975656065  82  400  1034624m  76405mb  655954k  289316k
> >>>> node05.domain.com,60020,1430484646409  82  185  1111807m  81474mb  688136k  334127k
> >>>> node07.domain.com,60020,1426776403757  82  164  1023217m  74830mb  631774k  296169k
> >>>> node08.domain.com,60020,1426775898757  81  171  1086446m  79933mb  681486k  312325k
> >>>> node09.domain.com,60020,1430440612531  81  160  1073852m  77874mb  658924k  309734k
> >>>> node11.domain.com,60020,1431989669340  81  166  1006322m  75652mb  664753k  264081k
> >>>> node12.domain.com,60020,1431927604238  82  188  1050229m  75140mb  652970k  304137k
> >>>> node13.domain.com,60020,1431372874221  82  178  937557m   70042mb  601684k  257607k
> >>>> node14.domain.com,60020,1429640630771  82  145  949090m   69749mb  592812k  266677k
> >>>>
> >>>> When compaction starts, a random node gets to 100% I/O, with io wait of
> >>>> seconds, even tens of seconds.
> >>>>
> >>>> What are the approaches to optimizing minor and major compactions when
> >>>> you are I/O bound..?
> >>>
> >>> Yeah, with two disks, you will be crimped. Do you have the system sharing
> >>> with hbase/hdfs, or is hdfs running on one disk only?
> >>>
> >>> Do you have to compact? In other words, do you have read SLAs? How are
> >>> your read times currently? Does your working dataset fit in RAM, or do
> >>> reads have to go to disk? It looks like you are running at about three
> >>> storefiles per column family. What if you upped the threshold at which
> >>> minors run?
> >>> Do you have a downtime during which you could schedule
> >>> compactions? Are you managing the major compactions yourself, or are you
> >>> having hbase do it for you?
> >>>
> >>> St.Ack
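
[Archive editor's note] The server-side knobs discussed in this thread are plain hbase-site.xml properties. A minimal sketch with the values suggested above; the exact numbers are illustrative, not universal recommendations (in particular, the blockingStoreFiles value of 30 and the 256MB flush size are assumptions, not figures from the thread), and defaults vary by HBase/CDH version:

```xml
<!-- hbase-site.xml fragment: compaction/flush tuning along the lines of
     the thread's advice. Values are illustrative; test on your workload. -->

<property>
  <!-- Minor compactions kick in once a store has this many files.
       Stack suggests raising it from the default of 3 to 8 or 10. -->
  <name>hbase.hstore.compactionThreshold</name>
  <value>8</value>
</property>

<property>
  <!-- Writes to a region block once a store reaches this many files;
       raise it along with the threshold (30 is an assumed value). -->
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>30</value>
</property>

<property>
  <!-- Major compaction every 7 days (in milliseconds) instead of daily. -->
  <name>hbase.hregion.majorcompaction</name>
  <value>604800000</value>
</property>

<property>
  <!-- +/- 50% jitter so all regions don't major-compact at once. -->
  <name>hbase.hregion.majorcompaction.jitter</name>
  <value>0.5</value>
</property>

<property>
  <!-- Larger memstore flushes mean initially larger and fewer files
       (256MB here, up from the 128MB default; only with RAM to spare). -->
  <name>hbase.hregion.memstore.flush.size</name>
  <value>268435456</value>
</property>
```

DATA_BLOCK_ENCODING (e.g. FAST_DIFF) and SNAPPY compression are per-column-family attributes, set with `alter` in the HBase shell rather than in hbase-site.xml. Setting hbase.hregion.majorcompaction to 0 and triggering `major_compact` from an external scheduler is the usual way to "manage the compactions yourself" in a rolling manner, as Stack suggests.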