From: yuzhihong@gmail.com
Subject: Re: Follow-up to my HBASE-4365 testing
Date: Sat, 25 Feb 2012 15:37:43 -0800
To: dev@hbase.apache.org

Thanks for sharing this, Matt.

Do you mind opening a Jira for your suggestion?

On Feb 25, 2012, at 3:18 PM, Matt Corgan wrote:

> I've been meaning to look into something regarding compactions for a while
> now that may be relevant here. It could be that this is already how it
> works, but just to be sure I'll spell out my suspicions...
>
> I did a lot of large uploads when we moved to .92. Our biggest dataset is
> time series data (partitioned 16 ways with a row prefix). The actual
> inserting and flushing went extremely quickly, and the parallel compactions
> were churning away. However, when the compactions inevitably started
> falling behind I noticed a potential problem. The compaction queue would
> get up to, say, 40, which represented, say, an hour's worth of requests.
> The problem was that by the time a compaction request started executing,
> the CompactionSelection that it held was terribly out of date. It was
> compacting a small selection (3-5) of the 50 files that were now there.
> Then the next request would compact another (3-5), etc., etc., until the
> queue was empty. It would have been much better if a CompactionRequest
> decided what files to compact when it got to the head of the queue. Then
> it could see that there are now 50 files needing compaction and possibly
> compact the 30 smallest ones, not just 5. When the insertions were done
> after many hours, I would have preferred it to do one giant major
> compaction, but it sat there and worked through its compaction queue,
> compacting all sorts of different combinations of files.
>
> Said differently, it looks like .92 picks the files to compact at
> compaction request time rather than at compaction execution time, which is
> problematic when these times grow far apart. Is that the case? Maybe
> there are some other effects that are mitigating it...
>
> Matt
>
> On Sat, Feb 25, 2012 at 10:05 AM, Jean-Daniel Cryans wrote:
>
>> Hey guys,
>>
>> So in HBASE-4365 I ran multiple uploads and the latest one I reported
>> was a 5TB import on 14 RS and it took 18h with Stack's patch. Now one
>> thing we can see is that apart from some splitting, there's a lot of
>> compacting going on. Stack was wondering exactly how much that IO
>> costs us, so we devised a test where we could upload 5TB with 0
>> compactions. Here are the results:
>>
>> The table was pre-split with 14 regions, 1 per region server.
>> hbase.hstore.compactionThreshold=100
>> hbase.hstore.blockingStoreFiles=110
>> hbase.regionserver.maxlogs=64 (the block size is 128MB)
>> hfile.block.cache.size=0.05
>> hbase.regionserver.global.memstore.lowerLimit=0.40
>> hbase.regionserver.global.memstore.upperLimit=0.74
>> export HBASE_REGIONSERVER_OPTS="$HBASE_JMX_BASE -Xmx14G
>> -XX:CMSInitiatingOccupancyFraction=75 -XX:NewSize=256m
>> -XX:MaxNewSize=256m"
>>
>> The table had:
>> MAX_FILESIZE => '549755813888', MEMSTORE_FLUSHSIZE => '549755813888'
>>
>> Basically what I'm trying to do is never block and almost always be
>> flushing. You'll probably notice the big difference between the lower
>> and upper barriers and think "le hell?"; it's because it takes so long
>> to flush that you have to have enough room to take on more data while
>> this is happening (and we are able to flush faster than we take on
>> writes).
>>
>> The test reports the following:
>> Wall time: 34984.083 s
>> Aggregate Throughput: 156893.07 queries/s
>> Aggregate Throughput: 160030935.29 bytes/s
>>
>> That's 2x faster than when we wait for compactions and splits. Not too
>> bad, but I'm pretty sure we can do better:
>>
>> - The QPS was very uneven; it seems that when it's flushing it takes
>> a big toll and queries drop to ~100k/s, while the rest of the time it's
>> more like 200k/s. Need to figure out what's going on there and whether
>> it's really just caused by flush-related IO.
>> - The logs were rolling every 6 seconds, and since this takes a global
>> write lock, I can see how we could be slowing down a lot across 14
>> machines.
>> - The load was a bit uneven: I miscalculated my split points and the
>> last region always had 2-3k more queries per second.
>>
>> Stay tuned for more.
>>
>> J-D
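[Not part of the original thread: Matt's point about enqueue-time versus execution-time file selection can be illustrated with a toy model. This is a hypothetical Python sketch, not HBase code; the function names, sizes, and batch limits are all invented for illustration.]

```python
# Toy illustration of Matt's observation: if a compaction request snapshots
# its file selection when it is enqueued, the selection is stale by the time
# the request runs. All names and numbers here are invented; not HBase code.

def select_smallest(files, limit):
    """Hypothetical selection policy: compact up to `limit` smallest files."""
    return sorted(files, key=lambda f: f["size"])[:limit]

# 50 store files have piled up while the request sat in the queue.
store_files = [{"id": i, "size": 10 + i} for i in range(50)]

# Selection made at enqueue time, when only 5 files existed:
stale_selection = select_smallest(store_files[:5], limit=5)

# Selection made at execution time, seeing all 50 accumulated files:
fresh_selection = select_smallest(store_files, limit=30)

print(len(stale_selection))  # 5 files per queue entry, many entries needed
print(len(fresh_selection))  # 30 files cleared in a single pass
```

The difference is why deferring selection matters: many queued requests each compacting a stale 3-5 file snapshot do far more total IO than one request that sees all 50 files at execution time.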