From: yuzhihong@gmail.com
Subject: Re: Follow-up to my HBASE-4365 testing
Date: Sat, 25 Feb 2012 15:37:43 -0800
To: dev@hbase.apache.org

Thanks for sharing this, Matt.

Do you mind opening a Jira for your suggestion?

On Feb 25, 2012, at 3:18 PM, Matt Corgan wrote:

> I've been meaning to look into something regarding compactions for a while
> now that may be relevant here. It could be that this is already how it
> works, but just to be sure I'll spell out my suspicions...
>
> I did a lot of large uploads when we moved to .92. Our biggest dataset is
> time series data (partitioned 16 ways with a row prefix). The actual
> inserting and flushing went extremely quickly, and the parallel compactions
> were churning away. However, when the compactions inevitably started
> falling behind I noticed a potential problem. The compaction queue would
> get up to, say, 40, which represented, say, an hour's worth of requests.
> The problem was that by the time a compaction request started executing,
> the CompactionSelection that it held was terribly out of date. It was
> compacting a small selection (3-5) of the 50 files that were now there.
> Then the next request would compact another (3-5), etc., etc., until the
> queue was empty. It would have been much better if a CompactionRequest
> decided what files to compact when it got to the head of the queue. Then
> it could see that there are now 50 files needing compaction and possibly
> compact the 30 smallest ones, not just 5. When the insertions were done
> after many hours, I would have preferred it to do one giant major
> compaction, but it sat there and worked through its compaction queue,
> compacting all sorts of different combinations of files.
>
> Said differently, it looks like .92 picks the files to compact at
> compaction request time rather than at compaction execution time, which is
> problematic when these times grow far apart. Is that the case? Maybe
> there are some other effects that are mitigating it...
>
> Matt
>
> On Sat, Feb 25, 2012 at 10:05 AM, Jean-Daniel Cryans wrote:
>
>> Hey guys,
>>
>> So in HBASE-4365 I ran multiple uploads and the latest one I reported
>> was a 5TB import on 14 RS and it took 18h with Stack's patch. Now one
>> thing we can see is that apart from some splitting, there's a lot of
>> compacting going on. Stack was wondering exactly how much that IO
>> costs us, so we devised a test where we could upload 5TB with 0
>> compactions. Here are the results:
>>
>> The table was pre-split with 14 regions, 1 per region server.
>> hbase.hstore.compactionThreshold=100
>> hbase.hstore.blockingStoreFiles=110
>> hbase.regionserver.maxlogs=64 (the block size is 128MB)
>> hfile.block.cache.size=0.05
>> hbase.regionserver.global.memstore.lowerLimit=0.40
>> hbase.regionserver.global.memstore.upperLimit=0.74
>> export HBASE_REGIONSERVER_OPTS="$HBASE_JMX_BASE -Xmx14G
>> -XX:CMSInitiatingOccupancyFraction=75 -XX:NewSize=256m
>> -XX:MaxNewSize=256m"
>>
>> The table had:
>> MAX_FILESIZE => '549755813888', MEMSTORE_FLUSHSIZE => '549755813888'
>>
>> Basically what I'm trying to do is never block and almost always be
>> flushing. You'll probably notice the big difference between the lower
>> and upper barriers and think "le hell?"; it's because it takes so long
>> to flush that you have to have enough room to take on more data while
>> this is happening (and we are able to flush faster than we take on
>> writes).
>>
>> The test reports the following:
>> Wall time: 34984.083 s
>> Aggregate Throughput: 156893.07 queries/s
>> Aggregate Throughput: 160030935.29 bytes/s
>>
>> That's 2x faster than when we wait for compactions and splits. Not too
>> bad, but I'm pretty sure we can do better:
>>
>> - The QPS was very uneven; it seems that when it's flushing it takes
>> a big toll and queries drop to ~100k/s, while the rest of the time it's
>> more like 200k/s. Need to figure out what's going on there and whether
>> it's really just caused by flush-related IO.
>> - The logs were rolling every 6 seconds, and since this takes a global
>> write lock, I can see how we could be slowing down a lot across 14
>> machines.
>> - The load was a bit uneven: I miscalculated my split points and the
>> last region always had 2-3k more queries per second.
>>
>> Stay tuned for more.
>>
>> J-D
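[Not part of the original thread: Matt's point about enqueue-time versus execution-time file selection can be illustrated with a toy model. This is a hypothetical Python sketch, not HBase code; the function names, sizes, and batch limits are all invented for illustration.]

```python
# Toy illustration of Matt's observation: if a compaction request snapshots
# its file selection when it is enqueued, the selection is stale by the time
# the request runs. All names and numbers here are invented; not HBase code.

def select_smallest(files, limit):
    """Hypothetical selection policy: compact up to `limit` smallest files."""
    return sorted(files, key=lambda f: f["size"])[:limit]

# 50 store files have piled up while the request sat in the queue.
store_files = [{"id": i, "size": 10 + i} for i in range(50)]

# Selection made at enqueue time, when only 5 files existed:
stale_selection = select_smallest(store_files[:5], limit=5)

# Selection made at execution time, seeing all 50 accumulated files:
fresh_selection = select_smallest(store_files, limit=30)

print(len(stale_selection))  # 5 files per queue entry, many entries needed
print(len(fresh_selection))  # 30 files cleared in a single pass
```

The difference is why deferring selection matters: many queued requests each compacting a stale 3-5 file snapshot do far more total IO than one request that sees all 50 files at execution time.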