Subject: Re: Poor HBase map-reduce scan performance
From: Bryan Keller
Date: Tue, 30 Apr 2013 23:02:25 -0700
To: user@hbase.apache.org

The table has hashed keys so rows are evenly distributed amongst the regionservers, and load on each regionserver is pretty much the same. I also have per-table balancing turned on. I get mostly data-local mappers with only a few rack-local (maybe 10 of the 250 mappers).

Currently the table is a wide table schema, with lists of data structures stored as columns with column prefixes grouping the data structures (e.g. 1_name, 1_address, 1_city, 2_name, 2_address, 2_city).
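To make that layout concrete, here is roughly what writing one row looks like. This is only an illustrative sketch, not my actual code: the "d" column family, the "mytable" table name, the values, and the MD5-based key hashing are all placeholders.

import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowSketch {
    private static final byte[] FAMILY = Bytes.toBytes("d"); // placeholder family name

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // placeholder table name

        // Row key is a hash of the natural key, so rows spread evenly across regions.
        byte[] rowKey = DigestUtils.md5("natural-key-12345");

        // One row holds a list of structures, each flattened into prefixed
        // qualifiers: 1_name, 1_address, 1_city, 2_name, 2_address, 2_city, ...
        Put put = new Put(rowKey);
        put.add(FAMILY, Bytes.toBytes("1_name"), Bytes.toBytes("Alice"));
        put.add(FAMILY, Bytes.toBytes("1_address"), Bytes.toBytes("1 Main St"));
        put.add(FAMILY, Bytes.toBytes("1_city"), Bytes.toBytes("Oakland"));
        put.add(FAMILY, Bytes.toBytes("2_name"), Bytes.toBytes("Bob"));
        put.add(FAMILY, Bytes.toBytes("2_address"), Bytes.toBytes("2 Elm St"));
        put.add(FAMILY, Bytes.toBytes("2_city"), Bytes.toBytes("Fresno"));
        table.put(put);
        table.close();
    }
}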
I was thinking of moving those data structures to protobuf, which would cut down on the number of columns. The downside is I can't filter on one value with that, but it is a tradeoff I would make for performance. I was also considering restructuring the table into a tall table.

Something interesting is that my old regionserver machines had five 15k SCSI drives instead of 2 SSDs, and performance was about the same. Also, my old network was 1gbit, now it is 10gbit. So neither network nor disk I/O appears to be the bottleneck. The CPU is rather high for the regionserver, so it seems like the best candidate to investigate. I will try profiling it tomorrow and will report back. I may revisit compression on vs. off since that is adding load to the CPU.

I'll also come up with a sample program that generates data similar to my table.

On Apr 30, 2013, at 10:01 PM, lars hofhansl wrote:

> Your average row is 35k so scanner caching would not make a huge difference, although I would have expected some improvements by setting it to 10 or 50 since you have a wide 10ge pipe.
>
> I assume your table is split sufficiently to touch all RegionServers... Do you see the same load/IO on all region servers?
>
> A bunch of scan improvements went into HBase since 0.94.2.
> I blogged about some of these changes here: http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
>
> In your case - since you have many columns, each of which carries the rowkey - you might benefit a lot from HBASE-7279.
>
> In the end HBase *is* slower than straight HDFS for full scans. How could it not be?
> So I would start by looking at HDFS first. Make sure Nagle's is disabled in both HBase and HDFS.
>
> And lastly, SSDs are somewhat new territory for HBase. Maybe Andy Purtell is listening; I think he did some tests with HBase on SSDs.
> With rotating media you typically see an improvement with compression. With SSDs the added CPU needed for decompression might outweigh the benefits.
>
> At the risk of starting a larger discussion here, I would posit that HBase's LSM-based design, which trades random IO for sequential IO, might be a bit more questionable on SSDs.
>
> If you can, it would be nice to run a profiler against one of the RegionServers (or maybe do it with the single-RS setup) and see where it is bottlenecked.
> (And if you send me a sample program to generate some data - not 700g, though :) - I'll try to do a bit of profiling during the next days as my day job permits, but I do not have any machines with SSDs.)
>
> -- Lars
>
> ________________________________
> From: Bryan Keller
> To: user@hbase.apache.org
> Sent: Tuesday, April 30, 2013 9:31 PM
> Subject: Re: Poor HBase map-reduce scan performance
>
> Yes, I have tried various settings for setCaching() and I have setCacheBlocks(false).
>
> On Apr 30, 2013, at 9:17 PM, Ted Yu wrote:
>
>> From http://hbase.apache.org/book.html#mapreduce.example :
>>
>> scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>>
>> I guess you have used the above settings.
>>
>> 0.94.x releases are compatible. Have you considered upgrading to, say, 0.94.7, which was recently released?
>>
>> Cheers
>>
>> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller wrote:
>>
>>> I have been attempting to speed up my HBase map-reduce scans for a while now.
>>> I have tried just about everything without much luck. I'm running out of ideas and was hoping for some suggestions. This is HBase 0.94.2 and Hadoop 2.0.0 (CDH4.2.1).
>>>
>>> The table I'm scanning:
>>> 20 mil rows
>>> Hundreds of columns/row
>>> Column keys can be 30-40 bytes
>>> Column values are generally not large, 1k would be on the large side
>>> 250 regions
>>> Snappy compression
>>> 8gb region size
>>> 512mb memstore flush
>>> 128k block size
>>> 700gb of data on HDFS
>>>
>>> My cluster has 8 datanodes which are also regionservers. Each has 8 cores (16 HT), 64gb RAM, and 2 SSDs. The network is 10gbit. I have a separate machine acting as namenode, HMaster, and zookeeper (single instance). I have disk local reads turned on.
>>>
>>> I'm seeing around 5 gbit/sec on average network IO. Each disk is getting 400mb/sec read IO. Theoretically I could get 400mb/sec * 16 = 6.4gb/sec.
>>>
>>> Using Hadoop's TestDFSIO tool, I'm seeing around 1.4gb/sec read speed. Not really that great compared to the theoretical I/O. However, this is far better than I am seeing with HBase map-reduce scans of my table.
>>>
>>> I have a simple no-op map-only job (using TableInputFormat) that scans the table and does nothing with the data. This takes 45 minutes, which is about 260mb/sec read speed - over 5x slower than straight HDFS. Basically, with HBase the read performance of my 16-SSD cluster is nearly 35% slower than a single SSD.
>>>
>>> Here are some things I have changed to no avail:
>>> Scan caching values
>>> HDFS block sizes
>>> HBase block sizes
>>> Region file sizes
>>> Memory settings
>>> GC settings
>>> Number of mappers/node
>>> Compressed vs. not compressed
>>>
>>> One thing I notice is that the regionserver is using quite a bit of CPU during the map-reduce job. When dumping the jstack of the process, it seems like it is usually in some type of memory allocation or decompression routine, which didn't seem abnormal.
>>>
>>> I can't seem to pinpoint the bottleneck. CPU use by the regionserver is high but not maxed out. Disk I/O and network I/O are low, and IO wait is low. I'm on the verge of just writing the dataset out to sequence files once a day for scan purposes. Is that what others are doing?
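P.S. In case it helps anyone reproduce the test, the no-op scan job described above is essentially the following. This is a trimmed sketch rather than my exact code: the table name, the caching value shown, and the NullOutputFormat output handling are placeholders/assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class NoOpScanJob {
    // Mapper intentionally does nothing with each row; the job only measures scan throughput.
    static class NoOpMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context) {
            // no-op
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "noop-scan");
        job.setJarByClass(NoOpScanJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);       // I've tried various values here
        scan.setCacheBlocks(false); // don't fill the block cache from an MR scan

        // "mytable" is a placeholder for the real table name.
        TableMapReduceUtil.initTableMapperJob("mytable", scan, NoOpMapper.class,
                NullWritable.class, NullWritable.class, job);
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class); // map-only, no output written
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}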