Subject: Re: Poor HBase map-reduce scan performance
From: Bryan Keller
Date: Tue, 30 Apr 2013 23:02:25 -0700
To: user@hbase.apache.org

The table has hashed keys so rows are evenly distributed amongst the regionservers, and load on each regionserver is pretty much the same. I also have per-table balancing turned on. I get mostly data-local mappers with only a few rack-local (maybe 10 of the 250 mappers).

Currently the table is a wide table schema, with lists of data structures stored as columns with column prefixes grouping the data structures (e.g. 1_name, 1_address, 1_city, 2_name, 2_address, 2_city).
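To make that layout concrete, here is roughly what writing one row looks like. This is only an illustrative sketch, not my actual code: the "d" column family, the "mytable" table name, the values, and the MD5-based key hashing are all placeholders.

import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowSketch {
    private static final byte[] FAMILY = Bytes.toBytes("d"); // placeholder family name

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // placeholder table name

        // Row key is a hash of the natural key, so rows spread evenly across regions.
        byte[] rowKey = DigestUtils.md5("natural-key-12345");

        // One row holds a list of structures, each flattened into prefixed
        // qualifiers: 1_name, 1_address, 1_city, 2_name, 2_address, 2_city, ...
        Put put = new Put(rowKey);
        put.add(FAMILY, Bytes.toBytes("1_name"), Bytes.toBytes("Alice"));
        put.add(FAMILY, Bytes.toBytes("1_address"), Bytes.toBytes("1 Main St"));
        put.add(FAMILY, Bytes.toBytes("1_city"), Bytes.toBytes("Oakland"));
        put.add(FAMILY, Bytes.toBytes("2_name"), Bytes.toBytes("Bob"));
        put.add(FAMILY, Bytes.toBytes("2_address"), Bytes.toBytes("2 Elm St"));
        put.add(FAMILY, Bytes.toBytes("2_city"), Bytes.toBytes("Fresno"));
        table.put(put);
        table.close();
    }
}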
I was thinking of moving those data structures to protobuf, which would cut down on the number of columns. The downside is I can't filter on one value with that, but it is a tradeoff I would make for performance. I was also considering restructuring the table into a tall table.

Something interesting is that my old regionserver machines had five 15k SCSI drives instead of 2 SSDs, and performance was about the same. Also, my old network was 1gbit, now it is 10gbit. So neither network nor disk I/O appears to be the bottleneck. The CPU is rather high for the regionserver, so it seems like the best candidate to investigate. I will try profiling it tomorrow and will report back. I may revisit compression on vs. off since that is adding load to the CPU.

I'll also come up with a sample program that generates data similar to my table.

On Apr 30, 2013, at 10:01 PM, lars hofhansl wrote:

> Your average row is 35k so scanner caching would not make a huge difference, although I would have expected some improvements by setting it to 10 or 50 since you have a wide 10ge pipe.
>
> I assume your table is split sufficiently to touch all RegionServers... Do you see the same load/IO on all region servers?
>
> A bunch of scan improvements went into HBase since 0.94.2.
> I blogged about some of these changes here: http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
>
> In your case - since you have many columns, each of which carries the rowkey - you might benefit a lot from HBASE-7279.
>
> In the end HBase *is* slower than straight HDFS for full scans. How could it not be?
> So I would start by looking at HDFS first. Make sure Nagle's is disabled in both HBase and HDFS.
>
> And lastly, SSDs are somewhat new territory for HBase. Maybe Andy Purtell is listening; I think he did some tests with HBase on SSDs.
> With rotating media you typically see an improvement with compression. With SSDs the added CPU needed for decompression might outweigh the benefits.
>
> At the risk of starting a larger discussion here, I would posit that HBase's LSM-based design, which trades random IO for sequential IO, might be a bit more questionable on SSDs.
>
> If you can, it would be nice to run a profiler against one of the RegionServers (or maybe do it with the single-RS setup) and see where it is bottlenecked.
> (And if you send me a sample program to generate some data - not 700g, though :) - I'll try to do a bit of profiling during the next days as my day job permits, but I do not have any machines with SSDs.)
>
> -- Lars
>
> ________________________________
> From: Bryan Keller
> To: user@hbase.apache.org
> Sent: Tuesday, April 30, 2013 9:31 PM
> Subject: Re: Poor HBase map-reduce scan performance
>
> Yes, I have tried various settings for setCaching() and I have setCacheBlocks(false).
>
> On Apr 30, 2013, at 9:17 PM, Ted Yu wrote:
>
>> From http://hbase.apache.org/book.html#mapreduce.example :
>>
>> scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>>
>> I guess you have used the above settings.
>>
>> 0.94.x releases are compatible. Have you considered upgrading to, say, 0.94.7, which was recently released?
>>
>> Cheers
>>
>> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller wrote:
>>
>>> I have been attempting to speed up my HBase map-reduce scans for a while now.
>>> I have tried just about everything without much luck. I'm running out of ideas and was hoping for some suggestions. This is HBase 0.94.2 and Hadoop 2.0.0 (CDH4.2.1).
>>>
>>> The table I'm scanning:
>>> 20 mil rows
>>> Hundreds of columns/row
>>> Column keys can be 30-40 bytes
>>> Column values are generally not large, 1k would be on the large side
>>> 250 regions
>>> Snappy compression
>>> 8gb region size
>>> 512mb memstore flush
>>> 128k block size
>>> 700gb of data on HDFS
>>>
>>> My cluster has 8 datanodes which are also regionservers. Each has 8 cores (16 HT), 64gb RAM, and 2 SSDs. The network is 10gbit. I have a separate machine acting as namenode, HMaster, and zookeeper (single instance). I have disk local reads turned on.
>>>
>>> I'm seeing around 5 gbit/sec on average network IO. Each disk is getting 400mb/sec read IO. Theoretically I could get 400mb/sec * 16 = 6.4gb/sec.
>>>
>>> Using Hadoop's TestDFSIO tool, I'm seeing around 1.4gb/sec read speed. Not really that great compared to the theoretical I/O. However, this is far better than I am seeing with HBase map-reduce scans of my table.
>>>
>>> I have a simple no-op map-only job (using TableInputFormat) that scans the table and does nothing with the data. This takes 45 minutes, which is about 260mb/sec read speed - over 5x slower than straight HDFS. Basically, with HBase the read performance of my 16-SSD cluster is nearly 35% slower than a single SSD.
>>>
>>> Here are some things I have changed to no avail:
>>> Scan caching values
>>> HDFS block sizes
>>> HBase block sizes
>>> Region file sizes
>>> Memory settings
>>> GC settings
>>> Number of mappers/node
>>> Compressed vs. not compressed
>>>
>>> One thing I notice is that the regionserver is using quite a bit of CPU during the map-reduce job. When dumping the jstack of the process, it seems like it is usually in some type of memory allocation or decompression routine, which didn't seem abnormal.
>>>
>>> I can't seem to pinpoint the bottleneck. CPU use by the regionserver is high but not maxed out. Disk I/O and network I/O are low, and IO wait is low. I'm on the verge of just writing the dataset out to sequence files once a day for scan purposes. Is that what others are doing?
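P.S. In case it helps anyone reproduce the test, the no-op scan job described above is essentially the following. This is a trimmed sketch rather than my exact code: the table name, the caching value shown, and the NullOutputFormat output handling are placeholders/assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class NoOpScanJob {
    // Mapper intentionally does nothing with each row; the job only measures scan throughput.
    static class NoOpMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context) {
            // no-op
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "noop-scan");
        job.setJarByClass(NoOpScanJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);       // I've tried various values here
        scan.setCacheBlocks(false); // don't fill the block cache from an MR scan

        // "mytable" is a placeholder for the real table name.
        TableMapReduceUtil.initTableMapperJob("mytable", scan, NoOpMapper.class,
                NullWritable.class, NullWritable.class, job);
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class); // map-only, no output written
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}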