hbase-user mailing list archives

From Matt Corgan <mcor...@hotpads.com>
Subject Re: Poor HBase map-reduce scan performance
Date Wed, 01 May 2013 06:52:05 GMT
Not that it's a long-term solution, but try major-compacting before running
the benchmark.  If the LSM tree is CPU bound in merging HFiles/KeyValues
through the PriorityQueue, then reducing to a single file per region should
help.  The merging of HFiles during a scan is not heavily optimized yet.
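
A quick way to do that: from the hbase shell, major_compact 'your_table',
or programmatically along these lines (a rough sketch; the table name is a
placeholder, and the call is asynchronous, so wait for the compactions to
finish before starting the benchmark):

    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Ask every region of the table to major-compact down to one HFile.
    // This returns immediately; the compactions run in the background on
    // the regionservers.
    admin.majorCompact("your_table");
    admin.close();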


On Tue, Apr 30, 2013 at 11:21 PM, lars hofhansl <larsh@apache.org> wrote:

> If you can, try 0.94.4+; it should significantly reduce the number of
> bytes copied around in RAM during scanning, especially if you have wide
> rows and/or large key portions. That in turn makes scans scale better
> across cores, since RAM is a shared resource between cores (much like disk).
>
>
> It's not hard to build the latest HBase against Cloudera's version of
> Hadoop. I can send along a simple patch to pom.xml to do that.
>
> -- Lars
>
>
>
> ________________________________
>  From: Bryan Keller <bryanck@gmail.com>
> To: user@hbase.apache.org
> Sent: Tuesday, April 30, 2013 11:02 PM
> Subject: Re: Poor HBase map-reduce scan performance
>
>
> The table has hashed keys so rows are evenly distributed amongst the
> regionservers, and load on each regionserver is pretty much the same. I
> also have per-table balancing turned on. I get mostly data-local mappers
> with only a few rack-local ones (maybe 10 of the 250 mappers).
>
> Currently the table is a wide table schema, with lists of data structures
> stored as columns with column prefixes grouping the data structures (e.g.
> 1_name, 1_address, 1_city, 2_name, 2_address, 2_city). I was thinking of
> moving those data structures to protobuf, which would cut down on the number
> of columns. The downside is that I couldn't filter on a single value anymore,
> but that is a tradeoff I would make for performance. I was also considering
> restructuring the table into a tall table.
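>
> For illustration, roughly the two layouts I'm weighing (the "d" family,
> the item index, and the serialized protobuf object are placeholders):
>
>   // Current wide layout: one column per field, qualifier prefixed with
>   // the item index, so every cell repeats the row key and a long qualifier.
>   Put wide = new Put(rowKey);
>   wide.add(Bytes.toBytes("d"), Bytes.toBytes("1_name"), Bytes.toBytes(name));
>   wide.add(Bytes.toBytes("d"), Bytes.toBytes("1_address"), Bytes.toBytes(address));
>   wide.add(Bytes.toBytes("d"), Bytes.toBytes("1_city"), Bytes.toBytes(city));
>
>   // Packed alternative: one column per item, value is the serialized
>   // structure, so far fewer KeyValues per row, but no server-side
>   // filtering on individual fields.
>   Put packed = new Put(rowKey);
>   packed.add(Bytes.toBytes("d"), Bytes.toBytes("1"), itemProto.toByteArray());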
>
> Something interesting is that my old regionserver machines had five 15k
> SCSI drives each instead of the 2 SSDs the current ones have, and
> performance was about the same. Also, my
> old network was 1gbit, now it is 10gbit. So neither network nor disk I/O
> appear to be the bottleneck. The CPU is rather high for the regionserver so
> it seems like the best candidate to investigate. I will try profiling it
> tomorrow and will report back. I may revisit compression on vs off since
> that is adding load to the CPU.
>
> I'll also come up with a sample program that generates data similar to my
> table.
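>
> As a rough sketch of what I have in mind (a skeleton only; the key
> hashing, column counts, and value sizes are placeholders approximating my
> schema):
>
>   Configuration conf = HBaseConfiguration.create();
>   HTable table = new HTable(conf, "testtable");
>   byte[] family = Bytes.toBytes("d");
>   Random rnd = new Random();
>   long numRows = 20000000L;
>   for (long i = 0; i < numRows; i++) {
>     // MD5 of the sequence number as the row key, so rows spread evenly
>     byte[] row = Bytes.toBytes(MD5Hash.getMD5AsHex(Bytes.toBytes(i)));
>     Put put = new Put(row);
>     for (int c = 0; c < 200; c++) {     // a few hundred columns per row
>       byte[] qualifier = Bytes.toBytes(String.format("%d_field_%025d", c % 10, c));
>       byte[] value = new byte[100 + rnd.nextInt(900)];   // values up to ~1k
>       rnd.nextBytes(value);
>       put.add(family, qualifier, value);
>     }
>     table.put(put);
>   }
>   table.close();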
>
>
> On Apr 30, 2013, at 10:01 PM, lars hofhansl <larsh@apache.org> wrote:
>
> > Your average row is 35k so scanner caching would not make a huge
> difference, although I would have expected some improvements by setting it
> to 10 or 50 since you have a wide 10ge pipe.
> >
> > I assume your table is split sufficiently to touch all RegionServers...
> Do you see the same load/IO on all region servers?
> >
> > A bunch of scan improvements went into HBase since 0.94.2.
> > I blogged about some of these changes here:
> http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
> >
> > In your case - since you have many columns, each of which carry the
> rowkey - you might benefit a lot from HBASE-7279.
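> >
> > (To put rough numbers on that: every KeyValue carries the full key again,
> > so with, say, a 16-byte row key, a 35-byte qualifier, and ~25 bytes of
> > fixed per-KeyValue overhead, a row with 200 columns repeats about
> > 200 * (16 + 35 + 25) = ~15 KB of key material on top of the values -
> > illustrative numbers only.)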
> >
> > In the end HBase *is* slower than straight HDFS for full scans. How
> could it not be?
> > So I would start by looking at HDFS first. Make sure Nagle's is disabled
> in both HBase and HDFS.
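> >
> > The usual Nagle knobs are the tcpnodelay settings, roughly as below
> > (property names are from memory - worth verifying against
> > hbase-default.xml and core-default.xml for your versions):
> >
> >   <!-- hbase-site.xml -->
> >   <property><name>hbase.ipc.client.tcpnodelay</name><value>true</value></property>
> >   <property><name>hbase.ipc.server.tcpnodelay</name><value>true</value></property>
> >
> >   <!-- core-site.xml (picked up by HDFS) -->
> >   <property><name>ipc.client.tcpnodelay</name><value>true</value></property>
> >   <property><name>ipc.server.tcpnodelay</name><value>true</value></property>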
> >
> > And lastly, SSDs are somewhat new territory for HBase. Maybe Andy Purtell
> is listening; I think he did some tests with HBase on SSDs.
> > With rotating media you typically see an improvement with compression.
> With SSDs the added CPU needed for decompression might outweigh the
> benefits.
> >
> > At the risk of starting a larger discussion here, I would posit that
> HBase's LSM-based design, which trades random IO for sequential IO, might
> be a bit more questionable on SSDs.
> >
> > If you can, it would be nice to run a profiler against one of the
> RegionServers (or maybe do it with the single RS setup) and see where it is
> bottlenecked.
> > (And if you send me a sample program to generate some data - not 700g,
> though :) - I'll try to do a bit of profiling over the next few days as my
> day job permits, but I do not have any machines with SSDs).
> >
> > -- Lars
> >
> >
> >
> >
> > ________________________________
> > From: Bryan Keller <bryanck@gmail.com>
> > To: user@hbase.apache.org
> > Sent: Tuesday, April 30, 2013 9:31 PM
> > Subject: Re: Poor HBase map-reduce scan performance
> >
> >
> > Yes, I have tried various settings for setCaching() and I have
> setCacheBlocks(false)
> >
> > On Apr 30, 2013, at 9:17 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> >> From http://hbase.apache.org/book.html#mapreduce.example :
> >>
> >> scan.setCaching(500);        // 1 is the default in Scan, which will
> >> be bad for MapReduce jobs
> >> scan.setCacheBlocks(false);  // don't set to true for MR jobs
> >>
> >> I guess you have used the above setting.
> >>
> >> 0.94.x releases are compatible. Have you considered upgrading to, say
> >> 0.94.7 which was recently released ?
> >>
> >> Cheers
> >>
> >> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller <bryanck@gmail.com>
> wrote:
> >>
> >>> I have been attempting to speed up my HBase map-reduce scans for a
> while
> >>> now. I have tried just about everything without much luck. I'm running
> out
> >>> of ideas and was hoping for some suggestions. This is HBase 0.94.2 and
> >>> Hadoop 2.0.0 (CDH4.2.1).
> >>>
> >>> The table I'm scanning:
> >>> 20 mil rows
> >>> Hundreds of columns/row
> >>> Column keys can be 30-40 bytes
> >>> Column values are generally not large, 1k would be on the large side
> >>> 250 regions
> >>> Snappy compression
> >>> 8gb region size
> >>> 512mb memstore flush
> >>> 128k block size
> >>> 700gb of data on HDFS
> >>>
> >>> My cluster has 8 datanodes which are also regionservers. Each has 8
> cores
> >>> (16 HT), 64gb RAM, and 2 SSDs. The network is 10gbit. I have a separate
> >>> machine acting as namenode, HMaster, and zookeeper (single instance). I
> >>> have disk local reads turned on.
> >>>
> >>> I'm seeing around 5 gbit/sec on average network IO. Each disk is
> getting
> >>> 400mb/sec read IO. Theoretically I could get 400mb/sec * 16 =
> 6.4gb/sec.
> >>>
> >>> Using Hadoop's TestDFSIO tool, I'm seeing around 1.4gb/sec read speed.
> Not
> >>> really that great compared to the theoretical I/O. However this is far
> >>> better than I am seeing with HBase map-reduce scans of my table.
> >>>
> >>> I have a simple no-op map-only job (using TableInputFormat) that scans
> the
> >>> table and does nothing with data. This takes 45 minutes. That's about
> >>> 260mb/sec read speed. This is over 5x slower than straight HDFS.
> >>> Basically, with HBase I'm seeing read performance of my 16 SSD cluster
> >>> performing nearly 35% slower than a single SSD.
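> >>>
> >>> For reference, the job setup is essentially the following (trimmed, and
> >>> the class and table names are placeholders):
> >>>
> >>>   Scan scan = new Scan();
> >>>   scan.setCaching(500);
> >>>   scan.setCacheBlocks(false);
> >>>   Job job = new Job(conf, "noop-scan");
> >>>   job.setJarByClass(NoopScan.class);
> >>>   TableMapReduceUtil.initTableMapperJob("mytable", scan, NoopMapper.class,
> >>>       NullWritable.class, NullWritable.class, job);
> >>>   job.setOutputFormatClass(NullOutputFormat.class);
> >>>   job.setNumReduceTasks(0);
> >>>
> >>>   // Mapper does nothing with the rows it receives.
> >>>   static class NoopMapper extends TableMapper<NullWritable, NullWritable> {
> >>>     protected void map(ImmutableBytesWritable key, Result value, Context ctx) {
> >>>       // no-op
> >>>     }
> >>>   }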
> >>>
> >>> Here are some things I have changed to no avail:
> >>> Scan caching values
> >>> HDFS block sizes
> >>> HBase block sizes
> >>> Region file sizes
> >>> Memory settings
> >>> GC settings
> >>> Number of mappers/node
> >>> Compressed vs not compressed
> >>>
> >>> One thing I notice is that the regionserver is using quite a bit of CPU
> >>> during the map-reduce job. When dumping a jstack of the process, it
> >>> usually seems to be in some type of memory allocation or decompression
> >>> routine, which didn't seem abnormal.
> >>>
> >>> I can't seem to pinpoint the bottleneck. CPU use by the regionserver is
> >>> high but not maxed out. Disk I/O and network I/O are low, IO wait is
> low.
> >>> I'm on the verge of just writing the dataset out to sequence files
> once a
> >>> day for scan purposes. Is that what others are doing?
>
