hbase-user mailing list archives

From Nicolas Liochon <nkey...@gmail.com>
Subject Re: Poor HBase map-reduce scan performance
Date Thu, 02 May 2013 18:00:41 GMT
You can try YourKit; they have evaluation licenses. There is one gotcha:
some classes are excluded by default, and this includes org.apache.*, so
you need to change the default config when using it with HBase.


On Thu, May 2, 2013 at 7:54 PM, Bryan Keller <bryanck@gmail.com> wrote:

> I ran one of my regionservers through VisualVM. It looks like the top hot
> spots are HFileReaderV2$ScannerV2.getKeyValue() and ByteBuffer.allocate().
> At first glance, it appears that memory allocation may be an issue.
> Decompression was next below that, but it seems to be less of an issue.
>
> Would changing the block size, either HDFS or HBase, help here?
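>
> For reference, the HBase block size is a per-column-family setting (64 KB by
> default), while the HDFS block size is a property of the files themselves
> (dfs.block.size / dfs.blocksize in hdfs-site.xml). A rough sketch of bumping
> the column family block size with the 0.94 admin API, using placeholder
> table and family names:
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class SetBlockSize {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = HBaseConfiguration.create();
>     HBaseAdmin admin = new HBaseAdmin(conf);
>     // fetch the existing descriptor so the other family settings are kept
>     HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
>     HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("d"));
>     cf.setBlocksize(256 * 1024);        // default is 64 KB
>     admin.disableTable("mytable");      // simplest to modify a family offline
>     admin.modifyColumn("mytable", cf);
>     admin.enableTable("mytable");
>     admin.close();
>   }
> }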
>
> Also, if anyone has tips on how else to profile, that would be
> appreciated. VisualVM can produce a lot of noise that is hard to sift
> through.
>
>
> On May 1, 2013, at 9:49 PM, Bryan Keller <bryanck@gmail.com> wrote:
>
> > I used exactly 0.94.4, pulled from the tag in subversion.
> >
> > On May 1, 2013, at 9:41 PM, lars hofhansl <larsh@apache.org> wrote:
> >
> >> Hmm... Did you actually use exactly version 0.94.4, or the latest
> 0.94.7?
> >> I would be very curious to see profiling data.
> >>
> >> -- Lars
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: Bryan Keller <bryanck@gmail.com>
> >> To: "user@hbase.apache.org" <user@hbase.apache.org>
> >> Cc:
> >> Sent: Wednesday, May 1, 2013 6:01 PM
> >> Subject: Re: Poor HBase map-reduce scan performance
> >>
> >> I tried running my test with 0.94.4; unfortunately, performance was
> about the same. I'm planning on profiling the regionserver and trying some
> other things tonight and tomorrow and will report back.
> >>
> >> On May 1, 2013, at 8:00 AM, Bryan Keller <bryanck@gmail.com> wrote:
> >>
> >>> Yes I would like to try this, if you can point me to the pom.xml patch
> that would save me some time.
> >>>
> >>> On Tuesday, April 30, 2013, lars hofhansl wrote:
> >>> If you can, try 0.94.4+; it should significantly reduce the number of
> bytes copied around in RAM during scanning, especially if you have wide
> rows and/or large key portions. That in turn makes scans scale better
> across cores, since RAM is a shared resource between cores (much like disk).
> >>>
> >>>
> >>> It's not hard to build the latest HBase against Cloudera's version of
> Hadoop. I can send along a simple patch to pom.xml to do that.
> >>>
> >>> -- Lars
> >>>
> >>>
> >>>
> >>> ________________________________
> >>>  From: Bryan Keller <bryanck@gmail.com>
> >>> To: user@hbase.apache.org
> >>> Sent: Tuesday, April 30, 2013 11:02 PM
> >>> Subject: Re: Poor HBase map-reduce scan performance
> >>>
> >>>
> >>> The table has hashed keys so rows are evenly distributed amongst the
> regionservers, and load on each regionserver is pretty much the same. I
> also have per-table balancing turned on. I get mostly data local mappers
> with only a few rack local (maybe 10 of the 250 mappers).
> >>>
> >>> Currently the table uses a wide-table schema, with lists of data
> structures stored as columns, and column prefixes grouping the data
> structures (e.g. 1_name, 1_address, 1_city, 2_name, 2_address, 2_city). I
> was thinking of moving those data structures to protobuf, which would cut
> down on the number of columns. The downside is that I couldn't filter on a
> single value anymore, but it is a tradeoff I would make for performance. I
> was also considering restructuring the table into a tall table.
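> >>>
> >>> To make that tradeoff concrete, here is a rough sketch of the two layouts
> >>> (table, family, and field values are placeholders, and the packed value
> >>> stands in for whatever encoding ends up being used, e.g. a protobuf
> >>> message):
> >>>
> >>> import org.apache.hadoop.conf.Configuration;
> >>> import org.apache.hadoop.hbase.HBaseConfiguration;
> >>> import org.apache.hadoop.hbase.client.HTable;
> >>> import org.apache.hadoop.hbase.client.Put;
> >>> import org.apache.hadoop.hbase.util.Bytes;
> >>>
> >>> public class WideVsPacked {
> >>>   public static void main(String[] args) throws Exception {
> >>>     Configuration conf = HBaseConfiguration.create();
> >>>     HTable table = new HTable(conf, "mytable");
> >>>     byte[] cf = Bytes.toBytes("d");
> >>>
> >>>     // Wide layout: one qualifier per field, prefixed by the structure index.
> >>>     // Every cell repeats the full row key, so wide rows carry key overhead.
> >>>     Put wide = new Put(Bytes.toBytes("row1"));
> >>>     wide.add(cf, Bytes.toBytes("1_name"), Bytes.toBytes("Acme"));
> >>>     wide.add(cf, Bytes.toBytes("1_city"), Bytes.toBytes("Portland"));
> >>>     wide.add(cf, Bytes.toBytes("2_name"), Bytes.toBytes("Initech"));
> >>>     wide.add(cf, Bytes.toBytes("2_city"), Bytes.toBytes("Austin"));
> >>>
> >>>     // Packed layout: all fields of one structure serialized into one cell.
> >>>     // Fewer KeyValues to scan, but no server-side filtering on single fields.
> >>>     byte[] encodedStruct = Bytes.toBytes("Acme|Portland");  // placeholder encoding
> >>>     Put packed = new Put(Bytes.toBytes("row1"));
> >>>     packed.add(cf, Bytes.toBytes("1"), encodedStruct);
> >>>
> >>>     table.put(wide);
> >>>     table.put(packed);
> >>>     table.close();
> >>>   }
> >>> }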
> >>>
> >>> Something interesting is that my old regionserver machines had five
> 15k SCSI drives each instead of two SSDs, and performance was about the
> same. Also, my old network was 1 Gbit; now it is 10 Gbit. So neither network
> nor disk I/O appears to be the bottleneck. CPU usage is rather high on the
> regionserver, so it seems like the best candidate to investigate. I will try
> profiling it tomorrow and will report back. I may also revisit compression on
> vs. off, since that is adding load to the CPU.
> >>>
> >>> I'll also come up with a sample program that generates data similar to
> my table.
> >>>
> >>>
> >>> On Apr 30, 2013, at 10:01 PM, lars hofhansl <larsh@apache.org> wrote:
> >>>
> >>>> Your average row is 35k so scanner caching would not make a huge
> difference, although I would have expected some improvements by setting it
> to 10 or 50 since you have a wide 10ge pipe.
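> >>>>
> >>>> For reference, a minimal sketch of wiring such a scan into a map-reduce
> >>>> job (the table name is a placeholder, and IdentityTableMapper just passes
> >>>> rows through, which is handy for measuring raw scan throughput):
> >>>>
> >>>> import org.apache.hadoop.conf.Configuration;
> >>>> import org.apache.hadoop.hbase.HBaseConfiguration;
> >>>> import org.apache.hadoop.hbase.client.Result;
> >>>> import org.apache.hadoop.hbase.client.Scan;
> >>>> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> >>>> import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
> >>>> import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> >>>> import org.apache.hadoop.mapreduce.Job;
> >>>> import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
> >>>>
> >>>> public class ScanJob {
> >>>>   public static void main(String[] args) throws Exception {
> >>>>     Configuration conf = HBaseConfiguration.create();
> >>>>     Scan scan = new Scan();
> >>>>     scan.setCaching(50);          // ~50 rows x ~35 KB/row is ~1.7 MB per RPC
> >>>>     scan.setCacheBlocks(false);   // don't churn the block cache on a full scan
> >>>>     Job job = new Job(conf, "full-table-scan");
> >>>>     job.setJarByClass(ScanJob.class);
> >>>>     TableMapReduceUtil.initTableMapperJob("mytable", scan,
> >>>>         IdentityTableMapper.class, ImmutableBytesWritable.class,
> >>>>         Result.class, job);
> >>>>     job.setNumReduceTasks(0);                         // map-only
> >>>>     job.setOutputFormatClass(NullOutputFormat.class); // discard map output
> >>>>     job.waitForCompletion(true);
> >>>>   }
> >>>> }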
> >>>>
> >>>> I assume your table is split sufficiently to touch all
> RegionServers... Do you see the same load/IO on all region servers?
> >>>>
> >>>> A bunch of scan improvements have gone into HBase since 0.94.2.
> >>>> I blogged about some of these changes here:
> http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
> >>>>
> >>>> In your case - since you have many columns, each of which carry the
> rowkey - you might benefit a lot from HBASE-7279.
> >>>>
> >>>> In the end HBase *is* slower than straight HDFS for full scans. How
> could it not be?
> >>>> So I would start by looking at HDFS first. Make sure Nagle's is
> disabled in both HBase and HDFS.
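> >>>>
> >>>> (For reference, a quick sketch of the tcpnodelay settings as I remember
> >>>> them for the 0.94 line; worth double-checking the property names. Normally
> >>>> these live in hbase-site.xml and the Hadoop configs rather than in code:)
> >>>>
> >>>> import org.apache.hadoop.conf.Configuration;
> >>>> import org.apache.hadoop.hbase.HBaseConfiguration;
> >>>>
> >>>> public class NoNagle {
> >>>>   public static Configuration clientConf() {
> >>>>     // Client side, e.g. for the Configuration handed to the MR job:
> >>>>     Configuration conf = HBaseConfiguration.create();
> >>>>     conf.setBoolean("hbase.ipc.client.tcpnodelay", true);  // HBase RPC client
> >>>>     conf.setBoolean("ipc.client.tcpnodelay", true);        // Hadoop IPC client
> >>>>     // Server side this is a site-config change, in hbase-site.xml on the
> >>>>     // RegionServers: ipc.server.tcpnodelay = true
> >>>>     return conf;
> >>>>   }
> >>>> }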
> >>>>
> >>>> And lastly, SSDs are somewhat new territory for HBase. Maybe Andy
> Purtell is listening; I think he did some tests with HBase on SSDs.
> >>>> With rotating media you typically see an improvement with
> compression. With SSDs the added CPU needed for decompression might
> outweigh the benefits.
> >>>>
> >>>> At the risk of starting a larger discussion here, I would posit that
> HBase's LSM-based design, which trades random IO for sequential IO, might
> be a bit more questionable on SSDs.
> >>>>
> >>>> If you can, it would be nice to run a profiler against one of the
> RegionServers (or maybe do it with the single RS setup) and see where it is
> bottlenecked.
> >>>> (And if you send me a sample program to generate some data - not
> 700g, though :) - I'll try to do a bit of profiling during the next days as
> my day job permits, but I do not have any machines with SSDs).
> >>>>
> >>>> -- Lars
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> ________________________________
> >>>> From: Bryan Keller <bryanck@gmail.com>
> >>>> To: user@hbase.apache.org
> >>>> Sent: Tuesday, April 30, 2013 9:31 PM
> >>>> Subject: Re: Poor HBase map-reduce scan performance
> >>>>
> >>>>
> >>>> Yes, I have tried various settings for setCaching() and I have
> setCacheBlocks(false)
> >>>>
> >>>> On Apr 30, 2013, at 9:17 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>
> >>>>> From http://hbase.apache.org/book.html#mapreduce.example :
> >>>>>
> >>>>> scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
> >>>>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
> >>>>>
> >>>>> I guess you have used the above setting.
> >>>>>
> >>>>> 0.94.x releases are compatible. Have you considered upgrading to,
> >>>>> say, 0.94.7, which was recently released?
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller <bryanck@gm
> >>
> >
>
>
