hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Poor HBase map-reduce scan performance
Date Thu, 23 May 2013 22:47:50 GMT
Thanks for the update, Sandy.

If you can open a JIRA and attach your producer / consumer scanner there,
that would be great.

On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt <prattrs@adobe.com> wrote:

> I wrote myself a Scanner wrapper that uses a producer/consumer queue to
> keep the client fed with a full buffer as much as possible.  When scanning
> my table with scanner caching at 100 records, I see about a 24% uplift in
> performance (~35k records/sec with the ClientScanner and ~44k records/sec
> with my P/C scanner).  However, when I set scanner caching to 5000, it's
> more of a wash compared to the standard ClientScanner: ~53k records/sec
> with the ClientScanner and ~60k records/sec with the P/C scanner.
>
> I'm not sure what to make of those results.  I think next I'll shut down
> HBase and read the HFiles directly, to see if there's a drop-off in
> performance between reading them directly vs. via the RegionServer.
>
> I still think that to really solve this there needs to be a sliding window
> of records in flight between disk and RS, and between RS and client.  I'm
> thinking there's probably only a single batch of records in flight between
> RS and client at the moment.
>
> Sandy
>
> On 5/23/13 8:45 AM, "Bryan Keller" <bryanck@gmail.com> wrote:
> >I am considering scanning a snapshot instead of the table. I believe this
> >is what the ExportSnapshot class does. If I could use the scanning code
> >from ExportSnapshot then I will be able to scan the HDFS files directly
> >and bypass the regionservers. This could potentially give me a huge boost
> >in performance for full table scans. However, it doesn't really address
> >the poor scan performance against a table.
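
For readers following the thread, here is a minimal sketch of the producer/consumer idea Sandy describes. It is not Sandy's actual wrapper (that code was not posted here) and it uses no HBase classes: PrefetchingIterator, its capacity parameter, and the END marker are all hypothetical names chosen for illustration. It assumes the wrapped iterator never returns null, and that something like the iterator of a ResultScanner would be passed in as the source.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: a background thread keeps a bounded queue filled from the source. */
class PrefetchingIterator<T> implements Iterator<T> {
    // Sentinel placed on the queue when the source is exhausted or has failed.
    private static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private final Thread producer;
    private volatile Throwable failure;
    private Object next; // item taken from the queue but not yet returned

    PrefetchingIterator(final Iterator<T> source, int capacity) {
        this.queue = new ArrayBlockingQueue<Object>(capacity);
        this.producer = new Thread(new Runnable() {
            public void run() {
                try {
                    // put() blocks while the queue is full, so at most
                    // `capacity` items are buffered ahead of the consumer.
                    while (source.hasNext()) {
                        queue.put(source.next());
                    }
                } catch (InterruptedException e) {
                    return; // close() was called; stop producing
                } catch (Throwable t) {
                    failure = t; // reported to the consumer in hasNext()
                }
                try {
                    queue.put(END);
                } catch (InterruptedException e) {
                    // close() was called while signalling the end; nothing to do
                }
            }
        });
        this.producer.setDaemon(true);
        this.producer.start();
    }

    public boolean hasNext() {
        if (next == null) {
            try {
                next = queue.take(); // blocks until the producer supplies an item
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for records", e);
            }
        }
        if (next == END) {
            if (failure != null) {
                throw new RuntimeException("source iterator failed", failure);
            }
            return false;
        }
        return true;
    }

    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        @SuppressWarnings("unchecked")
        T value = (T) next;
        next = null;
        return value;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    /** Stops the producer early; do not call hasNext() afterwards. */
    public void close() {
        producer.interrupt();
    }
}
```

The consumer would use it like any other iterator, for example wrapping a source iterator with a capacity of a few hundred items and looping with hasNext()/next(). The bounded queue is the design choice that matters: the producer can run ahead of the consumer, but only by a fixed amount, so memory stays bounded. Whether this helps in practice depends on where the time actually goes, which is the open question in the thread.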
