hbase-user mailing list archives

From Sean McNamara <Sean.McNam...@Webtrends.com>
Subject Parallel reading advice
Date Wed, 28 Nov 2012 06:28:00 GMT
I have a table whose keys are prefixed with a byte to help distribute the keys so that scans
don't hotspot.
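
As a rough illustration of this kind of key layout, here is a minimal sketch. The bucket count of 16 and the hash-based choice of prefix are assumptions made for the example; the names SaltedKey, saltFor and salt are hypothetical and not part of HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SaltedKey {
    // Assumed bucket count; the real number of prefixes is not stated here.
    static final int BUCKETS = 16;

    // Derive a stable one-byte prefix in [0, BUCKETS) from the original key.
    static byte saltFor(byte[] key) {
        return (byte) Math.floorMod(Arrays.hashCode(key), BUCKETS);
    }

    // Prepend the prefix byte so that neighbouring keys spread across key ranges.
    static byte[] salt(byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = saltFor(key);
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] salted = salt("event-20121128".getBytes(StandardCharsets.UTF_8));
        System.out.println("prefix=" + salted[0] + " length=" + salted.length);
    }
}
```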

I also have a bunch of slave processes that scan the prefix partitions in parallel.
Currently each slave sets up its own HBase connection, scanner, etc. Most of the slave
processes finish their scan and return within 2-3 seconds. That time tends to be the same
regardless of whether there is a lot of data or very little, so I think the roughly 2 second
overhead is there because each slave sets up a new connection on each request (I am unable to
reuse connections in the slaves).
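
To make the per-prefix partitions concrete, the sketch below computes one scan range per prefix byte: start row inclusive, stop row exclusive, with an empty stop row meaning "to the end of the table" as in HBase scans. The class and method names are hypothetical, and the code assumes a single leading prefix byte with values 0 to buckets-1.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixRanges {
    // Start (inclusive) and stop (exclusive) row for one prefix partition.
    static final class Range {
        final byte[] start;
        final byte[] stop;
        Range(byte[] start, byte[] stop) { this.start = start; this.stop = stop; }
    }

    // Build one range per prefix byte 0..buckets-1 (1 <= buckets <= 256).
    static List<Range> ranges(int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in 1..256");
        }
        List<Range> out = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            byte[] start = { (byte) b };
            // Prefix 0xFF has no successor byte, so its range runs to the end of the table.
            byte[] stop = (b == 255) ? new byte[0] : new byte[] { (byte) (b + 1) };
            out.add(new Range(start, stop));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Range r : ranges(4)) {
            System.out.println((r.start[0] & 0xff) + " -> " + (r.stop[0] & 0xff));
        }
    }
}
```

Each slave would then scan exactly one of these ranges.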

I'm wondering if I could remove some of that overhead by using the master (which can reuse
its HBase connection) to determine the splits, and then delegating that information out to
each slave. I think I could possibly use TableInputFormat/TableRecordReader to accomplish
this? Would this route make sense?
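
A rough sketch of the delegation step, independent of HBase: the master encodes each split as a small text message and deals the messages out to the slaves round-robin. The "start:stop" hex format and all names here are hypothetical, chosen only for illustration. In a real setup the split boundaries would come from the master's HBase connection, and this sketch does not show how a slave would then run the scan.

```java
import java.util.ArrayList;
import java.util.List;

class SplitDelegation {
    // Hex-encode a row key so a split can travel as plain text.
    static String toHex(byte[] row) {
        StringBuilder sb = new StringBuilder();
        for (byte b : row) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Master side: one split becomes "startHex:stopHex" (an empty stop means open-ended).
    static String encode(byte[] start, byte[] stop) {
        return toHex(start) + ":" + toHex(stop);
    }

    // Slave side: recover {start, stop} from the message.
    static byte[][] decode(String msg) {
        String[] parts = msg.split(":", -1); // -1 keeps an empty stop row
        return new byte[][] { fromHex(parts[0]), fromHex(parts[1]) };
    }

    // Master side: deal the encoded splits out to the workers round-robin.
    static List<List<String>> assign(List<String> splits, int workers) {
        List<List<String>> out = new ArrayList<>();
        for (int w = 0; w < workers; w++) out.add(new ArrayList<>());
        for (int i = 0; i < splits.size(); i++) out.get(i % workers).add(splits.get(i));
        return out;
    }

    public static void main(String[] args) {
        List<String> splits = new ArrayList<>();
        for (int b = 0; b < 4; b++) {
            splits.add(encode(new byte[] { (byte) b }, new byte[] { (byte) (b + 1) }));
        }
        System.out.println(assign(splits, 2));
    }
}
```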
