hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: HBase Read Performance - Multiget vs TableInputFormat Job
Date Mon, 06 Feb 2012 17:28:39 GMT
On Mon, Feb 6, 2012 at 8:58 AM, Jon Bender <jonathan.bender@gmail.com> wrote:
> When you say it'll sort regions by you, does that mean I'll need to
> identify the regions before dividing up the maps?  Or just deal with the
> fact that multiple maps might read from the same regionserver?
>

If you do a multiget on N rows, internally HTable will sort the rows
by region so that the one big multiget turns into as many
mini-multigets as there are regions present in the N rows.  HTable
then dispatches them all in parallel and manages the returns, failures,
etc.
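To make the "sort by region, then fan out" idea concrete, here is a minimal, self-contained sketch of the technique. It is not HTable's actual code and uses no HBase classes. The region start keys, the `RegionGroupedMultiGet` class, and the `fetchFromRegion` stand-in (which fakes the per-region RPC) are all hypothetical. Each row is assigned to the region with the greatest start key that is less than or equal to the row key. One mini-multiget per region is then submitted to a thread pool, and the results are merged.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch: group a multiget's rows by region, then run one
// mini-multiget per region in parallel. Not HTable's real implementation.
class RegionGroupedMultiGet {

    // Assign each row to the region whose start key is the greatest key <= row.
    // regionStartKeys must contain "" (the first region's start key).
    static SortedMap<String, List<String>> groupByRegion(
            List<String> rows, NavigableSet<String> regionStartKeys) {
        SortedMap<String, List<String>> byRegion = new TreeMap<>();
        for (String row : rows) {
            String region = regionStartKeys.floor(row);
            byRegion.computeIfAbsent(region, k -> new ArrayList<>()).add(row);
        }
        return byRegion;
    }

    // Hypothetical stand-in for one per-region RPC returning row -> value.
    static Map<String, String> fetchFromRegion(String region, List<String> rows) {
        Map<String, String> result = new HashMap<>();
        for (String row : rows) {
            result.put(row, "value-of-" + row + "@region[" + region + "]");
        }
        return result;
    }

    // Dispatch all per-region batches in parallel and merge their results.
    static Map<String, String> multiGet(List<String> rows, NavigableSet<String> regionStartKeys) {
        SortedMap<String, List<String>> batches = groupByRegion(rows, regionStartKeys);
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, batches.size()));
        try {
            List<Future<Map<String, String>>> futures = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : batches.entrySet()) {
                futures.add(pool.submit(() -> fetchFromRegion(e.getKey(), e.getValue())));
            }
            Map<String, String> merged = new HashMap<>();
            for (Future<Map<String, String>> f : futures) {
                merged.putAll(f.get()); // rethrows if that region's batch failed
            }
            return merged;
        } catch (ExecutionException ex) {
            throw new RuntimeException("a per-region batch failed", ex.getCause());
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while waiting for batches", ex);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableSet<String> starts = new TreeSet<>(List.of("", "g", "p"));
        List<String> rows = List.of("apple", "kiwi", "zebra", "banana", "peach");
        System.out.println(groupByRegion(rows, starts));
        System.out.println(multiGet(rows, starts));
    }
}
```

With the three hypothetical regions starting at "", "g" and "p", the five rows collapse into three mini-multigets: one for apple and banana, one for kiwi, and one for zebra and peach. This reflects the point above: the number of parallel calls equals the number of regions touched, not the number of rows.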

I was suggesting you run a client in the mapper, and the map input
would be N rows for the client to handle.  Perhaps have each mapper
do five minutes' worth of N-row multigets.
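One way to size the mapper input along these lines is sketched below, under stated assumptions. The class and method names (`MapperBatchPlanner`, `rowsPerMapper`, `chunk`), the batch size N, and the per-multiget latency are all hypothetical. In practice you would measure the latency of one N-row multiget against your own cluster. Inside the mapper, the comment marks where a real HTable multiget of each batch would go.

```java
import java.util.*;

// Hypothetical sizing helper for "each mapper does ~5 minutes' worth of N-row multigets".
class MapperBatchPlanner {

    // Rows one mapper should receive, given batch size N, an assumed latency per
    // N-row multiget, and a target run time per mapper (all in milliseconds).
    static long rowsPerMapper(int batchSize, long millisPerMultiget, long targetMillis) {
        if (batchSize <= 0 || millisPerMultiget <= 0) {
            throw new IllegalArgumentException("batch size and latency must be positive");
        }
        long multigetsPerMapper = Math.max(1, targetMillis / millisPerMultiget);
        return multigetsPerMapper * batchSize;
    }

    // Split a list into consecutive chunks of at most `size` elements
    // (the chunks are views into the input list).
    static <T> List<List<T>> chunk(List<T> items, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            chunks.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000L;
        // Assumed: N = 1000 rows per multiget, ~200 ms per multiget.
        System.out.println("rows per mapper: " + rowsPerMapper(1000, 200, fiveMinutes));

        // Inside one mapper: walk its input in N-row batches (N = 3 here for brevity).
        List<String> mapperInput = List.of("r1", "r2", "r3", "r4", "r5", "r6", "r7");
        for (List<String> batch : chunk(mapperInput, 3)) {
            // A real mapper would issue one HTable multiget for `batch` here.
            System.out.println("multiget " + batch);
        }
    }
}
```

Under those assumed numbers, 300,000 ms / 200 ms gives 1,500 multigets per mapper, so `rowsPerMapper(1000, 200, 300_000)` returns 1,500,000 rows per mapper. The second loop shows a seven-row input split into batches of three, three, and one.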

If you run it as an MR job, the work gets distributed for you, failed
tasks get retried (maybe you won't want retries?), etc.
St.Ack
