hbase-user mailing list archives

From Patrick Schless <patrick.schl...@gmail.com>
Subject Re: Scanner Caching with wildly varying row widths
Date Mon, 04 Nov 2013 23:48:17 GMT
Sweet! Thanks for the tip :)


On Mon, Nov 4, 2013 at 5:10 PM, Dhaval Shah <prince_mithibai@yahoo.co.in> wrote:

> You can use scan.setBatch() to limit the number of columns returned per
> Result. Note that it will split a wide row into multiple rows from the
> client's perspective, so client code might need to be modified to make use
> of the setBatch feature (a sketch of this appears below the quoted thread).
>
> Regards,
> Dhaval
>
>
> ________________________________
>  From: Patrick Schless <patrick.schless@gmail.com>
> To: user <user@hbase.apache.org>
> Sent: Monday, 4 November 2013 6:03 PM
> Subject: Scanner Caching with wildly varying row widths
>
>
> We have an application where a row can contain anywhere between 1 and
> 3,600,000 cells (there is only one column family). In practice, most rows
> have under 100 cells.
>
> Now we want to run some MapReduce jobs that touch every cell within a range
> (e.g. count how many cells we have). With scanner caching set to something
> like 250, the job will chug along for a long time until it hits a row with
> a lot of data, and then it will die. Setting the cache size down to 1 (row)
> would presumably work, but it would take forever to run. (A sketch of such a
> counting job appears below the quoted thread.)
>
> We have addressed this by writing some jobs that use coprocessors, which
> allow us to pull back sets of cells instead of sets of rows, but this means
> we can't use any of the built-in jobs that come with HBase (e.g. CopyTable).
> Is there any way around this? Have other people had to deal with such high
> variability in their row sizes?
>
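For reference, here is a minimal sketch of what Dhaval's setBatch suggestion
looks like in client code, using the standard HBase Java client calls
Scan.setBatch() and Scan.setCaching(). The values 250 and 100 are
illustrative, not taken from the thread:

    import org.apache.hadoop.hbase.client.Scan;

    // With batching, a wide row is delivered to the client as several
    // partial Results instead of one huge one, so per-RPC memory stays
    // bounded even for rows with millions of cells.
    Scan scan = new Scan();
    scan.setCaching(250);  // how many Results are fetched per RPC (illustrative)
    scan.setBatch(100);    // max cells per Result (illustrative)

Note that the two settings interact: with batching enabled, caching counts
partial Results, so memory per RPC is roughly caching x batch x average cell
size.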
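And here is a rough sketch of what the cell-counting job itself could look
like with the batched scan wired in through TableMapReduceUtil. The class
name, the table name "sensor_data", and the counter are placeholders for
illustration, not from the thread. The important detail is that with setBatch
in effect each map() call may see only a slice of a logical row, which is
harmless when counting cells but would over-count if the job counted rows:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class CellCount {

      static class CellCountMapper extends TableMapper<NullWritable, NullWritable> {
        enum Counters { CELLS }

        @Override
        protected void map(ImmutableBytesWritable rowKey, Result slice, Context context)
            throws IOException, InterruptedException {
          // slice may be only part of a wide row because of setBatch, so we
          // count cells, never rows.
          context.getCounter(Counters.CELLS).increment(slice.size());
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "cell-count");
        job.setJarByClass(CellCount.class);

        Scan scan = new Scan();
        scan.setCaching(250);        // Results buffered per RPC
        scan.setBatch(100);          // cells per Result; bounds memory on wide rows
        scan.setCacheBlocks(false);  // a full scan shouldn't churn the block cache

        TableMapReduceUtil.initTableMapperJob(
            "sensor_data",           // placeholder table name
            scan, CellCountMapper.class,
            NullWritable.class, NullWritable.class, job);

        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }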
