hbase-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Record limit in scan api?
Date Fri, 20 Nov 2009 21:45:55 GMT
You can set it on a per-HTable basis.  HTable.setScannerCaching(int);
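To see why the caching value matters, here is a rough model in plain Java (illustrative only, not actual HBase client code; the helper name is made up): with caching set to N rows, a client-side next() only goes to the server when its local cache is empty, so a full scan costs about ceil(totalRows / N) round trips.

```java
// Illustrative model (not HBase client code): with scanner caching set to
// `caching` rows, next() only triggers an RPC when the client-side cache
// is empty, so scanning totalRows rows costs ceil(totalRows/caching) RPCs.
public class CachingModel {
    // Hypothetical helper: round trips needed to scan totalRows rows.
    static long rpcsNeeded(long totalRows, int caching) {
        return (totalRows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(rpcsNeeded(1000, 1));   // 1000 round trips
        System.out.println(rpcsNeeded(1000, 100)); // 10 round trips
    }
}
```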
On Fri, Nov 20, 2009 at 1:43 PM, Dave Latham <latham@davelink.net> wrote:
> I have some tables with large rows and some tables with very small rows, so
> I keep my default scanner caching at 1 row, but have to remember to set it
> higher when scanning tables with smaller rows.  It would be nice to have a
> default that did something reasonable across tables.
>
> Would it make sense to set scanner caching as a count of bytes rather than a
> count of rows?  That would make it similar to the write buffer for batches
> of puts that get flushed based on size rather than a fixed number of Puts.
> Then there could be some default value which should provide decent
> performance out of the box.
>
> Dave
>
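Dave's byte-based idea could be sketched like this (a hypothetical helper, not an existing HBase API): derive the per-fetch row count from a byte budget and an estimated average row size, much as the write buffer flushes Puts by accumulated size.

```java
// Sketch of a byte-budget approach to scanner caching (hypothetical, not
// an HBase API): fetch at least one row per round trip, otherwise as many
// rows as fit within the byte budget.
public class ByteBasedCaching {
    static int cachingForBudget(long budgetBytes, long avgRowBytes) {
        if (avgRowBytes <= 0) return 1; // fall back to one row per fetch
        return (int) Math.max(1, budgetBytes / avgRowBytes);
    }

    public static void main(String[] args) {
        // 2 MB budget: small rows fetch many at a time, large rows only a few.
        System.out.println(cachingForBudget(2 * 1024 * 1024, 200));       // 10485
        System.out.println(cachingForBudget(2 * 1024 * 1024, 1_000_000)); // 2
    }
}
```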
> On Fri, Nov 20, 2009 at 12:35 PM, Gary Helmling <ghelmling@gmail.com> wrote:
>
>> To set this per scan you should be able to do:
>>
>> Scan s = new Scan();
>> s.setCaching(...);
>>
>> (I think this works anyway)
>>
>>
>> The other thing that I've found useful is using a PageFilter on scans:
>>
>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/filter/PageFilter.html
>>
>> I believe this is applied independently on each region server (?), so you
>> still need to do your own counting while iterating the results, but it can
>> be used to early out on the server side separately from the scanner caching
>> value.
>>
>> --gh
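Gary's point about PageFilter can be seen in a toy simulation (plain Java, not HBase code; region contents are made up): because the filter runs independently on each region, each region may return up to pageSize rows, so the client must keep its own count to enforce a total limit.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not HBase code) of PageFilter semantics: the filter caps rows
// per region server, not across the whole scan, so the client keeps its own
// running count and stops early once its limit is reached.
public class PageFilterModel {
    static List<String> scanWithPageFilter(List<List<String>> regions,
                                           int pageSize, int clientLimit) {
        List<String> results = new ArrayList<>();
        for (List<String> region : regions) {
            int fromThisRegion = 0;
            for (String row : region) {
                if (fromThisRegion == pageSize) break;             // server-side early-out
                if (results.size() == clientLimit) return results; // client-side count
                results.add(row);
                fromThisRegion++;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<List<String>> regions = List.of(
                List.of("a1", "a2", "a3"),
                List.of("b1", "b2", "b3"));
        // pageSize 2 on two regions could yield 4 rows; the client cap of 3
        // is what actually limits the total.
        System.out.println(scanWithPageFilter(regions, 2, 3)); // [a1, a2, b1]
    }
}
```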
>>
>> On Fri, Nov 20, 2009 at 3:04 PM, stack <stack@duboce.net> wrote:
>>
>> > There is this in the configuration:
>> >
>> >  <property>
>> >    <name>hbase.client.scanner.caching</name>
>> >    <value>1</value>
>> >    <description>Number of rows that will be fetched when calling next
>> >    on a scanner if it is not served from memory. Higher caching values
>> >    will enable faster scanners but will eat up more memory and some
>> >    calls of next may take longer and longer times when the cache is
>> >    empty.
>> >    </description>
>> >  </property>
>> >
>> >
>> > Being able to do it per Scan sounds like something we should add.
>> >
>> > St.Ack
>> >
>> >
>> > On Fri, Nov 20, 2009 at 11:43 AM, Adam Silberstein
>> > <silberst@yahoo-inc.com> wrote:
>> >
>> > > Hi,
>> > > Is there a way to specify a limit on the number of returned records for
>> > > a scan?  I don't see any way to do this when building the scan.  If
>> > > there is, that would be great.  If not, what about when iterating over
>> > > the result?  If I exit the loop when I reach my limit, will that
>> > > approximate this clause?  I guess my real question is about how scan is
>> > > implemented in the client.  I.e. how many records are returned from
>> > > HBase at a time as I iterate through the scan result?  If I want 1,000
>> > > records and 100 get returned at a time, then I'm in good shape.  On the
>> > > other hand, if I want 10 records and get 100 at a time, it's a bit
>> > > wasteful, though the waste is bounded.
>> > >
>> > > Thanks,
>> > > Adam
>> >
>>
>
