hbase-dev mailing list archives

From Dave Latham <lat...@davelink.net>
Subject Re: Record limit in scan api?
Date Sat, 21 Nov 2009 00:20:58 GMT
Thanks for your thoughts.  It's true you can configure the scan buffer rows
on an HTable or Scan instance, but I think there's something to be said for
working as well as we can out of the box.
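A toy Java model of the three places the thread mentions for setting scanner caching: the hbase.client.scanner.caching property, HTable.setScannerCaching(int), and Scan.setCaching(int). It assumes (verify against your client version) that a positive per-Scan value overrides the table's value, and that the table's value starts from the configuration default of 1. The class and method names are hypothetical and are not part of the HBase API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (hypothetical class, not HBase code) of how the effective scanner
 * caching could be resolved. ASSUMPTION: a positive per-Scan value wins;
 * otherwise the per-HTable value is used, and that value starts from
 * hbase.client.scanner.caching (default 1).
 */
public class CachingPrecedence {
    static final String CACHING_KEY = "hbase.client.scanner.caching";

    /** Per-table starting value, read from a configuration map; 1 if unset. */
    static int tableDefault(Map<String, String> conf) {
        String v = conf.get(CACHING_KEY);
        return v == null ? 1 : Integer.parseInt(v.trim());
    }

    /** A scanCaching value <= 0 stands for "not set on the Scan". */
    static int effectiveCaching(int scanCaching, int tableCaching) {
        return scanCaching > 0 ? scanCaching : tableCaching;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(effectiveCaching(0, tableDefault(conf)));   // 1: nothing configured
        conf.put(CACHING_KEY, "100");
        int tableCaching = tableDefault(conf);                         // 100 from the configuration
        System.out.println(effectiveCaching(0, tableCaching));         // 100
        System.out.println(effectiveCaching(500, tableCaching));       // 500: the Scan's value wins
    }
}
```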

It would add some complexity, but not much.  To track the idea and see
what it would look like, I opened an issue and attached a proposed patch.
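A minimal sketch of the byte-based idea from the proposal quoted below: fill each fetch with rows until a byte budget would be exceeded, the way a write buffer flushes Puts by size. This is a hypothetical illustration in plain Java, not the patch attached to the issue; the class name, the method, and the 1 MB budget are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not HBase code, not the attached patch) of size-bounded
 * scanner caching: rows are added to a fetch until their total size would pass
 * a byte budget, the way a write buffer flushes Puts by size.
 */
public class ByteBudgetBatcher {

    /**
     * Groups rows, given only by their sizes in bytes, into fetches. A fetch
     * always holds at least one row, so a row larger than the budget still
     * makes progress; otherwise a row joins the current fetch only if the
     * running total stays within budgetBytes.
     */
    static List<List<Integer>> batches(int[] rowSizes, long budgetBytes) {
        List<List<Integer>> out = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long used = 0;
        for (int size : rowSizes) {
            if (!current.isEmpty() && used + size > budgetBytes) {
                out.add(current);              // adding this row would exceed the budget
                current = new ArrayList<>();
                used = 0;
            }
            current.add(size);
            used += size;
        }
        if (!current.isEmpty()) {
            out.add(current);                  // flush the last partial fetch
        }
        return out;
    }

    public static void main(String[] args) {
        long budget = 1_000_000;                             // hypothetical 1 MB budget
        int[] smallRows = new int[1000];
        Arrays.fill(smallRows, 100);                         // 1,000 rows of 100 bytes
        int[] largeRows = {2_000_000, 2_000_000, 2_000_000}; // three 2 MB rows
        System.out.println(batches(smallRows, budget).size()); // 1 round trip
        System.out.println(batches(largeRows, budget).size()); // 3 round trips
    }
}
```

Always admitting at least one row keeps a single oversized row from stalling the scan; the trade-off is that one fetch can overshoot the budget by up to one row's size.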

Dave

On Fri, Nov 20, 2009 at 1:55 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> And on the Scan, as I wrote in my answer, which is really, really convenient.
>
> I'm not convinced about using bytes as the unit for caching... It would
> also be more complicated.
>
> J-D
>
> On Fri, Nov 20, 2009 at 1:45 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> > You can set it on a per-HTable basis.  HTable.setScannerCaching(int);
> >
> >
> >
> > On Fri, Nov 20, 2009 at 1:43 PM, Dave Latham <latham@davelink.net> wrote:
> >> I have some tables with large rows and some tables with very small rows, so
> >> I keep my default scanner caching at 1 row, but have to remember to set it
> >> higher when scanning tables with smaller rows.  It would be nice to have a
> >> default that did something reasonable across tables.
> >>
> >> Would it make sense to set scanner caching as a count of bytes rather than a
> >> count of rows?  That would make it similar to the write buffer for batches
> >> of Puts, which is flushed based on size rather than a fixed number of Puts.
> >> Then there could be a default value that provides decent performance out
> >> of the box.
> >>
> >> Dave
> >>
> >> On Fri, Nov 20, 2009 at 12:35 PM, Gary Helmling <ghelmling@gmail.com> wrote:
> >>
> >>> To set this per scan you should be able to do:
> >>>
> >>> Scan s = new Scan();
> >>> s.setCaching(...);
> >>>
> >>> (I think this works anyway)
> >>>
> >>>
> >>> The other thing that I've found useful is using a PageFilter on scans:
> >>>
> >>>
> >>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/filter/PageFilter.html
> >>>
> >>> I believe this is applied independently on each region server (?), so you
> >>> still need to do your own counting when iterating over the results, but it
> >>> can be used to stop early on the server side, independently of the scanner
> >>> caching value.
> >>>
> >>> --gh
> >>>
> >>> On Fri, Nov 20, 2009 at 3:04 PM, stack <stack@duboce.net> wrote:
> >>>
> >>> > There is this in the configuration:
> >>> >
> >>> >  <property>
> >>> >    <name>hbase.client.scanner.caching</name>
> >>> >    <value>1</value>
> >>> >    <description>Number of rows that will be fetched when calling next
> >>> >    on a scanner if it is not served from memory. Higher caching values
> >>> >    will enable faster scanners but will eat up more memory and some
> >>> >    calls of next may take longer and longer times when the cache is empty.
> >>> >    </description>
> >>> >  </property>
> >>> >
> >>> >
> >>> > Being able to do it per Scan sounds like something we should add.
> >>> >
> >>> > St.Ack
> >>> >
> >>> >
> >>> > On Fri, Nov 20, 2009 at 11:43 AM, Adam Silberstein <silberst@yahoo-inc.com> wrote:
> >>> >
> >>> > > Hi,
> >>> > > Is there a way to specify a limit on the number of records returned by a
> >>> > > scan?  I don't see any way to do this when building the scan.  If there
> >>> > > is, that would be great.  If not, what about when iterating over the
> >>> > > result?  If I exit the loop when I reach my limit, will that approximate
> >>> > > a limit clause?  I guess my real question is about how scan is
> >>> > > implemented in the client, i.e., how many records are returned from
> >>> > > HBase at a time as I iterate through the scan result?  If I want 1,000
> >>> > > records and 100 get returned at a time, then I'm in good shape.  On the
> >>> > > other hand, if I want 10 records and get 100 at a time, it's a bit
> >>> > > wasteful, though the waste is bounded.
> >>> > >
> >>> > > Thanks,
> >>> > > Adam
> >>> > >
> >>> >
> >>>
> >>
> >
>
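A toy model of the client behavior asked about in the original question: a scanner that refills a local cache with up to `caching` rows per round trip, read by a loop that breaks once it has `limit` rows. Everything is simulated in plain Java with hypothetical names and no HBase classes; it only illustrates the arithmetic, e.g. that reading 10 rows with caching 100 costs one round trip but ships 100 rows.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Toy model (hypothetical, not the HBase client) of a scanner that refills a
 * local cache with up to `caching` rows per round trip to a simulated server.
 */
public class CachedScannerModel implements Iterator<Integer> {
    private final int totalRows;                 // rows available on the simulated server
    private final int caching;                   // rows fetched per round trip
    private final Deque<Integer> cache = new ArrayDeque<>();
    private int nextRowToFetch = 0;
    int rpcs = 0;                                // round trips made so far
    int rowsTransferred = 0;                     // rows shipped to the client so far

    CachedScannerModel(int totalRows, int caching) {
        this.totalRows = totalRows;
        this.caching = Math.max(1, caching);
    }

    @Override
    public boolean hasNext() {
        if (cache.isEmpty() && nextRowToFetch < totalRows) {
            // Local cache is empty: one round trip returns up to `caching` rows.
            int end = Math.min(totalRows, nextRowToFetch + caching);
            for (int r = nextRowToFetch; r < end; r++) {
                cache.addLast(r);
            }
            rowsTransferred += end - nextRowToFetch;
            nextRowToFetch = end;
            rpcs++;
        }
        return !cache.isEmpty();
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return cache.pollFirst();
    }

    /** Reads at most `limit` rows and stops; returns {rowsRead, rpcs, rowsTransferred}. */
    static int[] readWithLimit(int totalRows, int caching, int limit) {
        CachedScannerModel scanner = new CachedScannerModel(totalRows, caching);
        int read = 0;
        // Check the limit first so no extra round trip is made once it is reached.
        while (read < limit && scanner.hasNext()) {
            scanner.next();
            read++;
        }
        return new int[] {read, scanner.rpcs, scanner.rowsTransferred};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(readWithLimit(1_000_000, 100, 1000))); // [1000, 10, 1000]
        System.out.println(Arrays.toString(readWithLimit(1_000_000, 100, 10)));   // [10, 1, 100]
        System.out.println(Arrays.toString(readWithLimit(1_000_000, 1, 10)));     // [10, 10, 10]
    }
}
```

With the real client the loop has the same shape over a ResultScanner; when breaking out early, close the scanner so the server-side scanner is released.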

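A toy model of the per-region behavior described in the reply about PageFilter: if each region applies the page size on its own, a scan over three regions that the client reads to the end can yield up to three pages, so the client still caps the count itself. The class and methods are hypothetical; only the per-region behavior is taken from the thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not HBase code) of a page-size limit that each
 * region enforces on its own, plus the client-side cap that is still needed.
 */
public class PerRegionPageLimit {

    /** Row ids a scan would see if read to the end, with the limit applied per region. */
    static List<Integer> serverSide(int[] rowsPerRegion, int pageSize) {
        List<Integer> returned = new ArrayList<>();
        int firstRowId = 0;
        for (int regionRows : rowsPerRegion) {
            // The page counter restarts in every region, so each contributes up to pageSize rows.
            int emitted = Math.min(regionRows, pageSize);
            for (int i = 0; i < emitted; i++) {
                returned.add(firstRowId + i);
            }
            firstRowId += regionRows;
        }
        return returned;
    }

    /** Client-side cap applied on top of whatever the regions returned. */
    static List<Integer> clientSide(List<Integer> fromServer, int limit) {
        return new ArrayList<>(fromServer.subList(0, Math.min(limit, fromServer.size())));
    }

    public static void main(String[] args) {
        int[] regions = {500, 500, 500};                       // one scan spanning three regions
        List<Integer> fromServer = serverSide(regions, 10);
        System.out.println(fromServer.size());                 // 30, not 10
        System.out.println(clientSide(fromServer, 10).size()); // 10
    }
}
```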