hbase-user mailing list archives

From Peter Wolf <opus...@gmail.com>
Subject Re: Scanning the last N rows
Date Fri, 02 Mar 2012 21:59:52 GMT
Sorry, my code was a little off.  It should have been

         Scan scan = new Scan(calculateStartRowKey(targetAccount),
                              calculateEndRowKey(targetAccount));

Where my row key is formed from <account><reverse timestamp>

So, the scanner would match all the rows for this account, and return 
them most recent first.
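
A minimal sketch of how such a key could be built, assuming a fixed-width account id and a millisecond timestamp (the class and method names are hypothetical, not from the original code). It relies only on the fact that HBase sorts row keys as unsigned bytes, so subtracting the timestamp from Long.MAX_VALUE makes newer rows sort first within an account:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ReverseTimestampKey {
    // Builds <account><reverse timestamp>. Assumes every account id has the
    // same byte length, so keys of different accounts cannot interleave,
    // and that the timestamp is non-negative.
    static byte[] rowKey(String account, long timestampMillis) {
        byte[] acct = account.getBytes(StandardCharsets.UTF_8);
        long reversed = Long.MAX_VALUE - timestampMillis;
        return ByteBuffer.allocate(acct.length + Long.BYTES)
                .put(acct)
                .putLong(reversed) // big-endian, so byte order matches numeric order
                .array();
    }

    // Unsigned lexicographic comparison, the order HBase uses for row keys.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] older = rowKey("acct0001", 1000L);
        byte[] newer = rowKey("acct0001", 2000L);
        // A negative result means the newer row sorts before the older one.
        System.out.println(compareKeys(newer, older) < 0);
    }
}
```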

        Iterator<Result> it = scanner.iterator();

But if I stop after doing this only N times...

            Result result = it.next();

Will that be efficient?  Will it be a problem that the scanner
potentially matches all rows for the account?
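
The access pattern in question can be sketched without a cluster by letting a sorted map stand in for the table. This is only an illustration of the key-range logic, not HBase client code: startRow and stopRow are hypothetical stand-ins for calculateStartRowKey and calculateEndRowKey, and the inclusive start / exclusive stop bounds mirror how a Scan with start and stop rows behaves. Because the range covers a single account and iteration stops after N entries, only the first N keys of that range are visited here; in a real scan the server may still read ahead by however many rows the scanner fetches per round trip.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class LastNRowsSketch {
    // Unsigned lexicographic byte order, the order in which HBase sorts row keys.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    };

    // <account><reverse timestamp>; assumes fixed-width account ids.
    static byte[] rowKey(String account, long timestampMillis) {
        byte[] acct = account.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(acct.length + Long.BYTES)
                .put(acct)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // Hypothetical start key: the bare account prefix (inclusive).
    static byte[] startRow(String account) {
        return account.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical stop key: the prefix with its last byte incremented
    // (exclusive). Assumes a non-empty account whose last byte is not 0xFF.
    static byte[] stopRow(String account) {
        byte[] stop = account.getBytes(StandardCharsets.UTF_8);
        stop[stop.length - 1]++;
        return stop;
    }

    // Returns up to n values for the account, most recent first.
    static List<String> lastN(NavigableMap<byte[], String> table, String account, int n) {
        List<String> out = new ArrayList<>();
        for (String value : table.subMap(startRow(account), true, stopRow(account), false).values()) {
            if (out.size() == n) {
                break; // stop early, like abandoning the scanner after N rows
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> table = new TreeMap<>(BYTE_ORDER);
        table.put(rowKey("acct0001", 1000L), "a1-old");
        table.put(rowKey("acct0001", 3000L), "a1-new");
        table.put(rowKey("acct0001", 2000L), "a1-mid");
        table.put(rowKey("acct0002", 9000L), "a2-only");
        System.out.println(lastN(table, "acct0001", 2)); // prints [a1-new, a1-mid]
    }
}
```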

P


On 3/2/12 4:49 PM, Ian Varley wrote:
> Yes, you do have to worry about efficiency. If your rows aren't ordered in
> the table (by rowkey) according to the update date, the server will be
> having to scan the entire table. Your filter will enable it to not send all
> of those results to the client, but it's still having to read them from disk
> and merge them with the rows in memory. It will likely not even be possible
> for a big table (and, if it's not a *big* table, it probably shouldn't be in
> HBase).
>
> The fundamental thing to note here is that there's no "magic": HBase stores
> records sorted in exactly one order; if what you want isn't able to be
> efficiently found according to that ordering, then you'll be scanning the
> whole table. Relational DBs do that too, but they also have indexes that let
> you get at things quickly in some other sort order.
>
> Ian
>
> On Mar 2, 2012, at 3:42 PM, Peter Wolf wrote:
>
>
> Ah ha!  So the row key orders the results, I just do an unbounded Scan,
> and stop after N iterations.
>
> Like this...
>
>         Scan scan = new Scan();
>         Filter filter = new SingleColumnValueFilter(...);
>         scan.setFilter(filter);
>         ResultScanner scanner = hTable.getScanner(scan);
>         Iterator<Result> it = scanner.iterator();
>         for (int i = 0; i < 1000 && it.hasNext(); i++) {
>             Result result = it.next();
>             ... do stuff with result...
>         }
>         scanner.close();  // release the server-side scanner when done
>
> Do I have to worry about efficiency?  Is the Server madly retrieving
> rows, in the background, that the Client will never use?
>
> Thanks
> P
>
>
>
> On 3/2/12 4:31 PM, Doug Meil wrote:
> Hi there-
>
> Take a look at this section of the book...
>
> http://hbase.apache.org/book.html#reverse.timestamp
>
>
>
>
> On 3/2/12 4:02 PM, "Peter Wolf" <opus111@gmail.com> wrote:
>
> Hello all,
>
> I want to retrieve the most recent N rows from a table, with some column
> qualifiers.
>
> I can't find a Filter, or anything obvious in my books, or via Google.
>
> What is the idiom for doing this?
>
> Thanks
> Peter
>
>
>
>
>

