incubator-cassandra-user mailing list archives

From Don Smith <dsm...@likewise.com>
Subject Re: Efficiency of hector's setRowCount (and setStartKey!)
Date Thu, 13 Oct 2011 16:39:33 GMT
It's actually setStartKey that's the important method call (in 
combination with setRowCount). So I should have been clearer.

The following code performs as expected: it returns the expected data 
in the expected order.  I believe that using IndexedSlicesQuery's 
setStartKey will support efficient queries, avoiding re-pulling the 
entire data set from Cassandra on every page. Correct?


         void demoPaging() {
                 // Get the first batch, starting with "" (the smallest key).
                 String lastKey = processPage("don", "");
                 // Get the second batch, starting with the previous last key.
                 lastKey = processPage("don", lastKey);
                 // Get the third batch, starting with the previous last key.
                 lastKey = processPage("don", lastKey);
                 //....
         }

         // Returns the last key processed, or null when no records are left.
         String processPage(String username, String startKey) {
                 String lastKey = null;
                 IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
                         HFactory.createIndexedSlicesQuery(keyspace, stringSerializer,
                                 stringSerializer, stringSerializer);
                 indexedSlicesQuery.addEqualsExpression("user", username);
                 indexedSlicesQuery.setColumnNames("source", "ip");
                 indexedSlicesQuery.setColumnFamily(ourColumnFamilyName);
                 indexedSlicesQuery.setStartKey(startKey);   // <---- the important call
                 indexedSlicesQuery.setRowCount(batchSize);
                 QueryResult<OrderedRows<String, String, String>> result =
                         indexedSlicesQuery.execute();
                 OrderedRows<String, String, String> rows = result.get();
                 for (Row<String, String, String> row : rows) {
                         if (row == null) { continue; }
                         String key = row.getKey();
                         // The start key is inclusive, so every page after the first
                         // repeats the previous page's last row. Count only new rows
                         // here instead of decrementing totalCount unconditionally,
                         // which undercounted the first page and empty pages.
                         if (!startKey.equals(key)) {
                                 totalCount++;
                                 lastKey = key;
                         }
                 }
                 return lastKey;
         }
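
For anyone who wants to try the paging pattern without a cluster, here is a minimal sketch of the same loop. It does not use Hector: an in-memory TreeMap stands in for the column family, and PagingSketch, STORE, fetchPage and pageThroughAll are made-up names for illustration only. It also assumes rows come back in key order, which a real cluster using RandomPartitioner would not guarantee (rows come back in token order there). The point is only to show the inclusive start key and the skipped boundary row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PagingSketch {
    // Hypothetical in-memory stand-in for the indexed column family.
    static final NavigableMap<String, String> STORE = new TreeMap<>();

    // Mimics a query with an inclusive start key and a row limit.
    static List<String> fetchPage(String startKey, int rowCount) {
        List<String> keys = new ArrayList<>();
        for (String key : STORE.tailMap(startKey, true).keySet()) {
            if (keys.size() == rowCount) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }

    // Walks every page and returns each key exactly once.
    static List<String> pageThroughAll(int batchSize) {
        if (batchSize < 2) {
            // A page of one row would contain only the repeated boundary row.
            throw new IllegalArgumentException("batchSize must be at least 2");
        }
        List<String> seen = new ArrayList<>();
        String startKey = "";
        while (true) {
            String lastKey = null;
            for (String key : fetchPage(startKey, batchSize)) {
                if (key.equals(startKey)) {
                    continue; // boundary row already handled on the previous page
                }
                seen.add(key);
                lastKey = key;
            }
            if (lastKey == null) {
                break; // no new rows on this page, so paging is finished
            }
            startKey = lastKey;
        }
        return seen;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            STORE.put("row" + i, "value" + i);
        }
        System.out.println(pageThroughAll(4));
    }
}
```

Note that each page after the first yields only batchSize - 1 new rows, because one slot is taken by the repeated boundary row; that is why the user guide example asks for 1001 rows to get pages of 1000.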






On 10/13/2011 09:15 AM, Patricio Echagüe wrote:
> Hi Don. No it will not. IndexedSlicesQuery will read just the amount 
> of rows specified by RowCount and will go to the DB to get the new 
> page when needed.
>
> SetRowCount is doing indexClause.setCount(rowCount);
>
> On Mon, Oct 10, 2011 at 3:52 PM, Don Smith <dsmith@likewise.com> wrote:
>
>     Hector's IndexedSlicesQuery has a setRowCount method that you can
>     use to page through the results, as described in
>     https://github.com/rantav/hector/wiki/User-Guide .
>
>         rangeSlicesQuery.setRowCount(1001);
>          .....
>         rangeSlicesQuery.setKeys(lastRow.getKey(),  "");
>
>     Is it efficient?  Specifically, suppose my query returns 100,000
>     results and I page through batches of 1000 at a time (making 100
>     executes of the query). Will it internally retrieve all the
>     results each time (but pass only the desired set of 1000 or so to
>     me)? Or will it optimize queries to avoid the duplication?      I
>     presume the latter. :)
>
>     Can IndexedSlicesQuery's setStartKey method be used for the same
>     effect?
>
>       Thanks,  Don
>
>

