cassandra-user mailing list archives

From Patricio Echagüe <patric...@gmail.com>
Subject Re: Efficiency of hector's setRowCount (and setStartKey!)
Date Thu, 13 Oct 2011 16:45:59 GMT
On Thu, Oct 13, 2011 at 9:39 AM, Don Smith <dsmith@likewise.com> wrote:

> It's actually setStartKey that's the important method call (in combination
> with setRowCount), so I should have been clearer.
>
> The following code performs as expected: it returns the expected data in
> the expected order. I believe that using IndexedSlicesQuery's setStartKey
> supports efficient queries, avoiding re-pulling the entire data set from
> Cassandra. Correct?
>

Correct.
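To make the start-key paging pattern concrete without a live cluster, here is a minimal, self-contained Java sketch. It does not use Hector: the hypothetical `StartKeyPaging` class keeps rows in a `TreeMap` (standing in for the order in which the cluster returns rows), and its `fetchPage` method only mimics what one execute of an `IndexedSlicesQuery` with `setStartKey`/`setRowCount` hands back: at most `rowCount` rows, starting at the start key. The start key is assumed to be inclusive, as the code in the quoted message implies by skipping it. The `rowsRead` counter shows that each page costs about one page of rows, not a re-read of everything before it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in (not Hector): a sorted map plays the role of
// the order in which the cluster returns rows.
class StartKeyPaging {
    private final TreeMap<String, String> rows = new TreeMap<>();
    private int rowsRead = 0; // total rows handed back across all executes

    StartKeyPaging(Map<String, String> data) {
        rows.putAll(data);
    }

    // Mimics one execute(): at most rowCount rows whose key is >= startKey (start key inclusive).
    List<String> fetchPage(String startKey, int rowCount) {
        List<String> page = new ArrayList<>();
        for (String key : rows.tailMap(startKey, true).keySet()) {
            if (page.size() == rowCount) {
                break;
            }
            page.add(key);
        }
        rowsRead += page.size();
        return page;
    }

    // Pages through every row, skipping the repeated start key at the head of each later page.
    List<String> pageAll(int rowCount) {
        if (rowCount < 2) {
            // One slot per later page is the repeated start key; with 1, the second
            // page would hold only that key and paging would stop early.
            throw new IllegalArgumentException("rowCount must be at least 2");
        }
        List<String> seen = new ArrayList<>();
        String startKey = ""; // "" sorts before every non-empty key
        while (true) {
            String lastKey = null;
            for (String key : fetchPage(startKey, rowCount)) {
                if (key.equals(startKey)) {
                    continue; // already processed on the previous page
                }
                seen.add(key);
                lastKey = key;
            }
            if (lastKey == null) {
                return seen; // no new rows: done
            }
            startKey = lastKey;
        }
    }

    int rowsRead() {
        return rowsRead;
    }

    public static void main(String[] args) {
        Map<String, String> data = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            data.put(String.format("key%02d", i), "value" + i);
        }
        StartKeyPaging store = new StartKeyPaging(data);
        List<String> all = store.pageAll(4);
        System.out.println(all.size() + " rows, " + store.rowsRead() + " rows read");
    }
}
```

With ten rows and a page size of 4, the four executes return 4 + 4 + 4 + 1 rows, so `main` prints `10 rows, 13 rows read`: the three extra rows are the repeated start keys, and the last execute returns only the start key, which signals the end.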

>
>
>         void demoPaging() {
>                 String lastKey = processPage("don", "");  // first batch, starting with "" (the smallest key)
>                 while (lastKey != null) {
>                         // next batch, starting with the previous page's last key
>                         lastKey = processPage("don", lastKey);
>                 }
>         }
>
>         // Returns the last key processed, or null when no records are left.
>         String processPage(String username, String startKey) {
>                 String lastKey = null;
>                 IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
>                                 HFactory.createIndexedSlicesQuery(keyspace,
>                                                 stringSerializer, stringSerializer, stringSerializer);
>                 indexedSlicesQuery.addEqualsExpression("user", username);
>                 indexedSlicesQuery.setColumnNames("source", "ip");
>                 indexedSlicesQuery.setColumnFamily(ourColumnFamilyName);
>                 indexedSlicesQuery.setStartKey(startKey);   // <-- the important call
>                 // batchSize must be at least 2: one row of every later page is the repeated start key
>                 indexedSlicesQuery.setRowCount(batchSize);
>                 QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
>                 OrderedRows<String, String, String> rows = result.get();
>                 for (Row<String, String, String> row : rows) {
>                         if (row == null) { continue; }
>                         String key = row.getKey();
>                         // The start key is inclusive, so the first row of every page after the
>                         // first is the previous page's last row: skip it rather than count it twice.
>                         if (startKey.equals(key)) { continue; }
>                         totalCount++;
>                         lastKey = key;
>                 }
>                 return lastKey;
>         }
>
> On 10/13/2011 09:15 AM, Patricio Echagüe wrote:
>
> Hi Don. No, it will not. IndexedSlicesQuery reads only the number of rows
> specified by setRowCount, and goes to the database for the next page only
> when needed.
>
> Internally, setRowCount calls indexClause.setCount(rowCount);
>
> On Mon, Oct 10, 2011 at 3:52 PM, Don Smith <dsmith@likewise.com> wrote:
>
>> Hector's IndexedSlicesQuery has a setRowCount method that you can use to
>> page through the results, as described in
>> https://github.com/rantav/hector/wiki/User-Guide .
>>
>>     rangeSlicesQuery.setRowCount(1001);
>>      .....
>>     rangeSlicesQuery.setKeys(lastRow.getKey(),  "");
>>
>> Is it efficient? Specifically, suppose my query returns 100,000 results
>> and I page through them in batches of 1000 (executing the query 100 times).
>> Will it internally retrieve all the results each time (but pass only the
>> desired set of 1000 or so to me)? Or will it optimize queries to avoid the
>> duplication? I presume the latter. :)
>>
>> Can IndexedSlicesQuery's setStartKey method be used for the same effect?
>>
>>   Thanks,  Don
>>
>
>
>
