incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: get_indexed_slices ~ simple map-reduce
Date Mon, 13 Jun 2011 23:27:40 GMT
From a quick read of the code in o.a.c.db.ColumnFamilyStore.scan()...

Candidate rows are first read by applying the most selective equality predicate.

From those candidate rows...

1) If the SlicePredicate has a SliceRange, the query execution will read all columns for the
candidate row if the byte size of the largest tracked row is less than the column_index_size_in_kb
config setting (defaults to 64K). Meaning that if no more than one column index page of columns
is (probably) going to be read, they will all be read.

2) Otherwise, the query will read the columns specified by the SliceRange.

3) If the SlicePredicate uses a list of column names, those columns and the ones referenced
in the IndexExpressions (except the one selected as the primary pivot above) are read from
disk.

If additional columns are needed (in case 2 above), they are read in a separate read from
the candidate row.

Then when applying the SlicePredicate to produce the final projection into the result set,
all the columns required to satisfy the filter will be in memory.  
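
The three cases above can be sketched as a toy model. This is not Cassandra's code: the names
plan_reads, ReadPlan, Predicate and filter_columns are made up for illustration, column names
are compared as plain strings, and an empty start or finish is assumed to mean "unbounded".
Only the three cases and the 64K default come from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Default of column_index_size_in_kb as stated in the message (64K).
COLUMN_INDEX_SIZE_BYTES = 64 * 1024


@dataclass
class SliceRange:
    start: str   # "" is assumed to mean "from the first column"
    finish: str  # "" is assumed to mean "to the last column"


@dataclass
class Predicate:
    slice_range: Optional[SliceRange] = None
    column_names: Optional[Sequence[str]] = None


@dataclass
class ReadPlan:
    first_read: str                        # "all_columns", "slice_range" or "named_columns"
    columns: Tuple[str, ...]               # explicit names fetched by the first read, if any
    second_read_columns: Tuple[str, ...]   # filter columns fetched by a separate read


def in_range(name: str, r: SliceRange) -> bool:
    """True if a column name falls inside the slice range (string ordering)."""
    return (r.start == "" or name >= r.start) and (r.finish == "" or name <= r.finish)


def plan_reads(predicate: Predicate,
               filter_columns: Sequence[str],
               max_row_size_bytes: int) -> ReadPlan:
    """Decide what to read for one candidate row.

    filter_columns are the columns referenced by the IndexExpressions other
    than the one used as the primary pivot.
    """
    r = predicate.slice_range
    if r is not None:
        # Case 1: rows are small enough that reading everything is no extra work.
        if max_row_size_bytes < COLUMN_INDEX_SIZE_BYTES:
            return ReadPlan("all_columns", (), ())
        # Case 2: read only the slice; filter columns outside it need a second read.
        missing = tuple(sorted(c for c in set(filter_columns) if not in_range(c, r)))
        return ReadPlan("slice_range", (), missing)
    # Case 3: read the named columns plus the columns the filters need.
    names = set(predicate.column_names or ()) | set(filter_columns)
    return ReadPlan("named_columns", tuple(sorted(names)), ())
```

For example, plan_reads(Predicate(column_names=["name"]), ["age"], 1024 * 1024) models case 3:
both the requested column and the filter column are fetched in the first read.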


So, yes, it reads just the columns you ask for from disk, unless it thinks it will take
no more work to read more.

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 13 Jun 2011, at 08:34, Michal Augustýn wrote:

> Hi,
> 
> as I wrote, I don't want to install Hadoop etc. - I want just to use
> the Thrift API. The core of my question is how the get_indexed_slices
> function works.
> 
> I know that it must first get all keys using the equality expression -
> but what about additional expressions? Does Cassandra fetch whole
> filtered rows, or just the columns used in the additional filtering
> expressions?
> 
> Thanks!
> 
> Augi
> 
> 2011/6/12 aaron morton <aaron@thelastpickle.com>:
>> Not exactly sure what you mean here, all data access is through the thrift
>> API unless you code java and embed cassandra in your app.
>> As well as Pig support there is also Hive support in brisk (which will also
>> have Pig support soon) http://www.datastax.com/products/brisk
>> Can you provide some more info on the use case? Personally if you have a
>> read query you know you need to support, I would consider supporting it in
>> the data model without secondary indexes.
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 11 Jun 2011, at 19:23, Michal Augustýn wrote:
>> 
>> Hi all,
>> 
>> I'm thinking of get_indexed_slices function as a simple map-reduce job
>> (that just maps) - am I right?
>> 
>> Well, I would like to be able to run simple queries on values but I
>> don't want to install Hadoop, write map-reduce jobs in Java (the whole
>> application is in C# and I don't want to introduce new development
>> stack - maybe Pig would help) and have some second interface to
>> Cassandra (in addition to Thrift). So secondary indexes seem to be
>> the rescue for me. I would have just one indexed column that will have
>> a day-timestamp value (~100k items per day) and the equality expression
>> for this column would be in each query (and I would add more ad-hoc
>> expressions).
>> Will this scenario work or is there some issue I could run into?
>> 
>> Thanks!
>> 
>> Augi
>> 
>> 

