incubator-cassandra-user mailing list archives

From Michal Augustýn <augustyn.mic...@gmail.com>
Subject Re: get_indexed_slices ~ simple map-reduce
Date Tue, 14 Jun 2011 08:25:31 GMT
Thank you!

I have one more question ;-) If I use the regular "get" function, I
can be sure that it takes ~5ms. So I suppose that if I use the
"get_indexed_slices" function, the response time depends on how
many rows match the most selective equality predicate. Am I right?
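The intuition in this question can be illustrated with a toy Python model. It assumes, as the reply below describes, that every row matched by the pivot equality predicate is read before the other expressions are applied. The data, the `indexed_query` helper and the `day_index` dict are hypothetical stand-ins, and the model only counts row reads; it says nothing about actual milliseconds.

```python
# Toy data: 3 days x 1000 rows; row key -> columns (hypothetical layout).
rows = {}
day_index = {}  # stand-in for the secondary index on "day": value -> row keys
for day in range(3):
    for i in range(1000):
        key = f"{day}-{i}"
        rows[key] = {"day": day, "score": i}
        day_index.setdefault(day, []).append(key)


def indexed_query(day, extra_filters):
    """Return (matching keys, candidate rows read) for day == `day` plus filters."""
    hits, rows_read = [], 0
    for key in day_index.get(day, []):
        row = rows[key]  # every candidate row is read...
        rows_read += 1
        if all(f(row) for f in extra_filters):  # ...before the extra filters run
            hits.append(key)
    return hits, rows_read


for filters in ([],
                [lambda r: r["score"] >= 990],
                [lambda r: r["score"] >= 990, lambda r: r["score"] % 2 == 0]):
    hits, read = indexed_query(1, filters)
    print(len(hits), "results,", read, "rows read")
# 1000 results, 1000 rows read
# 10 results, 1000 rows read
# 5 results, 1000 rows read
```

Under this model, adding more ad-hoc expressions shrinks the result set but not the number of candidate rows read.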

Augi

2011/6/14 aaron morton <aaron@thelastpickle.com>:
> From a quick read of the code in o.a.c.db.ColumnFamilyStore.scan()...
>
> Candidate rows are first read by applying the most selective equality predicate.
>
> From those candidate rows...
>
> 1) If the SlicePredicate has a SliceRange, the query execution will read all columns for
> the candidate row if the byte size of the largest tracked row is less than the
> column_index_size_in_kb config setting (defaults to 64K). That is, if no more than one
> column index page of columns is (probably) going to be read, they will all be read.
>
> 2) Otherwise, the query will read the columns specified by the SliceRange.
>
> 3) If the SlicePredicate uses a list of column names, those columns and the ones referenced
> in the IndexExpressions (except the one selected as the primary pivot above) are read from
> disk.
>
> If additional columns are needed (in case 2 above), they are read in separate reads
> from the candidate row.
>
> Then, when applying the SlicePredicate to produce the final projection into the result
> set, all the columns required to satisfy the filter will be in memory.
>
>
> So, yes, it reads from disk just the columns you ask for, unless it thinks it will
> take no more work to read more.
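The execution steps described above can be modeled roughly as follows. This is a toy Python sketch of the described behaviour, not Cassandra's code: `Expr`, `Query`, `scan`, the dict-based `rows`/`index`, and the `largest_row_bytes` parameter (standing in for the "largest tracked row" size) are all hypothetical. Slice bounds are assumed to be inclusive string comparisons, and the operator names mirror the Thrift IndexOperator values.

```python
import operator
from dataclasses import dataclass
from typing import List, Optional, Tuple

COLUMN_INDEX_SIZE_BYTES = 64 * 1024  # assumed default column_index_size_in_kb (64K)
OPS = {"EQ": operator.eq, "GT": operator.gt, "GTE": operator.ge,
       "LT": operator.lt, "LTE": operator.le}


@dataclass
class Expr:  # hypothetical stand-in for an IndexExpression
    column: str
    op: str
    value: object


@dataclass
class Query:  # hypothetical stand-in for IndexClause + SlicePredicate
    expressions: List[Expr]
    slice_range: Optional[Tuple[str, str]] = None  # inclusive (start, finish)
    column_names: Optional[List[str]] = None


def scan(rows, index, query, largest_row_bytes):
    """Toy model: returns (results, candidate rows read, columns read)."""
    eqs = [e for e in query.expressions if e.op == "EQ" and e.column in index]
    if not eqs:
        raise ValueError("need an EQ expression on an indexed column")
    # Pivot: the equality expression whose index entry has the fewest row keys.
    pivot = min(eqs, key=lambda e: len(index[e.column].get(e.value, [])))
    others = [e for e in query.expressions if e is not pivot]
    candidates = index[pivot.column].get(pivot.value, [])
    results, columns_read = {}, 0

    def in_slice(name):
        start, finish = query.slice_range
        return start <= name <= finish

    for key in candidates:
        row = rows[key]
        if query.slice_range is not None:
            if largest_row_bytes < COLUMN_INDEX_SIZE_BYTES:
                fetched = dict(row)  # case 1: small rows, read the whole row
            else:
                fetched = {c: v for c, v in row.items() if in_slice(c)}  # case 2
                for e in others:  # separate read for expression columns
                    if e.column in row and e.column not in fetched:
                        fetched[e.column] = row[e.column]
        else:  # case 3: named columns plus non-pivot expression columns
            wanted = set(query.column_names) | {e.column for e in others}
            fetched = {c: row[c] for c in wanted if c in row}
        columns_read += len(fetched)
        # Every column needed by the filter is now in memory.
        if all(e.column in fetched and OPS[e.op](fetched[e.column], e.value)
               for e in others):
            if query.slice_range is not None:
                results[key] = {c: v for c, v in fetched.items() if in_slice(c)}
            else:
                results[key] = {c: fetched[c] for c in query.column_names
                                if c in fetched}
    return results, len(candidates), columns_read


rows = {
    "a": {"day": "2011-06-13", "score": 5, "user": "x"},
    "b": {"day": "2011-06-13", "score": 20, "user": "y"},
    "c": {"day": "2011-06-14", "score": 30, "user": "z"},
}
index = {"day": {"2011-06-13": ["a", "b"], "2011-06-14": ["c"]}}
q = Query([Expr("day", "EQ", "2011-06-13"), Expr("score", "GT", 10)],
          column_names=["user"])
print(scan(rows, index, q, largest_row_bytes=100))  # ({'b': {'user': 'y'}}, 2, 4)
```

In the usage example both candidate rows for the day are read (two columns each: "user" and "score", but not the pivot "day"), and only row "b" survives the "score" filter.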
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13 Jun 2011, at 08:34, Michal Augustýn wrote:
>
>> Hi,
>>
>> as I wrote, I don't want to install Hadoop etc. - I just want to use
>> the Thrift API. The core of my question is how the get_indexed_slices
>> function works.
>>
>> I know that it must first get all keys using the equality expression -
>> but what about the additional expressions? Does Cassandra fetch the whole
>> filtered rows, or just the columns used in the additional filtering
>> expressions?
>>
>> Thanks!
>>
>> Augi
>>
>> 2011/6/12 aaron morton <aaron@thelastpickle.com>:
>>> Not exactly sure what you mean here; all data access is through the Thrift
>>> API unless you code in Java and embed Cassandra in your app.
>>> As well as Pig support, there is also Hive support in Brisk (which will also
>>> have Pig support soon) http://www.datastax.com/products/brisk
>>> Can you provide some more info on the use case? Personally, if you have a
>>> read query you know you need to support, I would consider supporting it in
>>> the data model without secondary indexes.
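One common way to read the suggestion above (an interpretation, not necessarily what was meant) is to denormalize at write time: besides the row itself, also write a column into a wide row per day, with column names chosen so that the query becomes one contiguous slice. The toy Python sketch below assumes this layout; `WideRowStore`, `write_event`, `high_scores` and the two column family names are hypothetical, and the `(score, key)` tuple stands in for whatever encoded column name and comparator a real model would use.

```python
import bisect
from collections import defaultdict


class WideRowStore:
    """Toy stand-in for a column family: row key -> columns sorted by name."""

    def __init__(self):
        self._names = defaultdict(list)   # row key -> sorted column names
        self._values = defaultdict(dict)  # row key -> {column name: value}

    def insert(self, row_key, name, value):
        if name not in self._values[row_key]:
            bisect.insort(self._names[row_key], name)  # keep names sorted
        self._values[row_key][name] = value  # re-insert overwrites, like an update

    def slice(self, row_key, start, finish):
        """Columns with start <= name <= finish, in name order."""
        names = self._names.get(row_key, [])
        lo = bisect.bisect_left(names, start)
        hi = bisect.bisect_right(names, finish)
        values = self._values.get(row_key, {})
        return [(n, values[n]) for n in names[lo:hi]]


events = WideRowStore()         # hypothetical "events" CF: one row per event
events_by_day = WideRowStore()  # hypothetical "events_by_day" CF: one row per day


def write_event(key, day, score):
    events.insert(key, "day", day)
    events.insert(key, "score", score)
    # Denormalize: name the column (score, key) so the day's row sorts by score.
    events_by_day.insert(day, (score, key), "")


def high_scores(day, min_score):
    # One contiguous slice of one row: no index scan and no post-filtering.
    cols = events_by_day.slice(day, (min_score, ""), (float("inf"), ""))
    return [key for (score, key), _ in cols]


write_event("e1", "2011-06-13", 5)
write_event("e2", "2011-06-13", 20)
write_event("e3", "2011-06-13", 15)
write_event("e4", "2011-06-14", 30)
print(high_scores("2011-06-13", 10))  # ['e3', 'e2']
```

The trade-off in this sketch is extra work on every write in exchange for a read that touches only the columns it returns.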
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> On 11 Jun 2011, at 19:23, Michal Augustýn wrote:
>>>
>>> Hi all,
>>>
>>> I'm thinking of the get_indexed_slices function as a simple map-reduce job
>>> (that just maps) - am I right?
>>>
>>> Well, I would like to be able to run simple queries on values, but I
>>> don't want to install Hadoop, write map-reduce jobs in Java (the whole
>>> application is in C# and I don't want to introduce a new development
>>> stack - maybe Pig would help) and have some second interface to
>>> Cassandra (in addition to Thrift). So secondary indexes seem to be my
>>> rescue. I would have just one indexed column that will have a
>>> day-timestamp value (~100k items per day), and the equality expression
>>> for this column would be in each query (and I would add more ad-hoc
>>> expressions).
>>> Will this scenario work, or is there some issue I could run into?
>>>
>>> Thanks!
>>>
>>> Augi
>>>
>>>
>
>
