incubator-cassandra-user mailing list archives

From Eldad Yamin <elda...@gmail.com>
Subject Re: Cassandra Secondary index/Twissandra
Date Mon, 11 Jul 2011 21:25:40 GMT
Hi Aaron,
Thank you again for your response.

I've read the article, but I didn't understand everything. It would be great
if the benchmark included the actual CLI/Python commands (that way it would
be easier to understand each query). In addition, an explanation of
row pages would help: what are they?

Anyway, for a sense of scale, we can take as an example
the average Facebook/Twitter user, who can get 100K columns per user
(Userline).
So what is needed is to take the first 50 columns (ordered by TimeUUID), then
columns 51 to 100, 101 to 150, etc.
Any suggestion on how fast it will be? Or how you recommend configuring
Cassandra? Or even a different way of achieving that goal?
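The paging pattern described above can be sketched in Python. This is a minimal, self-contained simulation, not a client for a real cluster: the row is an in-memory sorted list of column names, and `get_slice` and `iter_pages` are hypothetical helpers standing in for a slice query with an inclusive start column and a column count. What it illustrates is that a slice has no offset, so each page restarts from the last column name of the previous page.

```python
from bisect import bisect_left
from typing import Iterator, List, Optional


def get_slice(columns: List[int], start: Optional[int], count: int) -> List[int]:
    """Hypothetical slice: up to `count` names >= `start` (inclusive), in order.

    `columns` must be sorted ascending; `start=None` means "from the beginning".
    """
    i = 0 if start is None else bisect_left(columns, start)
    return columns[i:i + count]


def iter_pages(columns: List[int], page_size: int = 50) -> Iterator[List[int]]:
    """Yield successive pages of a wide row, restarting at the last name seen."""
    start: Optional[int] = None
    while True:
        if start is None:
            page = get_slice(columns, None, page_size)
        else:
            # The start column is inclusive, so fetch one extra column and drop
            # the first, which is the last column of the previous page.
            page = get_slice(columns, start, page_size + 1)[1:]
        if not page:
            return
        yield page
        if len(page) < page_size:
            return  # a short page means the end of the row was reached
        start = page[-1]
```

For a 100K-column Userline this yields 2,000 pages of 50. Fetching `page_size + 1` and dropping the first column is needed because the start is inclusive; the alternative of reading columns 1 to 150 and discarding the first 100 gets more expensive the deeper you page.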

Thanks,
Eldad.

On Sun, Jul 10, 2011 at 8:31 PM, aaron morton <aaron@thelastpickle.com> wrote:

> Can you recommend a better way of doing that, or a way to tune Cassandra
> to support those 2 CFs?
>
> A select with no start or finish column name, a column count and not in
> reversed order is about the fastest read query.
>
> You will need to do a reversed query, which will be a little slower, but it
> may still be plenty fast enough, depending on scale, throughput and all
> those other things. See
> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
>
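To make the "reversed query" concrete, here is a minimal Python sketch of newest-first slicing, again over an in-memory sorted list rather than a real cluster; `get_slice_reversed` and the `timeline` data are hypothetical. In a Thrift `SliceRange` this corresponds to setting `reversed` to true, in which case the start column acts as the upper bound and columns come back in descending order.

```python
from bisect import bisect_right
from typing import List, Optional


def get_slice_reversed(columns: List[int], start: Optional[int], count: int) -> List[int]:
    """Hypothetical reversed slice: up to `count` names <= `start`, largest first.

    `columns` must be sorted ascending; `start=None` means "from the end of the row".
    """
    end = len(columns) if start is None else bisect_right(columns, start)
    begin = max(0, end - count)
    return columns[begin:end][::-1]


# Hypothetical timeline whose column names are increasing timestamps.
timeline = [10, 20, 30, 40, 50]
first_page = get_slice_reversed(timeline, None, 2)  # the two newest columns
# Next page: start at the last name seen (inclusive), fetch one extra, drop it.
second_page = get_slice_reversed(timeline, first_page[-1], 3)[1:]
```

Here `first_page` is `[50, 40]` and `second_page` is `[30, 20]`; paging backwards works the same way as paging forwards, just with the bound reversed.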
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 10 Jul 2011, at 00:14, Eldad Yamin wrote:
>
> Aaron - Thank you for the fast response!
>
>
>    1. Does performance decrease (significantly) if the uniqueness of the
>    column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
>    lots of columns?
>
> > Depends on what sort of operations you are doing. Some read operations
> have to pay a constant cost to decode the row-level column index; this can
> be tuned, though. AFAIK the comparator type has very little to do with the
> performance.
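A minimal Python sketch of the idea behind the row-level column index, under simplifying assumptions. As far as I understand it, Cassandra groups a wide row's columns into blocks by serialized size (the `column_index_size_in_kb` setting in cassandra.yaml, 64 KB by default) and records the first and last column name of each block; here blocks are a fixed number of columns instead, and `IndexBlock`, `build_column_index` and `slice_via_index` are hypothetical names. A slice with a start column binary-searches the block list, then scans forward from the matching block instead of reading the row from the beginning.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List


@dataclass
class IndexBlock:
    first: int   # first column name in the block
    last: int    # last column name in the block
    offset: int  # position of the block's first column within the row


def build_column_index(columns: List[int], block_size: int) -> List[IndexBlock]:
    """Split a sorted row into fixed-size blocks and record each block's name range."""
    index = []
    for offset in range(0, len(columns), block_size):
        block = columns[offset:offset + block_size]
        index.append(IndexBlock(block[0], block[-1], offset))
    return index


def slice_via_index(columns: List[int], index: List[IndexBlock],
                    start: int, count: int) -> List[int]:
    """Return up to `count` names >= `start`, using the index to skip whole blocks."""
    lasts = [b.last for b in index]
    i = bisect_left(lasts, start)  # first block whose last name is >= start
    if i == len(index):
        return []
    result = []
    pos = index[i].offset
    # Scan forward from the start of that block, as a reader would deserialize it.
    while pos < len(columns) and len(result) < count:
        if columns[pos] >= start:
            result.append(columns[pos])
        pos += 1
    return result
```

This shows the trade-off behind "this can be tuned": larger blocks mean a smaller index to decode on each read but more columns to scan inside the chosen block, and smaller blocks the reverse.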
>
> In Twissandra, the columns are used as an "alternative" index for the
> Userline/Timeline. Therefore the operation I'm going to do is slice_range:
> I'm going to get (for example) the first 50 columns (using a comparator of
> TimeUUID/LONG).
> Can you recommend a better way of doing that, or a way to tune Cassandra
> to support those 2 CFs?
>
>
> Thanks!
>
> On Sun, Jul 10, 2011 at 3:26 AM, aaron morton <aaron@thelastpickle.com>wrote:
>
>>
>>    1. Is there a limit on the number of columns in a single column family
>>    that serve as secondary indexes?
>>
>> AFAIK there is no coded limit; however, every index is implemented as
>> another (hidden) Column Family that inherits the settings of the parent CF.
>> So under 0.7 you may run out of memory, and under 0.8 you may flush a lot.
>> Also, when an indexed column is updated, there are potentially 3 operations
>> that have to happen: read the old value, delete the old value, write the new
>> value. More indexes == more index updating, just like any other database.
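The three-step index update described above can be sketched as a toy model in Python. Assumptions: the base column family and its hidden index column family are plain in-memory dicts; the index is keyed by the indexed value with the base row keys as its entries (roughly how a KEYS index is laid out); and `IndexedColumnFamily` with its methods is hypothetical, not a Cassandra or pycassa API.

```python
from collections import defaultdict
from typing import Dict, List, Set


class IndexedColumnFamily:
    """Hypothetical toy: one base CF plus a hidden index on a single column."""

    def __init__(self, indexed_column: str) -> None:
        self.indexed_column = indexed_column
        self.rows: Dict[str, Dict[str, str]] = {}            # row key -> {column: value}
        self.index: Dict[str, Set[str]] = defaultdict(set)   # value -> {row key}

    def insert(self, row_key: str, column: str, value: str) -> None:
        row = self.rows.setdefault(row_key, {})
        if column == self.indexed_column:
            old = row.get(column)                 # 1. read the old value
            if old is not None:
                self.index[old].discard(row_key)  # 2. delete the old index entry
                if not self.index[old]:
                    del self.index[old]
            self.index[value].add(row_key)        # 3. write the new index entry
        row[column] = value                       # the base write itself

    def query(self, value: str) -> List[str]:
        """Row keys whose indexed column currently equals `value`."""
        return sorted(self.index.get(value, ()))
```

Every write to an indexed column pays for the read and the delete on top of the write, and each additional index on the same CF repeats those steps.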
>>
>>
>>    1. Does performance decrease (significantly) if the uniqueness of the
>>    column’s values is high?
>>
>> Low cardinality is recommended; see
>>
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Secondary-indices-Why-low-cardinality-td6160509.html
>>
>>
>>    1. Why does the CF for "Userline"/"Timeline" have a comparator of
>>    "LONG_TYPE" and not TimeUUID?
>>
>> Probably just to make the demo easier. It's used to order tweets in the
>> user and public timelines by the current time:
>> https://github.com/twissandra/twissandra/blob/master/cass.py#L204
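For comparison, a minimal Python sketch of the two column-name choices. Assumptions: the LongType name is microseconds since the Unix epoch (a common scheme; this sketch does not reproduce Twissandra's code), and the helper names are hypothetical. The standard-library `uuid.uuid1()` produces a version-1 (time-based) UUID, the kind of value TimeUUIDType expects.

```python
import time
import uuid

# Number of 100-ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def long_column_name() -> int:
    """LongType-style name: microseconds since the Unix epoch (assumed scheme)."""
    return int(time.time() * 1_000_000)


def timeuuid_column_name() -> uuid.UUID:
    """TimeUUIDType-style name: a version-1 UUID (timestamp, clock sequence, node)."""
    return uuid.uuid1()


def timeuuid_to_unix_seconds(u: uuid.UUID) -> float:
    """Recover the timestamp embedded in a version-1 UUID, as Unix seconds."""
    return (u.time - UUID_EPOCH_OFFSET) / 1e7
```

The trade-off: two tweets written to the same row in the same microsecond get the same LongType name, so the second overwrites the first, whereas `uuid1()` values also carry a clock sequence and node ID. Note that Python compares `UUID` objects by their 128-bit integer value, which for version-1 UUIDs is not chronological; sort by `.time` if you need time order on the client (Cassandra's TimeUUIDType comparator orders by the embedded timestamp).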
>>
>>
>>    1. Does performance decrease (significantly) if the uniqueness of the
>>    column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
>>    lots of columns?
>>
>> Depends on what sort of operations you are doing. Some read operations
>> have to pay a constant cost to decode the row-level column index; this can
>> be tuned, though. AFAIK the comparator type has very little to do with the
>> performance.
>>
>> Hope that helps.
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 9 Jul 2011, at 12:15, Eldad Yamin wrote:
>>
>> Hi,
>> I have a few questions:
>>
>> *Secondary index*
>>
>>    1. Is there a limit on the number of columns in a single column family
>>    that serve as secondary indexes?
>>    2. Does performance decrease (significantly) if the uniqueness of the
>>    column’s values is high?
>>
>>
>> *Twissandra*
>>
>>    1. Why, in the source (and in every tutorial I've read), does the CF
>>    for "Userline"/"Timeline" have a comparator of "LONG_TYPE" and
>>    not TimeUUID?
>>
>>    https://github.com/twissandra/twissandra/blob/master/tweets/management/commands/sync_cassandra.py
>>    2. Does performance decrease (significantly) if the uniqueness of the
>>    column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
>>    lots of columns?
>>
>>
>> Thanks!
>> Eldad
>>
>>
>>
>
>
