cassandra-user mailing list archives

From tommaso barbugli <tbarbu...@gmail.com>
Subject Re: estimated row count for a pk range
Date Mon, 21 Jul 2014 15:13:48 GMT
Thank you for the reply. I was hoping for something with a bit less
overhead than the first solution, and the second is not really an option for me.

On Monday, 21 July 2014, DuyHai Doan <doanduyhai@gmail.com> wrote:

> 1) Use a separate counter to count the number of entries in each column
> family, but this requires you to manage the counting manually.
> 2) SELECT DISTINCT partitionKey FROM ....  This query is normally
> optimized and is much faster than a SELECT *. However, if you have a very
> large number of distinct partitions, it can be slow.
>
>
> On Sun, Jul 20, 2014 at 6:48 PM, tommaso barbugli <tbarbugli@gmail.com>
> wrote:
>
>> Hello,
>> Lately I collapsed several column families (around 1k) into a smaller
>> set of column families (100).
>> To keep the data separated, I added an extra column (family) that is
>> part of the PK.
>>
>> While the previous approach always gave me a clear picture of every
>> column family's size, now I have no option other than to select all the
>> rows and make an estimate of the overall size used by one of the groups
>> of data in these CFs.
>>
>> e.g.:
>> SELECT * FROM cf_shard1 WHERE family = '1';
>>
>> Of course, this does not work well once cf_shard1 holds a lot of data.
>> Is there perhaps some way to get an estimated count of the rows matching
>> this query?
>>
>> Thanks,
>> Tommaso
>>
>
>
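For illustration, option 1 from the reply above (a separately maintained counter) might look roughly like this in CQL. This is a sketch under assumptions: the table `family_row_counts` and its columns are hypothetical, and the application itself must issue the counter update alongside every write to `cf_shard1`.

```sql
-- Hypothetical counter table: one counter per (shard table, family) pair.
-- In a counter table, every non-primary-key column must be a counter.
CREATE TABLE family_row_counts (
    shard_table text,
    family text,
    row_count counter,
    PRIMARY KEY ((shard_table), family)
);

-- Issued by the application next to each INSERT of a new row into cf_shard1
UPDATE family_row_counts SET row_count = row_count + 1
WHERE shard_table = 'cf_shard1' AND family = '1';

-- Issued next to each DELETE of a row from cf_shard1
UPDATE family_row_counts SET row_count = row_count - 1
WHERE shard_table = 'cf_shard1' AND family = '1';

-- Reading the count is a single-partition lookup instead of a full scan
SELECT row_count FROM family_row_counts
WHERE shard_table = 'cf_shard1' AND family = '1';
```

Counter increments are not idempotent, so a retried write or an INSERT that overwrites an existing row will skew the count. The value is best treated as an estimate, which is what the question asks for.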

-- 
sent from iphone (sorry for the typos)
