incubator-cassandra-user mailing list archives

From Nikolay Mihaylov <n...@nmmm.nu>
Subject Re: How to find total number of rows in Cassandra database?
Date Mon, 22 Apr 2013 07:13:31 GMT
Hi

it is important to know that counting rows in Cassandra is very expensive.
here are my 5 cents:

in one of my early projects we made a separate column family with just a
single row.
for each row key in the other CF, we inserted a column on this row, using
the row key as the column name.

then, once a day or so, we ran get_count() on that row.

however, because get_count() became way too slow,
we split the keys across several rows, e.g. 1024 rows.
it is still way too slow, but we do not need the count in real time.
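
a minimal sketch of the split-index idea, with a Python dict standing in for
the index column family. the CRC32 bucketing, the names (index_cf,
bucket_for, record_key, total_rows) and the "user:N" keys are assumptions for
illustration only; the message does not say how keys were assigned to rows.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 1024  # number of index rows, as in the message

# In-memory stand-in for the index CF: bucket row -> set of column names.
index_cf = defaultdict(set)

def bucket_for(row_key: str) -> int:
    """Map a row key to one of NUM_BUCKETS index rows (stable across runs)."""
    return zlib.crc32(row_key.encode("utf-8")) % NUM_BUCKETS

def record_key(row_key: str) -> None:
    """Store the row key as a column name in its bucket row.

    Writing the same key twice overwrites the same column, so duplicates
    do not inflate the count.
    """
    index_cf[bucket_for(row_key)].add(row_key)

def total_rows() -> int:
    """Sum the per-bucket column counts (one count call per index row)."""
    return sum(len(columns) for columns in index_cf.values())

for i in range(1000):
    record_key(f"user:{i}")
record_key("user:0")  # duplicate insert, should not change the total

print(total_rows())
```

the total still has to touch every column of every index row, which is why
this stays slow as the data grows; splitting only keeps each row small.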

in our "second" project we decided to use Cassandra counters.
however, to count only distinct items, we need to read before each write.
this degrades insert performance, so we made a special CF with hashes and
other stuff.
insert performance is still slow: 2 sec or so for 500-600 counters
(note a single insert is OK, but we need to do 500-600 per batch, and 100-200
batches per second).
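
to illustrate why the read is needed, here is a minimal in-memory sketch. the
set and dict are stand-ins for the lookup CF and the counter CF, and all names
(seen, counters, count_distinct, the "visitors" counter) are hypothetical;
this is not the code from the project.

```python
seen = set()    # stand-in for the lookup CF: which items were already counted
counters = {}   # stand-in for the counter CF: counter name -> value

def count_distinct(counter_name: str, item: str) -> bool:
    """Increment the counter only the first time an item is seen.

    Returns True if the counter was incremented, False for a repeat.
    """
    marker = (counter_name, item)
    if marker in seen:          # the read that costs insert performance
        return False
    seen.add(marker)            # remember the item
    counters[counter_name] = counters.get(counter_name, 0) + 1
    return True

for visitor in ["alice", "bob", "alice", "carol", "bob"]:
    count_distinct("visitors", visitor)

print(counters["visitors"])
```

every insert pays for one lookup, and with 500-600 counters per batch that
adds up quickly.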

finally, we researched probabilistic counters and decided to
use them.
we also decided to write the project in Python, and we have not done proper
tests yet.

this is our final "take". it uses a modified HyperLogLog, so we do not need
to read at all.

https://github.com/nmmmnu/CubicHyperLogLog
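
for readers new to the idea, below is a minimal sketch of a plain, textbook
HyperLogLog. it is NOT the modified variant from the library above and does
not use that library's API; the class name, the SHA-1 hash and the p=12
precision are assumptions chosen for illustration. the point it shows is that
add() only ever raises a register, so no read of earlier items is needed.

```python
import hashlib
import math

class HyperLogLog:
    """Textbook HyperLogLog with m = 2**p registers (assumes p >= 7)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Take 64 bits of a SHA-1 digest as the hash value.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank           # registers only grow

    def count(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return estimate

hll = HyperLogLog(p=12)
for i in range(50_000):
    hll.add(f"row-{i}")
    hll.add(f"row-{i}")  # duplicates do not change the registers
print(round(hll.count()))
```

the result is an estimate, not an exact count: with m registers the typical
relative error is about 1.04 / sqrt(m), so roughly 1.6% for p=12.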

we tested the library very well, but not with real live data.
a version for Redis is included too, for easy testing.

Nikolay.





On Mon, Apr 22, 2013 at 2:19 AM, Utkarsh Sengar <utkarsh2012@gmail.com> wrote:

> Difference b/w cqlsh and cli is documented by the datastax guys here
> nicely: http://www.datastax.com/support-forums/topic/cli-vs-cql
>
> Thanks,
> -Utkarsh
>
>
> On Sun, Apr 21, 2013 at 1:39 PM, Techy Teck <comptechgeeky@gmail.com> wrote:
>
>> Yeah it helps a lot. I always have this doubt with me. What is the
>> difference between CLI and CQL?
>>
>>
>>
>> On Sun, Apr 21, 2013 at 1:30 PM, Utkarsh Sengar <utkarsh2012@gmail.com> wrote:
>>
>>> Using cqlsh you can do:
>>>
>>> SELECT COUNT(*) FROM columnfamily LIMIT 5000;
>>>
>>> Does that help?
>>>
>>> Read more: http://www.datastax.com/docs/1.0/references/cql/SELECT
>>>
>>> Thanks,
>>> -Utkarsh
>>>
>>>
>>>
>>> On Sun, Apr 21, 2013 at 1:04 PM, Techy Teck <comptechgeeky@gmail.com> wrote:
>>>
>>>> I have inserted 1000 rows in Cassandra database. Now I am trying to
>>>> find out how many rows have been inserted in Cassandra database using the
>>>> CLI mode.
>>>>
>>>>
>>>> In rdbms, I can do this sql-
>>>>
>>>>     SELECT COUNT(*) FROM TABLE;
>>>>
>>>> And this will give me total count for that table;
>>>>
>>>> How to do the same thing in Cassandra database?
>>>>
>>>> I am running Cassandra 1.2.3
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> -Utkarsh
>>>
>>
>>
>
>
> --
> Thanks,
> -Utkarsh
>
