incubator-cassandra-user mailing list archives

From Віталій Тимчишин <tiv...@gmail.com>
Subject Re: Query on how to count the total number of rowkeys and columns in them
Date Thu, 24 May 2012 13:57:00 GMT
You should read the keys in multiple batches, using the last key received
in the previous batch as the start key for the next one.
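Below is a minimal sketch of that paging loop, written in Python for brevity. `fetch` is a hypothetical stand-in for one range-slice call, for example a Thrift `get_range_slices` with a `KeyRange` whose count is the batch size. The sketch assumes the start key is inclusive, so every batch after the first repeats the previous batch's last row, and the loop skips that row. It also assumes each row's columns arrive in full in one response. The in-memory `make_fake_fetch` exists only to exercise the loop. It orders keys bytewise, whereas RandomPartitioner returns rows in token order.

```python
from typing import Callable, Dict, List, Tuple

# A row is (key, columns). FetchBatch is a hypothetical range-slice call:
# fetch(start_key, count) returns up to `count` rows starting at start_key
# (inclusive); an empty start key means "from the beginning".
Row = Tuple[bytes, dict]
FetchBatch = Callable[[bytes, int], List[Row]]


def count_rows_and_columns(fetch: FetchBatch, batch_size: int = 100) -> Tuple[int, int]:
    """Page through every row, restarting each batch at the last key seen."""
    if batch_size < 2:
        # With an inclusive start key, a batch of 1 would never advance.
        raise ValueError("batch_size must be at least 2")
    rows = columns = 0
    start = b""
    first_batch = True
    while True:
        batch = fetch(start, batch_size)
        # Every batch after the first begins with the row that ended the
        # previous batch, so skip it to avoid double counting.
        new_rows = batch if first_batch else batch[1:]
        for _key, cols in new_rows:
            rows += 1
            columns += len(cols)
        if len(batch) < batch_size:
            break                    # short batch: nothing left to read
        start = batch[-1][0]         # last key becomes the next start key
        first_batch = False
    return rows, columns


def make_fake_fetch(data: Dict[bytes, dict]) -> FetchBatch:
    """Hypothetical in-memory stand-in for a Cassandra range-slice call."""
    keys = sorted(data)

    def fetch(start: bytes, count: int) -> List[Row]:
        selected = [k for k in keys if k >= start][:count]
        return [(k, data[k]) for k in selected]

    return fetch


# Usage: 250 rows with 3 columns each, read 100 rows per batch.
sample = {b"key%04d" % i: {"c1": 1, "c2": 2, "c3": 3} for i in range(250)}
rows, columns = count_rows_and_columns(make_fake_fetch(sample), batch_size=100)
```

The loop stops on the first short batch. When the row count is an exact multiple of the batch size, one extra call returns only the repeated row.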
For large databases I'd recommend a statistical approach (if an estimate is
acceptable). It works well with RandomPartitioner, which spreads keys
uniformly over the token range.
Don't read the whole DB. Knowing the size of the whole key space, you can
read part of it, get the number of records per token (a value below 1),
then multiply by the key space size to get your total.
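A minimal sketch of that estimate, under stated assumptions. `token` is a hypothetical helper that reduces an MD5 digest into [0, 2**127) to approximate RandomPartitioner, whose exact arithmetic may differ. `count_in_range` is a hypothetical stand-in for paging through one token range and counting the rows returned. Here it is backed by an in-memory list of 100,000 made-up keys purely for illustration.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # size of the RandomPartitioner token space


def token(key: bytes) -> int:
    # Assumption: MD5 reduced into [0, 2**127) as a stand-in for
    # RandomPartitioner's token; the real computation may differ.
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING_SIZE


def estimate_total_rows(count_in_range, fraction: float) -> float:
    """Read only the first `fraction` of the ring and scale the count up."""
    end = int(RING_SIZE * fraction)
    sampled = count_in_range(0, end)   # rows actually read
    rows_per_token = sampled / end     # density: records per token, far below 1
    return rows_per_token * RING_SIZE  # multiply by the key space size


# Hypothetical in-memory "cluster" used only to exercise the estimate.
keys = [b"user:%d" % i for i in range(100_000)]
tokens = sorted(token(k) for k in keys)


def count_in_range(lo: int, hi: int) -> int:
    # Stand-in for counting the rows whose tokens fall in [lo, hi).
    return bisect.bisect_left(tokens, hi) - bisect.bisect_left(tokens, lo)


estimate = estimate_total_rows(count_in_range, fraction=0.05)
```

Because the hash spreads keys evenly, reading 5% of the ring here means counting roughly 5,000 rows instead of 100,000.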
You can even implement an algorithm that keeps reading until the required
precision is reached (simply compare your previous and current estimates
after each batch).
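One way to sketch that adaptive loop follows, under the same assumptions as a plain token-range estimate. `token` approximates RandomPartitioner with MD5, `count_in_range` is a hypothetical stand-in for reading one token range, and the 100,000 keys are made up. The loop reads consecutive slices of the ring and stops once the running estimate moves by less than `rel_tol` between slices. `slices`, `rel_tol`, and `min_slices` are illustrative parameters, not settings from any Cassandra API.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # size of the RandomPartitioner token space


def token(key: bytes) -> int:
    # Assumption: MD5 reduced into [0, 2**127) approximates RandomPartitioner.
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING_SIZE


def estimate_until_stable(count_in_range, slices=1000, rel_tol=0.001, min_slices=20):
    """Read consecutive token slices until the estimate stops moving.

    Returns (estimated_total_rows, fraction_of_ring_read).
    """
    step = RING_SIZE // slices
    seen = 0
    previous = None
    current = 0.0
    for i in range(1, slices + 1):
        end = i * step
        seen += count_in_range((i - 1) * step, end)  # read one more slice
        current = seen / end * RING_SIZE             # density x key space size
        # Compare the previous and current estimates after each slice.
        if previous and i >= min_slices and abs(current - previous) / previous < rel_tol:
            return current, end / RING_SIZE
        previous = current
    return current, slices * step / RING_SIZE      # read the whole ring


# Hypothetical in-memory "cluster" used only to exercise the loop.
keys = [b"user:%d" % i for i in range(100_000)]
tokens = sorted(token(k) for k in keys)


def count_in_range(lo: int, hi: int) -> int:
    # Stand-in for counting the rows whose tokens fall in [lo, hi).
    return bisect.bisect_left(tokens, hi) - bisect.bisect_left(tokens, lo)


estimate, fraction_read = estimate_until_stable(count_in_range)
```

Comparing only consecutive estimates is a simple stopping rule, and it can stop early when one slice happens to match the running average. The `min_slices` guard prevents stopping on the first few, noisiest slices.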
For me, reading about 1% of the DB is enough to get a good result.

Best regards, Vitalii Tymchyshyn

2012/5/24 Prakrati Agrawal <Prakrati.Agrawal@mu-sigma.com>

> Hi,
>
> I am trying to learn Cassandra and I have a question. I am using the Thrift
> API. To count the number of row keys I am using a KeyRange to specify the
> row keys. To count all of them, I specify the start and end as “new byte[0]”.
> But the count is set to 100 by default. How do I use this method to count
> the keys if I don’t know the actual number of keys in my Cassandra
> database? Please help me.
>