cassandra-user mailing list archives

From <SEAN_R_DUR...@homedepot.com>
Subject RE: 1, 2, 3...
Date Mon, 11 Apr 2016 15:10:29 GMT
Cassandra is not a good fit for table-scan queries, which is what count(*) typically is. There
are some attempts to make that work (as noted below), but this is a path I avoid.


Sean Durity

From: Max C [mailto:mc_cassandra@core43.com]
Sent: Saturday, April 09, 2016 6:19 PM
To: user@cassandra.apache.org
Subject: Re: 1, 2, 3...

Looks like this guy (Brian Hess) wrote a script to split the token range and run count(*)
on each subrange:

https://github.com/brianmhess/cassandra-count

- Max

On Apr 8, 2016, at 10:56 pm, Jeff Jirsa <jeff.jirsa@crowdstrike.com> wrote:

SELECT COUNT(*) probably works (with internal paging) on many datasets, given enough time and
assuming you don’t have any partitions large enough to kill the query.

No, it doesn’t count extra replicas / duplicates.

The old way to do this (before paging / fetch size) was to use manual paging based on tokens/clustering
keys:

https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html – SELECT’s WHERE clause
can use token(), which is what you’d want to use to page through the whole token space.
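
As a rough illustration of that manual paging loop, here is a minimal Python sketch. The table and column names, the `%s` placeholder style, and the `fetch_page` callable are all assumptions made for the example, not part of any driver API. It uses SELECT DISTINCT on the partition key, so it counts partitions rather than rows; counting rows in wide partitions would also need paging on clustering keys.

```python
def page_query(table, pk, page_size, first_page):
    """Build the CQL text for one page of partition keys (hypothetical names)."""
    if first_page:
        return f"SELECT DISTINCT {pk} FROM {table} LIMIT {page_size}"
    # Later pages resume just after the token of the last key already seen.
    return (f"SELECT DISTINCT {pk} FROM {table} "
            f"WHERE token({pk}) > token(%s) LIMIT {page_size}")


def count_partitions(fetch_page, page_size):
    """Count partition keys by paging until a short page comes back.

    fetch_page(last_key, page_size) is a caller-supplied callable (an
    assumption of this sketch) that runs the query above and returns the
    keys in token order; last_key is None for the first page.
    """
    total = 0
    last_key = None
    while True:
        keys = fetch_page(last_key, page_size)
        total += len(keys)
        if len(keys) < page_size:
            return total
        last_key = keys[-1]
```

Passing the fetch step in as a callable keeps the loop independent of whichever client library actually runs the query.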

You could, in theory, issue thousands of queries in parallel, all for different token ranges,
and then sum the results. That’s what something like Spark would be doing. If you want to
determine rows per node, limit the token range to the range owned by that node (easier with a
single token per node than with vnodes; with vnodes, repeat for each of the node's num_tokens ranges).
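
The split-and-sum idea can be sketched in Python as follows. This assumes the default Murmur3Partitioner, whose tokens span -2^63 to 2^63-1; the function names and the `run_count` callable are made up for the example, and nothing here is taken from the cassandra-count script linked above.

```python
MIN_TOKEN = -(2 ** 63)      # Murmur3Partitioner lower bound (assumed partitioner)
MAX_TOKEN = 2 ** 63 - 1     # Murmur3Partitioner upper bound


def split_token_range(n):
    """Split the full token space into n contiguous, non-overlapping ranges."""
    total = MAX_TOKEN - MIN_TOKEN + 1
    ranges = []
    start = MIN_TOKEN
    for i in range(n):
        end = MIN_TOKEN + (total * (i + 1)) // n - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def count_queries(table, pk, n):
    """Build one COUNT(*) statement per subrange (both bounds inclusive)."""
    return [
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE token({pk}) >= {lo} AND token({pk}) <= {hi}"
        for lo, hi in split_token_range(n)
    ]


def total_count(queries, run_count):
    """Sum per-range counts; run_count(query) is supplied by the caller."""
    return sum(run_count(q) for q in queries)
```

Since the subranges are independent, the calls to `run_count` could be issued in parallel. For a per-node count, the ranges would come from that node's owned tokens instead of an even split.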

