cassandra-dev mailing list archives

From Benjamin Lerer <>
Subject Re: Optimizing queries for partition keys
Date Thu, 22 Mar 2018 22:16:57 GMT
You should check the 3.x releases. CASSANDRA-10657 could have fixed your problem.

On Thu, Mar 22, 2018 at 9:15 PM, Benjamin Lerer <> wrote:

> Sylvain explained the problem in CASSANDRA-4536:
> " Let me note that in CQL3 a row that have no live column don't exist, so
> we can't really implement this with a range slice having an empty columns
> list. Instead we should do a range slice with a full-row slice predicate
> with a count of 1, to make sure we do have a live column before including
> the partition key. "
> By using ColumnFilter.selectionBuilder(), you do not select all the
> columns. As a consequence, some partitions might be returned when they
> should not be.
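Sylvain's point can be illustrated with a small toy model. The classes below (DistinctLivenessDemo, Row, Cell) are hypothetical and are not Cassandra's storage-engine API; each cell is simply flagged live or dead. A key-only selection can only see that a partition is stored, so it also returns a partition whose every cell has been deleted, while a scan that stops at the first live cell returns only partitions that still contain a CQL row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Cassandra's internal API: a partition is a list of rows,
// and each row is a list of cells that are either live or dead (deleted/expired).
public class DistinctLivenessDemo {

    record Cell(String column, boolean live) {}
    record Row(List<Cell> cells) {}

    // Key-only selection: no cells are fetched, so the only thing we can
    // check is that the partition is present in storage at all.
    static List<String> distinctKeysOnly(Map<String, List<Row>> partitions) {
        return new ArrayList<>(partitions.keySet());
    }

    // Approach described in CASSANDRA-4536: include a partition key only
    // once a live cell has been found ("a count of 1"), then stop scanning.
    static List<String> distinctWithLivenessCheck(Map<String, List<Row>> partitions) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, List<Row>> e : partitions.entrySet()) {
            if (hasLiveCell(e.getValue())) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    static boolean hasLiveCell(List<Row> rows) {
        for (Row row : rows) {
            for (Cell cell : row.cells()) {
                if (cell.live()) {
                    return true; // first live cell proves the partition has a CQL row
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Row>> partitions = new LinkedHashMap<>();
        partitions.put("k1", List.of(new Row(List.of(new Cell("v", true)))));
        // k2 still has data on disk, but every cell is dead, so in CQL3 it has no rows.
        partitions.put("k2", List.of(new Row(List.of(new Cell("v", false)))));

        System.out.println("key-only:       " + distinctKeysOnly(partitions));
        System.out.println("liveness check: " + distinctWithLivenessCheck(partitions));
    }
}
```

Running main prints [k1, k2] for the key-only strategy and [k1] for the liveness check: the extra k2 is the kind of partition that "might be returned while it should not" be.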
> On Thu, Mar 22, 2018 at 6:24 PM, Sam Klock <> wrote:
>> Cassandra devs,
>> We use workflows in some of our clusters (running 3.0.15) that involve
>> "SELECT DISTINCT key FROM..."-style queries.  For some tables, we
>> observed extremely poor performance (only a small number of rows per
>> second, and frequent timeouts) even under light load, which we eventually
>> traced to replicas shipping entire rows (which in some cases could store
>> on the order of MBs of data) to service the query.  That surprised us
>> (partly because 2.1 doesn't seem to behave this way), so we did some
>> digging, and we eventually came up with a patch that modifies
>> in the following way: if the selection in the query
>> only includes the partition key, then when building a ColumnFilter for
>> the query, use:
>>     builder = ColumnFilter.selectionBuilder();
>> instead of:
>>     builder = ColumnFilter.allColumnsBuilder();
>> to initialize the ColumnFilter.Builder in gatherQueriedColumns().  That
>> seems to repair the performance regression, and it doesn't appear to
>> break any functionality (based on the unit tests and some smoke tests we
>> ran involving insertions and deletions).
>> We'd like to contribute this patch back to the project, but we're not
>> convinced that there aren't subtle correctness issues we're missing,
>> judging both from comments in the code and the existence of
>> CASSANDRA-5912, which suggests optimizing this kind of query is
>> nontrivial.
>> So: does this change sound safe to make, or are there corner cases we
>> need to account for?  If there are corner cases, are there plausibly
>> ways of addressing them at the SelectStatement level, or will we need to
>> look deeper?
>> Thanks,
>> SK
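The performance side of this trade-off can be sketched with a similarly hypothetical cost model (DistinctCostDemo and its Cell record are illustrative, not Cassandra code, and real cells also carry timestamps and other metadata). It compares three ways a replica might answer SELECT DISTINCT for one partition: shipping whole rows until a row with a live cell is found, which resembles the 3.0.15 behavior described above; shipping cells only until the first live cell, in the spirit of the "count of 1" in Sylvain's note; and shipping no cells at all, as the key-only patch does.

```java
import java.util.List;

// Toy cost model, NOT Cassandra code: bytes a replica would ship to answer
// "SELECT DISTINCT key" for a single partition under three strategies.
public class DistinctCostDemo {

    record Cell(long sizeBytes, boolean live) {}

    // Whole rows: every cell of every scanned row is shipped, up to and
    // including the first row that holds a live cell.
    static long bytesWholeRows(List<List<Cell>> rows) {
        long total = 0;
        for (List<Cell> row : rows) {
            boolean rowIsLive = false;
            for (Cell c : row) {
                total += c.sizeBytes();
                rowIsLive |= c.live();
            }
            if (rowIsLive) {
                break;
            }
        }
        return total;
    }

    // Cell-level "count of 1": stop as soon as the first live cell is read.
    static long bytesUntilFirstLiveCell(List<List<Cell>> rows) {
        long total = 0;
        for (List<Cell> row : rows) {
            for (Cell c : row) {
                total += c.sizeBytes();
                if (c.live()) {
                    return total;
                }
            }
        }
        return total;
    }

    // Key-only selection: ships no cells, but also cannot tell whether
    // the partition has any live cell.
    static long bytesKeyOnly(List<List<Cell>> rows) {
        return 0;
    }

    public static void main(String[] args) {
        // One row: a small live column followed by a ~4 MB blob column.
        List<List<Cell>> partition = List.of(
                List.of(new Cell(16, true), new Cell(4L * 1024 * 1024, true)));

        System.out.println("whole rows:          " + bytesWholeRows(partition));
        System.out.println("until 1st live cell: " + bytesUntilFirstLiveCell(partition));
        System.out.println("key only:            " + bytesKeyOnly(partition));
    }
}
```

For that example partition, main prints 4194320, 16 and 0 bytes respectively. In this model the key-only strategy is the cheapest, but it is also the one that cannot detect partitions without live cells, so stopping at the first live cell is the middle ground between cost and correctness.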