cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7085) Specialized query filters for CQL3
Date Fri, 19 Jun 2015 13:48:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593427#comment-14593427 ]

Sylvain Lebresne commented on CASSANDRA-7085:
---------------------------------------------

I was unfortunately wrong: what I suggested here doesn't work.

Let me first recall the problem one more time. The semantics of CQL are that, whatever columns
a query selects, a row is included in the result set if it is live, even if it has
no data for the queried columns. This means that even if a query selects only a few columns,
we might still need to know whether the row has live data for any of the other columns, in case
the selected columns don't have live data. And we cannot rely on the "row marker", since
{{UPDATE}} doesn't even set the "row marker" (since CASSANDRA-6782). Hence the fact that we
currently query every column every time.
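
To make the rule above concrete, here is a minimal sketch in Python. It is a toy model, not
Cassandra code: the {{Row}} type and the {{row_is_live}} and {{select}} helpers are hypothetical
names invented for illustration, and a column with no live data is modelled as {{None}}.

```python
from typing import Dict, List, Optional

# Hypothetical toy model: column name -> live value, or None when the
# column has no live data. There is no row marker in this model, which
# mirrors a row that was only ever written through UPDATE.
Row = Dict[str, Optional[str]]


def row_is_live(row: Row) -> bool:
    """A row is live if at least one of its columns has live data."""
    return any(value is not None for value in row.values())


def select(row: Row, columns: List[str]) -> Optional[Row]:
    """Return the selected columns if the row is live, else None (no row)."""
    if not row_is_live(row):
        return None
    return {name: row.get(name) for name in columns}


# Column "a" has no live data, but "b" does: the row is still returned.
only_b_live = {"a": None, "b": "x"}
# No column has live data: the row is absent from the result set.
nothing_live = {"a": None, "b": None}

print(select(only_b_live, ["a"]))   # a row with a null "a"
print(select(nothing_live, ["a"]))  # no row at all
```

The two calls select the same column and both find no value for it, yet the expected results
differ. Telling them apart requires looking at {{b}}, which the query never asked for.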

Now, my initial idea for this ticket (which is actually implemented in the current patch for
CASSANDRA-8099 but doesn't work) was to say: let's only query the columns we want, but record
in the result the maximum timestamp of any live data that is not included in the query (which
in practice means we still read all columns from disk but only send up the stack what we care
about). We could then use that max timestamp to decide whether a row exists or not when needed.
But we don't know what would happen during reconciliation for the data we haven't queried,
so this live timestamp idea is bogus.
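
One possible way to see the reconciliation problem is the toy simulation below. It is only a
sketch under simplifying assumptions: two replicas, last-write-wins per cell by timestamp, and a
tombstone modelled as a cell with no value. {{Cell}}, {{reconcile}} and
{{max_unqueried_live_ts}} are hypothetical names, not the API of any patch.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Cell:
    timestamp: int
    value: Optional[str]  # None models a tombstone (deleted cell)

    @property
    def live(self) -> bool:
        return self.value is not None


Replica = Dict[str, Cell]


def reconcile(a: Replica, b: Replica) -> Replica:
    """Simplified last-write-wins merge: keep the cell with the higher timestamp."""
    merged = dict(a)
    for name, cell in b.items():
        if name not in merged or cell.timestamp > merged[name].timestamp:
            merged[name] = cell
    return merged


def max_unqueried_live_ts(replica: Replica, queried: Set[str]) -> Optional[int]:
    """The summary the original idea would send instead of the unqueried cells."""
    stamps = [c.timestamp for n, c in replica.items() if n not in queried and c.live]
    return max(stamps, default=None)


queried = {"a"}                      # SELECT a ...; column "b" is not queried
replica_1 = {"b": Cell(10, "x")}     # still holds a live value for "b"
replica_2 = {"b": Cell(20, None)}    # has seen a later deletion of "b"

# Ground truth: reconciling the full data shows the deletion wins, so no live row.
truly_live = any(c.live for c in reconcile(replica_1, replica_2).values())

# Shortcut: each replica reports only a max live timestamp for unqueried data.
summaries = [max_unqueried_live_ts(r, queried) for r in (replica_1, replica_2)]
shortcut_live = any(ts is not None for ts in summaries)

print(truly_live, shortcut_live)
```

In this scenario the summary from the first replica says "live data at timestamp 10", while the
second replica reports nothing, so the coordinator has no way to learn that a newer tombstone
shadows that data.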

So back to square one: I'm not sure we can preserve the CQL semantics without querying all
columns. And I'm not sure breaking everyone by changing the semantics now is a good idea.

The one thing we can easily do (and that wouldn't be too much work) would be to query all
columns, but only include the values for the columns the query truly cares about (for the
other columns, we're only interested in knowing whether they are live or not). This would be
slightly better than what we do now, but not by a whole lot.
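
As a rough illustration of that fallback, the sketch below keeps every cell's timestamp and
liveness but drops the value of any cell the query did not select. The {{Cell}} tuple and
{{strip_unselected}} are hypothetical names for this example only.

```python
from typing import Dict, NamedTuple, Optional, Set


class Cell(NamedTuple):
    timestamp: int
    live: bool
    value: Optional[bytes]  # None once the value has been dropped


def strip_unselected(row: Dict[str, Cell], selected: Set[str]) -> Dict[str, Cell]:
    """Keep all cells, but carry values only for the selected columns."""
    return {
        name: cell if name in selected else cell._replace(value=None)
        for name, cell in row.items()
    }


row = {
    "a": Cell(5, True, b"small"),
    "blob": Cell(7, True, b"x" * 1024),  # large value the query never asked for
}
stripped = strip_unselected(row, {"a"})
print(stripped["blob"])
```

Every cell is still read and still travels up the stack with its timestamp, so reconciliation
and the liveness check keep working; the only saving is the payload of the unselected values.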

And so I think we should seriously consider re-opening CASSANDRA-6588: it's not perfect but
it's better than not having the option imo.


> Specialized query filters for CQL3
> ----------------------------------
>
>                 Key: CASSANDRA-7085
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7085
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>              Labels: cql, perfomance
>             Fix For: 3.x
>
>
> The semantics of CQL make it so that the current {{NamesQueryFilter}} and {{SliceQueryFilter}}
are not always as efficient as they could be. Namely, when a {{SELECT}} only selects a handful
of columns, we still have to query all the columns of the selected rows to distinguish
between 'live row but with no data for the queried columns' and 'no row' (see CASSANDRA-6588
for more details).
> We can solve that however by adding new filters (name and slice) specialized for CQL.
The new name filter would be a list of row prefixes + a list of CQL column names (instead of
one list of cell names). The slice filter would still take a ColumnSlice[] but would add the
list of column names we care about for each row.
> The new sstable readers that go with those filters would use the list of column names
to filter out all the cells we don't care about, so we don't have to ship those back to the
coordinator to skip them there, yet would know to still return the row marker when necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
