cassandra-commits mailing list archives

From "Benjamin Lerer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11223) Queries with LIMIT filtering on clustering columns can return less rows than expected
Date Thu, 13 Jul 2017 13:20:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085685#comment-16085685 ]

Benjamin Lerer commented on CASSANDRA-11223:
--------------------------------------------

{quote}I missed it last time, sorry, but we should not count the static row in {{GroupByPrefixReversed.count()}} when {{countPartitionsWithOnlyStaticData}} is false.{quote}

I deliberately left it unchanged.
The problem only affects range queries and multi-partition queries. Range queries do not accept an {{ORDER BY}} clause, and multi-partition queries only accept an {{ORDER BY}} clause when paging is off. In that case, the limit is only applied after the rows with only static data have already been discarded. So, in practice, changing {{GroupByPrefixReversed.count()}} has no effect.

I will add a test to prove that the behavior is correct.
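The argument above can be sketched with a simplified model, assuming each partition is just a list of live clustering rows and that static-only partitions are dropped before the limit is applied. The class and method names are hypothetical; this is not Cassandra's {{GroupByPrefixReversed}}. Because no static-only partition ever reaches the counter, the {{countPartitionsWithOnlyStaticData}} flag cannot change the result:

```java
import java.util.List;

public class LimitAfterDiscardDemo
{
    // Hypothetical counting rule: a partition with no clustering rows (static data only)
    // counts as one row only when countPartitionsWithOnlyStaticData is true.
    static int countRows(int[] clusteringRows, boolean countPartitionsWithOnlyStaticData)
    {
        if (clusteringRows.length == 0)
            return countPartitionsWithOnlyStaticData ? 1 : 0;
        return clusteringRows.length;
    }

    // Number of rows returned when static-only partitions are discarded first
    // and the limit is applied afterwards.
    static int rowsReturned(List<int[]> partitions, int limit, boolean countPartitionsWithOnlyStaticData)
    {
        int counted = 0;
        for (int[] partition : partitions)
        {
            if (partition.length == 0)
                continue; // static-only partition already discarded, never reaches the counter
            counted += countRows(partition, countPartitionsWithOnlyStaticData);
        }
        return Math.min(counted, limit);
    }

    public static void main(String[] args)
    {
        // Three partitions: two rows, static data only, one row.
        List<int[]> partitions = List.of(new int[] {0, 1}, new int[0], new int[] {2});
        for (int limit : new int[] {1, 2, 5})
            System.out.println(rowsReturned(partitions, limit, true) + " " + rowsReturned(partitions, limit, false));
    }
}
```

Each printed line shows the same count for both flag values, which is the "no effect in practice" claim in miniature.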

{quote}Are we sure it's fine to always count the static row in {{ColumnFamily.liveCQL3RowCount()}}?{quote}

{{ColumnFamily.liveCQL3RowCount()}} is only used by {{ColumnFamilyStore::isFilterFullyCoveredBy}} to check whether the whole partition is cached. Whether or not we count the static row, the answer will be correct.
We could debate the behavior of {{ColumnFamily.liveCQL3RowCount()}} independently of its current use, but I am in favor of minimizing the changes on {{2.2}}, given that everything is different in {{3.0}}. What do you think?

{quote}Should we count static rows for the limits used by {{RowCacheSerializer.deserialize()}}?{quote}

I think it is fine. The limit is only used to cap the number of rows stored in the cache for each partition, and we only count the static row if the partition does not have any rows.
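A minimal sketch of that counting rule, assuming a partition can be reduced to "has a static row" plus a number of clustering rows. The names {{countedRows}} and {{cachedCount}} are hypothetical, not {{RowCacheSerializer}} internals. It shows why counting the static row is harmless here: it only counts when there are no clustering rows that the limit could cut off.

```java
public class RowCacheLimitDemo
{
    // Hypothetical rule from the comment above: the static row counts against the
    // per-partition cache limit only when the partition has no clustering rows.
    static int countedRows(boolean hasStaticRow, int clusteringRows)
    {
        if (clusteringRows > 0)
            return clusteringRows;    // static row ignored when regular rows exist
        return hasStaticRow ? 1 : 0;  // static-only partition counts as one row
    }

    // Counted rows a partition contributes once the per-partition limit is applied.
    static int cachedCount(boolean hasStaticRow, int clusteringRows, int limit)
    {
        return Math.min(countedRows(hasStaticRow, clusteringRows), limit);
    }

    public static void main(String[] args)
    {
        int limit = 2;
        System.out.println(cachedCount(true, 5, limit)); // 2: only clustering rows are limited
        System.out.println(cachedCount(true, 1, limit)); // 1: the static row is not counted
        System.out.println(cachedCount(true, 0, limit)); // 1: static-only partition counts as one row
    }
}
```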
 
{quote}I haven't checked the 3.11 and trunk patches yet; did they apply, or do they need a full review?{quote}
{{3.11}} is different due to the {{GROUP BY}} functionality.
 
I pushed the updated branches.

> Queries with LIMIT filtering on clustering columns can return less rows than expected
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11223
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11223
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>
> A query like {{SELECT * FROM %s WHERE b = 1 LIMIT 2 ALLOW FILTERING}} can return fewer rows than expected if the table has some static columns and some of the partitions have no rows matching b = 1.
> The problem can be reproduced with the following unit test:
> {code}
>     @Test
>     public void testFilteringOnClusteringColumnsWithLimitAndStaticColumns() throws Throwable
>     {
>         createTable("CREATE TABLE %s (a int, b int, s int static, c int, primary key (a, b))");
>         for (int i = 0; i < 3; i++)
>         {
>             execute("INSERT INTO %s (a, s) VALUES (?, ?)", i, i);
>             for (int j = 0; j < 3; j++)
>                 if (!(i == 0 && j == 1)) // partition 0 gets no row with b = 1
>                     execute("INSERT INTO %s (a, b, c) VALUES (?, ?, ?)", i, j, i + j);
>         }
>         assertRows(execute("SELECT * FROM %s"),
>                    row(1, 0, 1, 1),
>                    row(1, 1, 1, 2),
>                    row(1, 2, 1, 3),
>                    row(0, 0, 0, 0),
>                    row(0, 2, 0, 2),
>                    row(2, 0, 2, 2),
>                    row(2, 1, 2, 3),
>                    row(2, 2, 2, 4));
>         assertRows(execute("SELECT * FROM %s WHERE b = 1 ALLOW FILTERING"),
>                    row(1, 1, 1, 2),
>                    row(2, 1, 2, 3));
>         assertRows(execute("SELECT * FROM %s WHERE b = 1 LIMIT 2 ALLOW FILTERING"),
>                    row(1, 1, 1, 2),
>                    row(2, 1, 2, 3)); // <-------- FAIL: returns only one row because the static row of partition 0 is counted and then filtered out in the SELECT statement
>     }
> {code}
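To make the failure concrete outside the test harness, here is a self-contained Java model of it. It is a simplification, not Cassandra's read path: the {{Partition}} class and {{query}} method are hypothetical, and it assumes that partitions are scanned in the order the test's {{SELECT *}} returns them (1, 0, 2) and that the limit counter treats a partition with no matching row as one static row. With that counting, partition 0 uses up the second slot of {{LIMIT 2}}, so only one row comes back:

```java
import java.util.ArrayList;
import java.util.List;

public class StaticRowLimitDemo
{
    // Hypothetical partition: its key plus the clustering values (b) of its rows.
    // Every partition is assumed to also hold a static row, as in the test table.
    static final class Partition
    {
        final int key;
        final int[] clusterings;

        Partition(int key, int... clusterings)
        {
            this.key = key;
            this.clusterings = clusterings;
        }
    }

    // Returns "a:b" for each row matching b = filterB, stopping once `limit` rows are counted.
    // If countStaticOnly is true, a partition with no matching row is still counted as one
    // row (its static row), which the SELECT then filters out: the behavior in this issue.
    static List<String> query(List<Partition> partitions, int filterB, int limit, boolean countStaticOnly)
    {
        List<String> returned = new ArrayList<>();
        int counted = 0;
        for (Partition p : partitions)
        {
            if (counted >= limit)
                break;
            boolean matched = false;
            for (int b : p.clusterings)
            {
                if (counted >= limit)
                    return returned;
                if (b == filterB)
                {
                    returned.add(p.key + ":" + b);
                    counted++;
                    matched = true;
                }
            }
            if (!matched && countStaticOnly)
                counted++; // static row counted against the limit but never returned
        }
        return returned;
    }

    public static void main(String[] args)
    {
        // Partitions in the order the test's SELECT * returns them: 1, 0, 2.
        // Partition 0 only has rows with b = 0 and b = 2.
        List<Partition> partitions = List.of(
            new Partition(1, 0, 1, 2),
            new Partition(0, 0, 2),
            new Partition(2, 0, 1, 2));
        System.out.println(query(partitions, 1, 2, true));  // [1:1]
        System.out.println(query(partitions, 1, 2, false)); // [1:1, 2:1]
    }
}
```

Not counting partitions that contribute no matching rows is what lets the model return both expected rows.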


