cassandra-commits mailing list archives

From "Igor Novgorodov (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-13379) SASI index returns duplicate rows
Date Sun, 26 Mar 2017 21:15:41 GMT
Igor Novgorodov created CASSANDRA-13379:

             Summary: SASI index returns duplicate rows
                 Key: CASSANDRA-13379
             Project: Cassandra
          Issue Type: Bug
          Components: sasi
            Reporter: Igor Novgorodov

CREATE TABLE bulks_recipients (
    bulk_id uuid,
    recipient text,
    bulk_id_idx uuid,
    status int,
    ts timestamp,
    PRIMARY KEY ((bulk_id, recipient))
);

*bulk_id_idx* is just a copy of *bulk_id*, because for some reason SASI does not work
on a partition key component at all.

CREATE CUSTOM INDEX bulks_recipients_bulk_id ON bulks_recipients (bulk_id_idx) USING 'org.apache.cassandra.index.sasi.SASIIndex';

Then I insert 1 million rows with the same *bulk_id* and different *recipient* values. Then:

> select count(*) from bulks_recipients ;


(1 rows)

Ok, it's fine here. Now let's query by SASI:
> select count(*) from bulks_recipients where bulk_id_idx = fedd95ec-2cc8-4040-8619-baf69647700b;


(1 rows)
Hmm, a very strange count: 10101 extra rows.
Ok, I've dumped the query result into a text file:
# cat sasi.txt | wc -l
Here we have 200 extra rows for some reason.

Let's check if these are duplicates:
# cat sasi.txt | sort | uniq | wc -l
Yep, looks like they are.
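
To see which rows are repeated, and not only that the totals differ, the dumped file can be tallied line by line. The following is a minimal sketch, assuming sasi.txt holds one result row per line; the function names duplicate_lines and surplus_count are hypothetical helpers for illustration and are not part of Cassandra or cqlsh.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_lines(lines: Iterable[str]) -> Dict[str, int]:
    """Map every line that occurs more than once to its occurrence count."""
    counts = Counter(line.rstrip("\n") for line in lines)
    return {line: n for line, n in counts.items() if n > 1}


def surplus_count(lines: Iterable[str]) -> int:
    """Count rows beyond the first occurrence of each distinct line.

    This is the same difference that comparing `wc -l` on the raw file
    with `sort | uniq | wc -l` exposes.
    """
    counts = Counter(line.rstrip("\n") for line in lines)
    return sum(counts.values()) - len(counts)


def report(path: str) -> None:
    """Print a short summary for a dump file (the path is an assumption)."""
    with open(path, encoding="utf-8") as fh:
        rows = fh.readlines()
    dups = duplicate_lines(rows)
    print(f"total lines:       {len(rows)}")
    print(f"surplus lines:     {surplus_count(rows)}")
    print(f"repeated distinct: {len(dups)}")
```

Calling report("sasi.txt") prints the totals, and duplicate_lines can then be inspected to check whether the repeated rows share anything, for example a common range of *recipient* values.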

Recreating the index does not help.
