cassandra-user mailing list archives

From Joel Knighton <joel.knigh...@datastax.com>
Subject Re: select query on entire primary key returning more than one row in result
Date Wed, 15 Jun 2016 00:31:06 GMT
Great work, Bhuvan - I sat down after work to look at this more carefully.

For a short summary, you are correct.

For a longer summary, I initially thought the reproduction you provided
would not run into the issue from 3.4/3.5 because it didn't select any
static columns. That would mean it wouldn't have statics in its
ColumnFilter (basically, the filter we apply when deciding whether we need to
look for the requested data in more SSTables). That understanding was
incorrect: to preserve the CQL semantics (see CASSANDRA-6588 for details),
we fetch all columns, static columns included, which means the statics are
part of the ColumnFilter. I believe there may be an opportunity for an
optimization here, but that's a whole different discussion. I now agree that
these are the same issue.

You are correct in your analysis that 3.4 and 3.5 are the only affected
versions. The bug was introduced in 3.4 and has been fixed in 3.6 and later.
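
For operators checking their own clusters, the version range above can be
expressed as a small check. This is a minimal sketch in Python, assuming
version strings that begin with `major.minor` (for example `3.5` or `3.0.4`);
`is_affected_version` is a hypothetical helper, not part of Cassandra or
nodetool.

```python
import re

# Hypothetical helper: encodes the range reported in this thread
# (bug introduced in 3.4, fixed in 3.6 and later).
AFFECTED = {(3, 4), (3, 5)}


def is_affected_version(version: str) -> bool:
    """Return True if `version` is 3.4 or 3.5, ignoring any patch/suffix."""
    # Compare major/minor as integers so "3.10" is not misread as "3.1".
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) in AFFECTED


for v in ["3.0.4", "3.4", "3.5", "3.6", "3.7"]:
    print(v, is_affected_version(v))
```

Parsing the components as integers, rather than comparing the string as a
float, keeps later releases such as 3.10 out of the affected set.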

Thanks for sticking with me on this - I'm going to resolve CASSANDRA-12003
as a duplicate of CASSANDRA-11513.
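
The symptom discussed in this thread is that a query restricted on the whole
primary key, such as `select pk,a from test0 where pk=0 and a=2;` in the quoted
output below, returned three rows instead of at most one. A regression test
can check for exactly that. This is a minimal sketch in Python, assuming rows
arrive as dicts (with the DataStax Python driver, setting
`session.row_factory = dict_factory` from `cassandra.query` gives that shape);
`full_key_violations` is a hypothetical helper.

```python
from typing import Any, List, Mapping, Sequence


def full_key_violations(
    rows: Sequence[Mapping[str, Any]], key: Mapping[str, Any]
) -> List[str]:
    """Describe every way `rows` breaks full-primary-key semantics.

    `key` maps each primary key column to the value used in the WHERE
    clause. A correct result has at most one row, and that row's key
    columns equal the requested values. An empty list means no violation.
    """
    problems: List[str] = []
    if len(rows) > 1:
        problems.append(f"expected at most 1 row for {dict(key)}, got {len(rows)}")
    for row in rows:
        for column, expected in key.items():
            actual = row.get(column)
            if actual != expected:
                problems.append(
                    f"row {dict(row)}: {column}={actual!r}, expected {expected!r}"
                )
    return problems


# The result reported in the thread for `where pk=0 and a=2`.
reported = [{"pk": 0, "a": 1}, {"pk": 0, "a": 2}, {"pk": 0, "a": 3}]
for problem in full_key_violations(reported, {"pk": 0, "a": 2}):
    print(problem)
```

Checking the key values as well as the row count means the test also catches
a single wrong row, not only extra ones.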

On Tue, Jun 14, 2016 at 4:21 PM, Bhuvan Rawal <bhu1rawal@gmail.com> wrote:

> Joel,
>
> Thanks for your reply. I have checked and found that the behavior is the same
> as in CASSANDRA-11513
> <https://issues.apache.org/jira/browse/CASSANDRA-11513>. I have verified
> that this behavior (for both 11513 & 12003) occurs in 3.4 & 3.5. Neither
> occurs in 3.0.4, 3.6 or 3.7.
>
> Below are the results of selecting only the partition key and clustering
> column, using the schema from 11513. I have also verified that both issues
> occur whether selecting all columns or only some of them, so the selection
> list is not the trigger. Filtering with WHERE:
>
> cqlsh:ks> select pk,a from test0 where pk=0 and a=2;
>
>  pk | a
> ----+---
>   0 | 1
>   0 | 2
>   0 | 3
>
> We can verify this by applying the 11513 patch to the 3.5 tag, building,
> and testing the 12003 reproduction. If that fixes it, the claim is
> confirmed. Let me know if any further input is required.
>
> On Wed, Jun 15, 2016 at 2:23 AM, Joel Knighton <joel.knighton@datastax.com> wrote:
>
>> The important part of that query is that it's selecting a static column
>> (with select *), not whether it is filtering on one. In CASSANDRA-12003 and
>> this thread, it looks like you're only selecting the partition key and clustering
>> columns. I'd be cautious about concluding that CASSANDRA-12003 and
>> CASSANDRA-11513 are the same issue and that CASSANDRA-12003 is fixed.
>>
>> If you have a reproduction path for CASSANDRA-12003, I'd recommend
>> attaching it to a ticket, and someone can investigate internals to see if
>> CASSANDRA-11513 (or something else entirely) fixed the issue.
>>
>> On Tue, Jun 14, 2016 at 2:13 PM, Bhuvan Rawal <bhu1rawal@gmail.com>
>> wrote:
>>
>>> Joel,
>>>
>>> If we look at the schema carefully:
>>>
>>> CREATE TABLE test0 (
>>>     pk int,
>>>     a int,
>>>     b text,
>>>     s text static,
>>>     PRIMARY KEY (pk, a)
>>> );
>>>
>>> and the filter is on clustering column a, which is not a static
>>> column:
>>>
>>> select * from test0 where pk=0 and a=2;
>>>
>>>
>>>
>>> On Wed, Jun 15, 2016 at 12:39 AM, Joel Knighton <
>>> joel.knighton@datastax.com> wrote:
>>>
>>>> It doesn't seem to be an exact duplicate - CASSANDRA-11513 relies on
>>>> you selecting a static column, which you weren't doing in the reported
>>>> issue. That said, I haven't looked too closely.
>>>>
>>>> On Tue, Jun 14, 2016 at 2:07 PM, Bhuvan Rawal <bhu1rawal@gmail.com>
>>>> wrote:
>>>>
>>>>> I can reproduce CASSANDRA-11513
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-11513> locally on
>>>>> 3.5; this looks like a possible duplicate.
>>>>>
>>>>> On Wed, Jun 15, 2016 at 12:29 AM, Joel Knighton <
>>>>> joel.knighton@datastax.com> wrote:
>>>>>
>>>>>> There's some precedent for similar issues with static columns in 3.5
>>>>>> with https://issues.apache.org/jira/browse/CASSANDRA-11513. A
>>>>>> deterministic (or somewhat deterministic) path for reproduction would help
>>>>>> narrow the issue down further. I've played around locally with similar
>>>>>> schemas (sans the Stratio indices) and couldn't reproduce the issue.
>>>>>>
>>>>>> On Tue, Jun 14, 2016 at 1:41 PM, Bhuvan Rawal <bhu1rawal@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> JIRA CASSANDRA-12003
>>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-12003> has been
>>>>>>> created for this.
>>>>>>>
>>>>>>> On Tue, Jun 14, 2016 at 11:54 PM, Atul Saroha <
>>>>>>> atul.saroha@snapdeal.com> wrote:
>>>>>>>
>>>>>>>> Hi Tyler,
>>>>>>>>
>>>>>>>> This issue is mainly visible for tables having static columns; we are
>>>>>>>> still investigating.
>>>>>>>> We will try testing after removing the Lucene index, but I don’t think
>>>>>>>> this plug-in could change how Cassandra writes to the table's
>>>>>>>> memtable.
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------------------------------------------------------
>>>>>>>> Atul Saroha
>>>>>>>> *Lead Software Engineer*
>>>>>>>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
>>>>>>>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>>>>>>>>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>>>>>>>
>>>>>>>> On Tue, Jun 14, 2016 at 9:54 PM, Tyler Hobbs <tyler@datastax.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Is 'id' your partition key? I'm not familiar with the Stratio
>>>>>>>>> indexes, but it looks like the primary key columns are both indexed.
>>>>>>>>> Perhaps this is related?
>>>>>>>>>
>>>>>>>>> On Tue, Jun 14, 2016 at 1:25 AM, Atul Saroha <
>>>>>>>>> atul.saroha@snapdeal.com> wrote:
>>>>>>>>>
>>>>>>>>>> After further debugging, we found that this issue is in the in-memory
>>>>>>>>>> memtable, since running nodetool flush + compact resolves it. Also, no
>>>>>>>>>> batch writes are used for the table showing the issue.
>>>>>>>>>> Table properties:
>>>>>>>>>>
>>>>>>>>>> WITH CLUSTERING ORDER BY (f_name ASC)
>>>>>>>>>>>     AND bloom_filter_fp_chance = 0.01
>>>>>>>>>>>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>>>>>>>>>     AND comment = ''
>>>>>>>>>>>     AND compaction = {'class':
>>>>>>>>>>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>>>>>>>>>> 'max_threshold': '32', 'min_threshold': '4'}
>>>>>>>>>>>     AND compression = {'chunk_length_in_kb': '64', 'class':
>>>>>>>>>>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>>>>>>>     AND crc_check_chance = 1.0
>>>>>>>>>>>     AND dclocal_read_repair_chance = 0.1
>>>>>>>>>>>     AND default_time_to_live = 0
>>>>>>>>>>>     AND gc_grace_seconds = 864000
>>>>>>>>>>>     AND max_index_interval = 2048
>>>>>>>>>>>     AND memtable_flush_period_in_ms = 0
>>>>>>>>>>>     AND min_index_interval = 128
>>>>>>>>>>>     AND read_repair_chance = 0.0
>>>>>>>>>>>     AND speculative_retry = '99PERCENTILE';
>>>>>>>>>>> CREATE CUSTOM INDEX nbf_index ON nbf () USING
>>>>>>>>>>> 'com.stratio.cassandra.lucene.Index' WITH OPTIONS = {'refresh_seconds':
>>>>>>>>>>> '1', 'schema': '{
>>>>>>>>>>>         fields : {
>>>>>>>>>>>             id  : {type : "bigint"},
>>>>>>>>>>>             f_d_name : {
>>>>>>>>>>>                 type           : "string",
>>>>>>>>>>>                 indexed        : true,
>>>>>>>>>>>                 sorted         : false,
>>>>>>>>>>>                 validated      : true,
>>>>>>>>>>>                 case_sensitive : false
>>>>>>>>>>>             }
>>>>>>>>>>>         }
>>>>>>>>>>>     }'};
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 13, 2016 at 11:11 PM, Siddharth Verma
>>>>>>>>>> <verma.siddharth@snapdeal.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> No, the rows were not all the same.
>>>>>>>>>>> Querying only on the partition key gives 20 rows.
>>>>>>>>>>> In the erroneous result, when querying on the partition key and
>>>>>>>>>>> clustering key, we got 16 of those 20 rows.
>>>>>>>>>>>
>>>>>>>>>>> As for tombstone_threshold, there isn't any entry at the column
>>>>>>>>>>> family level.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Siddharth Verma
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Tyler Hobbs
>>>>>>>>> DataStax <http://datastax.com/>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> <http://www.datastax.com/>
>>>>>>
>>>>>> Joel Knighton
>>>>>> Cassandra Developer | joel.knighton@datastax.com
>>>>>>
>>>>>> <https://www.linkedin.com/company/datastax>
>>>>>> <https://www.facebook.com/datastax> <https://twitter.com/datastax>
>>>>>> <https://plus.google.com/+Datastax/about>
>>>>>> <http://feeds.feedburner.com/datastax> <https://github.com/datastax/>
>>>>>>
>>>>>> <http://cassandrasummit.org/Email_Signature>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>


