cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mathijs Vogelzang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9540) Cql query doesn't return right information when using IN on columns for some keys
Date Thu, 04 Jun 2015 15:25:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572973#comment-14572973
] 

Mathijs Vogelzang commented on CASSANDRA-9540:
----------------------------------------------

Another observation: when testing this bug from our java code with the datastax cql driver,
we notice that while the written data is still in the memtable, everything works correctly.
Once its flushed to an SSTable on disk, the unpredictable behavior starts. Maybe this bug
is another argument for https://issues.apache.org/jira/browse/CASSANDRA-9161 ?

> Cql query doesn't return right information when using IN on columns for some keys
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9540
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9540
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>         Environment: Cassandra 2.1.5
>            Reporter: Mathijs Vogelzang
>            Assignee: Carl Yeksigian
>             Fix For: 2.1.x
>
>
> We are investigating a weird issue where one of our clients doesn't get data on his dashboard.
It seems Cassandra is not returning data for a particular key ("brokenkey" from now on).
> Some background:
> We have a row where we store a "metadata" column and data in columns "bucket/0", "bucket/1",
"bucket/2", etc. Depending on the date selection of the UI, we know that we only need to retrieve
bucket/0, bucket/0 and bucket/1 etc. (we always need to retrieve "metadata").
> A typical query may look like this (using SELECT column1 to just show what is returned,
normally we would of course do SELECT value):
> {noformat}
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/workingkey');
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/brokenkey');
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> {noformat}
> These two queries work as expected, and return the information that we actually stored.
> However, when we "filter" for certain columns, the brokenkey starts behaving very weird:
> {noformat}
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/workingkey')
and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'));
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/workingkey')
and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdfasdf'));
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> ***  As expected, querying for more information doesn't really matter for the working
key ***
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/brokenkey')
and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'));
>  blobAsText(column1)
> ---------------------
>             bucket/0
> (1 rows)
> *** Cassandra stops giving us the metadata column when asking for a few more columns!
***
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/brokenkey')
and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdfasdf'));
>  key | column1 | value
> -----+---------+-------
> (0 rows)
> *** Adding the bogus column name even makes it return nothing from this row anymore!
***
> {noformat}
> There are at least two rows that malfunction like this in our table (which is quite old
already and has gone through a bunch of Cassandra upgrades). I've upgraded our whole cluster
to 2.1.5 (we were on 2.1.2 when I discovered this problem) and compacted, repaired and scrubbed
this column family, which hasn't helped.
> Our table structure is:
> {noformat}
> cqlsh:AppBrain> describe table "GroupedSeries";
> CREATE TABLE "AppBrain"."GroupedSeries" (
>     key blob,
>     column1 blob,
>     value blob,
>     PRIMARY KEY (key, column1)
> ) WITH COMPACT STORAGE
>     AND CLUSTERING ORDER BY (column1 ASC)
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 1.0
>     AND speculative_retry = 'NONE';
> {noformat}
> Let me know if I can give more information that may be helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message