cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Constance Eustace (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE inconsistent
Date Thu, 17 Oct 2013 15:35:47 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798011#comment-13798011
] 

Constance Eustace edited comment on CASSANDRA-6137 at 10/17/13 3:34 PM:
------------------------------------------------------------------------

Here is the code I'm using to reproduce, it's a SpringSource Tool Suite groovy project...


https://github.com/cowarlydragon/CASS-6137

But we just got a node restart that has compaction errors due to counter column use. This
example does not have counter columns in the schema, but there were existing keyspaces with
counters in my run, so I'm going to clean my entire local schema and re-run.

The error seems to take a bit of time to appear. I ran yesterday on and off while writing
the reproduction code and it appeared only after a node crash, but I ran it last night overnight
and it was appearing without crash in the morning after about 6 hours of running. 

I've dropped all my user keyspaces and I'll try to reproduce that way and see how long it
takes to appear.


was (Author: cowardlydragon):
Here is the code I'm using to reproduce, it's a SpringSource Tool Suite groovy project...


https://github.com/cowarlydragon/CASS-6137

But we just got a node restart that has compaction errors due to counter column use. This
example does not have counter columns in the schema, but there were existing keyspaces with
counters, so I'm going to clean my entire local schema and re-run.

The error seems to take a bit of time to appear. I ran yesterday on and off while writing
the reproduction code and it appeared only after a node crash, but I ran it last night overnight
and it was appearing without crash in the morning after about 6 hours of running. 

I've dropped all my user keyspaces and I'll try to reproduce that way and see how long it
takes to appear.

> CQL3 SELECT IN CLAUSE inconsistent
> ----------------------------------
>
>                 Key: CASSANDRA-6137
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6137
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu AWS Cassandra 2.0.1 SINGLE NODE on EBS RAID storage
> OSX Cassandra 1.2.8 on SSD storage
>            Reporter: Constance Eustace
>            Priority: Minor
>
> I am elevating this to Critical after doing some trace and reproducing in several environments.
No one has commented on this bug from the cassandra team, and I view unreliable/corrupted
data a pretty big deal. We are considering pulling cassandra and using something else.
> We have the data state reproduced locally in an environment that we can set TRACE logging,
attach a debugger, etc. Some guidance as to where to look would be greatly appreciated.
> ------------------------------------------------------------------
> We are encountering inconsistent results from CQL3 queries with column keys using IN
clause in WHERE. This has been reproduced in cqlsh and the jdbc driver.
> Rowkey is e_entid
> Column key is p_prop
> This returns roughly 21 rows for 21 column keys that match p_prop.
> cqlsh> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
FROM internal_submission.Entity_Job WHERE e_entid = '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';
> These three queries each return one row for the requested single column key in the IN
clause:
> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
FROM internal_submission.Entity_Job WHERE e_entid = '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'
 AND p_prop in ('urn:bby:pcm:job:ingest:content:complete:count');
> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
FROM internal_submission.Entity_Job WHERE e_entid = '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'
 AND p_prop in ('urn:bby:pcm:job:ingest:content:all:count');
> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
FROM internal_submission.Entity_Job WHERE e_entid = '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'
 AND p_prop in ('urn:bby:pcm:job:ingest:content:fail:count');
> This query returns ONLY ONE ROW (one column key), not three as I would expect from the
three-column-key IN clause:
> cqlsh> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
FROM internal_submission.Entity_Job WHERE e_entid = '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'
 AND p_prop in ('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
> This query does return two rows however for the requested two column keys:
> cqlsh> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_proplinks,p_subents,p_val,p_vallinks,p_vars
FROM internal_submission.Entity_Job WHERE e_entid = '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'
 AND p_prop in (                                                'urn:bby:pcm:job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
> cqlsh> describe table internal_submission.entity_job;
> CREATE TABLE entity_job (
>   e_entid text,
>   p_prop text,
>   describes text,
>   dndcondition text,
>   e_entlinks text,
>   e_entname text,
>   e_enttype text,
>   ingeststatus text,
>   ingeststatusdetail text,
>   p_flags text,
>   p_propid text,
>   p_proplinks text,
>   p_storage text,
>   p_subents text,
>   p_val text,
>   p_vallang text,
>   p_vallinks text,
>   p_valtype text,
>   p_valunit text,
>   p_vars text,
>   partnerid text,
>   referenceid text,
>   size int,
>   sourceip text,
>   submitdate bigint,
>   submitevent text,
>   userid text,
>   version text,
>   PRIMARY KEY (e_entid, p_prop)
> ) WITH
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.100000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='NONE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describes);
> CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job (dndcondition);
> CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job (ingeststatus);
> CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job (ingeststatusdetail);
> CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job (referenceid);
> CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid);
> CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version);
> -------------------------------
> My suspicion is that the three-column-key IN Clause is translated (improperly or not)
to a two-column key range with the assumption that the third column key is present in that
range, but it isn't...
> -----------------------------------
> We have tried: nodetool cache invalidations, start/stop of cassandra. Those did NOT fix
the problem. A table dump (COPY TO) and then reload (COPY FROM) does fix the rows, but then
more corruption creeps in.
> We are using the cassandra-jdbc driver, but I don't see anything wrong with the issued
statements inside the cassandra source code when I step through the code. 
> With additional writes, it may be possible that some rows get fixed. Compaction or other
jobs may repair this, but on the timescale of hours done debugging, the failures are consistent.




--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message