Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Thu, 10 Oct 2013 15:55:42 +0000 (UTC)
From: "Constance Eustace (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12672030.1380749488431.46019.1381420542153@arcas>
In-Reply-To: <JIRA.12672030.1380749488431@arcas>
References: <JIRA.12672030.1380749488431@arcas>
Subject: [jira] [Commented] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE
 inconsistent
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/CASSANDRA-6137?page=3Dcom.atlas=
sian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D=
13791607#comment-13791607 ]=20

Constance Eustace commented on CASSANDRA-6137:
----------------------------------------------

It is now occurring in prod for other columns. There appears to be some has=
h key impacts here...

[10/10/13 12:13:19 AM] Aaron Gaalswyk: wasn't a regression; DB corruption o=
n a product entity
[10/10/13 12:13:20 AM] Aaron Gaalswyk: cqlsh> SELECT e_entid,e_entname,e_en=
ttype,p_prop,p_flags,p_propid,p_val,p_vallinks,p_vars FROM internal_submiss=
ion.Entity_Product WHERE e_entid =3D '0d5acd67-3131-11e3-85d7-126aad0075d4-=
PROD'  AND p_prop IN ('__CPSYS_type','__CPSYS_name','urn:bby:pcm:job:id');

(0 rows)

cqlsh> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,p_val,p_v=
allinks,p_vars FROM internal_submission.Entity_Product WHERE e_entid =3D '0=
d5acd67-3131-11e3-85d7-126aad0075d4-PROD'  AND p_prop IN ('__CPSYS_type','_=
_CPSYS_name');

 e_entid                                   | e_entname                     =
                                                 | e_enttype           | p_=
prop       | p_flags | p_propid | p_val | p_vallinks | p_vars
-------------------------------------------+-------------------------------=
-------------------------------------------------+---------------------+---=
-----------+---------+----------+-------+------------+--------
 0d5acd67-3131-11e3-85d7-126aad0075d4-PROD | 1 ft Cat5e Non Booted UTP Unsh=
ielded Network Patch Cable :::: 757120254621|NEW |                null | __=
CPSYS_name |    null |     null |  null |       null |   null
 0d5acd67-3131-11e3-85d7-126aad0075d4-PROD |                               =
                                            null | urn:bby:pcm:product | __=
CPSYS_type |    null |     null |  null |       null |   null

(2 rows)
[10/10/13 12:20:12 AM] AGaal: cqlsh> SELECT e_entid,e_entname,e_enttype,p_p=
rop,p_flags,p_propid,p_val,p_vallinks,p_vars FROM internal_submission.Entit=
y_Product WHERE e_entid =3D '0d5acd67-3131-11e3-85d7-126aad0075d4-PROD'  AN=
D p_prop IN ('urn:bby:pcm:job:id');

(0 rows)
[10/10/13 12:20:12 AM] AGaal: note that in this example 'urn:bby:cpm:job:id=
' does not exist yet, so asking just for that correctly returns 0 rows:
[10/10/13 12:20:42 AM] AGaal: but if it's included in a where in() with 2 o=
ther properties that do exist, then 0 rows are also returned there too, whi=
ch is bad
[10/10/13 12:26:50 AM] AGaal: another work-around for where in() might be t=
o do a select for each desired property, so in this case there would have b=
een 3 selects; could this be faster / more efficient than selecting all?
[10/10/13 12:37:51 AM] AGaal: we might be able to get some traction here by=
 enabling some cassandra logging and playing with the query
[10/10/13 12:38:29 AM] AGaal: like if the property name is shortened to 'ur=
n:bby:pcm:', it returns the expected 2 rows
[10/10/13 12:39:08 AM] AGaal: but if it's 'urn:bby:pcm:j' or ''urn:bby:pcm:=
d ' it finds 0
[10/10/13 12:42:41 AM] AGaal: and if the last letter after urn:bby:cpm: is =
an 'a' or 'b' or 'c' it also returns 2=E2=80=A6. and it's consistent with t=
his.  So it's finding some sort of match in certain strings=E2=80=A6 like v=
ia a hash or startsWith or something

> CQL3 SELECT IN CLAUSE inconsistent
> ----------------------------------
>
>                 Key: CASSANDRA-6137
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6137
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu AWS Cassandra 2.0.1 SINGLE NODE
>            Reporter: Constance Eustace
>             Fix For: 2.0.1
>
>
> We are encountering inconsistent results from CQL3 queries with column ke=
ys using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc=
 driver.
> Rowkey is e_entid
> Column key is p_prop
> This returns roughly 21 rows for 21 column keys that match p_prop.
> cqlsh> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entli=
nks,p_proplinks,p_subents,p_val,p_vallinks,p_vars FROM internal_submission.=
Entity_Job WHERE e_entid =3D '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB';
> These three queries each return one row for the requested single column k=
ey in the IN clause:
> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_p=
roplinks,p_subents,p_val,p_vallinks,p_vars FROM internal_submission.Entity_=
Job WHERE e_entid =3D '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_pr=
op in ('urn:bby:pcm:job:ingest:content:complete:count');
> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_p=
roplinks,p_subents,p_val,p_vallinks,p_vars FROM internal_submission.Entity_=
Job WHERE e_entid =3D '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_pr=
op in ('urn:bby:pcm:job:ingest:content:all:count');
> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_p=
roplinks,p_subents,p_val,p_vallinks,p_vars FROM internal_submission.Entity_=
Job WHERE e_entid =3D '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  AND p_pr=
op in ('urn:bby:pcm:job:ingest:content:fail:count');
> This query returns ONLY ONE ROW (one column key), not three as I would ex=
pect from the three-column-key IN clause:
> cqlsh> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entli=
nks,p_proplinks,p_subents,p_val,p_vallinks,p_vars FROM internal_submission.=
Entity_Job WHERE e_entid =3D '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  A=
ND p_prop in ('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:=
job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
> This query does return two rows however for the requested two column keys=
:
> cqlsh> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entli=
nks,p_proplinks,p_subents,p_val,p_vallinks,p_vars FROM internal_submission.=
Entity_Job WHERE e_entid =3D '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'  A=
ND p_prop in (                                                'urn:bby:pcm:=
job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count');
> cqlsh> describe table internal_submission.entity_job;
> CREATE TABLE entity_job (
>   e_entid text,
>   p_prop text,
>   describes text,
>   dndcondition text,
>   e_entlinks text,
>   e_entname text,
>   e_enttype text,
>   ingeststatus text,
>   ingeststatusdetail text,
>   p_flags text,
>   p_propid text,
>   p_proplinks text,
>   p_storage text,
>   p_subents text,
>   p_val text,
>   p_vallang text,
>   p_vallinks text,
>   p_valtype text,
>   p_valunit text,
>   p_vars text,
>   partnerid text,
>   referenceid text,
>   size int,
>   sourceip text,
>   submitdate bigint,
>   submitevent text,
>   userid text,
>   version text,
>   PRIMARY KEY (e_entid, p_prop)
> ) WITH
>   bloom_filter_fp_chance=3D0.010000 AND
>   caching=3D'KEYS_ONLY' AND
>   comment=3D'' AND
>   dclocal_read_repair_chance=3D0.000000 AND
>   gc_grace_seconds=3D864000 AND
>   index_interval=3D128 AND
>   read_repair_chance=3D0.100000 AND
>   replicate_on_write=3D'true' AND
>   populate_io_cache_on_flush=3D'false' AND
>   default_time_to_live=3D0 AND
>   speculative_retry=3D'NONE' AND
>   memtable_flush_period_in_ms=3D0 AND
>   compaction=3D{'class': 'SizeTieredCompactionStrategy'} AND
>   compression=3D{'sstable_compression': 'LZ4Compressor'};
> CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describe=
s);
> CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job (dndco=
ndition);
> CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job (inges=
tstatus);
> CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job =
(ingeststatusdetail);
> CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job (refere=
nceid);
> CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid);
> CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version);
> -------------------------------
> My suspicion is that the three-column-key IN Clause is translated (improp=
erly or not) to a two-column key range with the assumption that the third c=
olumn key is present in that range, but it isn't...


--
This message was sent by Atlassian JIRA
(v6.1#6144)