Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 69A2A10A3E for ; Thu, 10 Oct 2013 15:55:49 +0000 (UTC) Received: (qmail 29711 invoked by uid 500); 10 Oct 2013 15:55:43 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 29660 invoked by uid 500); 10 Oct 2013 15:55:42 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 29632 invoked by uid 99); 10 Oct 2013 15:55:42 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 10 Oct 2013 15:55:42 +0000 Date: Thu, 10 Oct 2013 15:55:42 +0000 (UTC) From: "Constance Eustace (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CASSANDRA-6137) CQL3 SELECT IN CLAUSE inconsistent MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-6137?page=3Dcom.atlas= sian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D= 13791607#comment-13791607 ]=20 Constance Eustace commented on CASSANDRA-6137: ---------------------------------------------- It is now occurring in prod for other columns. There appears to be some has= h key impacts here... [10/10/13 12:13:19 AM] Aaron Gaalswyk: wasn't a regression; DB corruption o= n a product entity [10/10/13 12:13:20 AM] Aaron Gaalswyk: cqlsh> SELECT e_entid,e_entname,e_en= ttype,p_prop,p_flags,p_propid,p_val,p_vallinks,p_vars FROM internal_submiss= ion.Entity_Product WHERE e_entid =3D '0d5acd67-3131-11e3-85d7-126aad0075d4-= PROD' AND p_prop IN ('__CPSYS_type','__CPSYS_name','urn:bby:pcm:job:id'); (0 rows) cqlsh> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,p_val,p_v= allinks,p_vars FROM internal_submission.Entity_Product WHERE e_entid =3D '0= d5acd67-3131-11e3-85d7-126aad0075d4-PROD' AND p_prop IN ('__CPSYS_type','_= _CPSYS_name'); e_entid | e_entname = | e_enttype | p_= prop | p_flags | p_propid | p_val | p_vallinks | p_vars -------------------------------------------+-------------------------------= -------------------------------------------------+---------------------+---= -----------+---------+----------+-------+------------+-------- 0d5acd67-3131-11e3-85d7-126aad0075d4-PROD | 1 ft Cat5e Non Booted UTP Unsh= ielded Network Patch Cable :::: 757120254621|NEW | null | __= CPSYS_name | null | null | null | null | null 0d5acd67-3131-11e3-85d7-126aad0075d4-PROD | = null | urn:bby:pcm:product | __= CPSYS_type | null | null | null | null | null (2 rows) [10/10/13 12:20:12 AM] AGaal: cqlsh> SELECT e_entid,e_entname,e_enttype,p_p= rop,p_flags,p_propid,p_val,p_vallinks,p_vars FROM internal_submission.Entit= y_Product WHERE e_entid =3D '0d5acd67-3131-11e3-85d7-126aad0075d4-PROD' AN= D p_prop IN ('urn:bby:pcm:job:id'); (0 rows) [10/10/13 12:20:12 AM] AGaal: note that in this example 'urn:bby:cpm:job:id= ' does not exist yet, so asking just for that correctly returns 0 rows: [10/10/13 12:20:42 AM] AGaal: but if it's included in a where in() with 2 o= ther properties that do exist, then 0 rows are also returned there too, whi= ch is bad [10/10/13 12:26:50 AM] AGaal: another work-around for where in() might be t= o do a select for each desired property, so in this case there would have b= een 3 selects; could this be faster / more efficient than selecting all? [10/10/13 12:37:51 AM] AGaal: we might be able to get some traction here by= enabling some cassandra logging and playing with the query [10/10/13 12:38:29 AM] AGaal: like if the property name is shortened to 'ur= n:bby:pcm:', it returns the expected 2 rows [10/10/13 12:39:08 AM] AGaal: but if it's 'urn:bby:pcm:j' or ''urn:bby:pcm:= d ' it finds 0 [10/10/13 12:42:41 AM] AGaal: and if the last letter after urn:bby:cpm: is = an 'a' or 'b' or 'c' it also returns 2=E2=80=A6. and it's consistent with t= his. So it's finding some sort of match in certain strings=E2=80=A6 like v= ia a hash or startsWith or something > CQL3 SELECT IN CLAUSE inconsistent > ---------------------------------- > > Key: CASSANDRA-6137 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6137 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Ubuntu AWS Cassandra 2.0.1 SINGLE NODE > Reporter: Constance Eustace > Fix For: 2.0.1 > > > We are encountering inconsistent results from CQL3 queries with column ke= ys using IN clause in WHERE. This has been reproduced in cqlsh and the jdbc= driver. > Rowkey is e_entid > Column key is p_prop > This returns roughly 21 rows for 21 column keys that match p_prop. > cqlsh> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entli= nks,p_proplinks,p_subents,p_val,p_vallinks,p_vars FROM internal_submission.= Entity_Job WHERE e_entid =3D '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB'; > These three queries each return one row for the requested single column k= ey in the IN clause: > SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_p= roplinks,p_subents,p_val,p_vallinks,p_vars FROM internal_submission.Entity_= Job WHERE e_entid =3D '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB' AND p_pr= op in ('urn:bby:pcm:job:ingest:content:complete:count'); > SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_p= roplinks,p_subents,p_val,p_vallinks,p_vars FROM internal_submission.Entity_= Job WHERE e_entid =3D '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB' AND p_pr= op in ('urn:bby:pcm:job:ingest:content:all:count'); > SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entlinks,p_p= roplinks,p_subents,p_val,p_vallinks,p_vars FROM internal_submission.Entity_= Job WHERE e_entid =3D '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB' AND p_pr= op in ('urn:bby:pcm:job:ingest:content:fail:count'); > This query returns ONLY ONE ROW (one column key), not three as I would ex= pect from the three-column-key IN clause: > cqlsh> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entli= nks,p_proplinks,p_subents,p_val,p_vallinks,p_vars FROM internal_submission.= Entity_Job WHERE e_entid =3D '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB' A= ND p_prop in ('urn:bby:pcm:job:ingest:content:complete:count','urn:bby:pcm:= job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count'); > This query does return two rows however for the requested two column keys= : > cqlsh> SELECT e_entid,e_entname,e_enttype,p_prop,p_flags,p_propid,e_entli= nks,p_proplinks,p_subents,p_val,p_vallinks,p_vars FROM internal_submission.= Entity_Job WHERE e_entid =3D '845b38f1-2b91-11e3-854d-126aad0075d4-CJOB' A= ND p_prop in ( 'urn:bby:pcm:= job:ingest:content:all:count','urn:bby:pcm:job:ingest:content:fail:count'); > cqlsh> describe table internal_submission.entity_job; > CREATE TABLE entity_job ( > e_entid text, > p_prop text, > describes text, > dndcondition text, > e_entlinks text, > e_entname text, > e_enttype text, > ingeststatus text, > ingeststatusdetail text, > p_flags text, > p_propid text, > p_proplinks text, > p_storage text, > p_subents text, > p_val text, > p_vallang text, > p_vallinks text, > p_valtype text, > p_valunit text, > p_vars text, > partnerid text, > referenceid text, > size int, > sourceip text, > submitdate bigint, > submitevent text, > userid text, > version text, > PRIMARY KEY (e_entid, p_prop) > ) WITH > bloom_filter_fp_chance=3D0.010000 AND > caching=3D'KEYS_ONLY' AND > comment=3D'' AND > dclocal_read_repair_chance=3D0.000000 AND > gc_grace_seconds=3D864000 AND > index_interval=3D128 AND > read_repair_chance=3D0.100000 AND > replicate_on_write=3D'true' AND > populate_io_cache_on_flush=3D'false' AND > default_time_to_live=3D0 AND > speculative_retry=3D'NONE' AND > memtable_flush_period_in_ms=3D0 AND > compaction=3D{'class': 'SizeTieredCompactionStrategy'} AND > compression=3D{'sstable_compression': 'LZ4Compressor'}; > CREATE INDEX internal_submission__JobDescribesIDX ON entity_job (describe= s); > CREATE INDEX internal_submission__JobDNDConditionIDX ON entity_job (dndco= ndition); > CREATE INDEX internal_submission__JobIngestStatusIDX ON entity_job (inges= tstatus); > CREATE INDEX internal_submission__JobIngestStatusDetailIDX ON entity_job = (ingeststatusdetail); > CREATE INDEX internal_submission__JobReferenceIDIDX ON entity_job (refere= nceid); > CREATE INDEX internal_submission__JobUserIDX ON entity_job (userid); > CREATE INDEX internal_submission__JobVersionIDX ON entity_job (version); > ------------------------------- > My suspicion is that the three-column-key IN Clause is translated (improp= erly or not) to a two-column key range with the assumption that the third c= olumn key is present in that range, but it isn't... -- This message was sent by Atlassian JIRA (v6.1#6144)