Date: Wed, 11 Dec 2013 17:27:09 +0000 (UTC)
From: "Alex Liu (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-6151) CqlPagingRecorderReader Used when Partition Key Is Explicitly Stated

    [ https://issues.apache.org/jira/browse/CASSANDRA-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845554#comment-13845554 ]

Alex Liu commented on CASSANDRA-6151:
-------------------------------------

[~devP] Since this is resolved as Won't Fix, I don't expect a new patch for the IN clause. As a workaround, an IN restriction can be expressed as multiple EQ queries.

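The workaround mentioned in the comment can be sketched roughly as follows. This is only an illustration in Python, not code from any of the attached patches: the helper name `expand_in_to_eq` and its parameters are hypothetical, and how the resulting statements are executed (and how their results are merged) is left open.

```python
def expand_in_to_eq(table, column, values, other_clauses=()):
    """Return one SELECT per value, using column=value instead of an IN clause.

    Hypothetical helper for illustration only. `values` and `other_clauses`
    are expected to already be valid CQL literals and predicates.
    """
    queries = []
    for value in values:
        # Keep any other restrictions and add a single equality on `column`.
        predicates = list(other_clauses) + [f"{column}={value}"]
        queries.append(f'SELECT * FROM "{table}" WHERE ' + " AND ".join(predicates))
    return queries


# Example: seqnumber IN (10, 11) for one occurday becomes two EQ queries.
for query in expand_in_to_eq("data", "seqnumber", ["10", "11"],
                             other_clauses=["occurday='2013-10-01'"]):
    print(query)
```

Each generated statement pins a single value, so the caller runs them one by one and combines the rows itself.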
> CqlPagingRecorderReader Used when Partition Key Is Explicitly Stated
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-6151
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6151
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Russell Alexander Spitzer
>            Assignee: Alex Liu
>            Priority: Minor
>         Attachments: 6151-1.2-branch.txt, 6151-v2-1.2-branch.txt, 6151-v3-1.2-branch.txt
>
>
> From http://stackoverflow.com/questions/19189649/composite-key-in-cassandra-with-pig/19211546#19211546
> The user was attempting to load a single partition using a where clause in a Pig load statement.
> CQL Table
> {code}
> CREATE table data (
>   occurday text,
>   seqnumber int,
>   occurtimems bigint,
>   unique bigint,
>   fields map,
>   primary key ((occurday, seqnumber), occurtimems, unique)
> )
> {code}
> Pig Load statement Query
> {code}
> data = LOAD 'cql://ks/data?where_clause=seqnumber%3D10%20AND%20occurday%3D%272013-10-01%27' USING CqlStorage();
> {code}
> This results in an exception when processed by the CqlPagingRecordReader, which attempts to page this query even though it contains at most one partition key. This leads to an invalid CQL statement.
> CqlPagingRecordReader Query
> {code}
> SELECT * FROM "data" WHERE token("occurday","seqnumber") > ? AND
> token("occurday","seqnumber") <= ? AND occurday='A Great Day'
> AND seqnumber=1 LIMIT 1000 ALLOW FILTERING
> {code}
> Exception
> {code}
> InvalidRequestException(why:occurday cannot be restricted by more than one relation if it includes an Equal)
> {code}
> I'm not sure it is worth the special case, but a modification to not use the paging record reader when the entire partition key is specified would solve this issue.
> h3. Solution
> If the where clause has EQUAL clauses for all the partitioning keys, we use the query
> {code}
> SELECT * FROM "data"
> WHERE occurday='A Great Day'
> AND seqnumber=1 LIMIT 1000 ALLOW FILTERING
> {code}
> instead of
> {code}
> SELECT * FROM "data"
> WHERE token("occurday","seqnumber") > ?
> AND token("occurday","seqnumber") <= ?
> AND occurday='A Great Day'
> AND seqnumber=1 LIMIT 1000 ALLOW FILTERING
> {code}
> The baseline implementation retrieves all data of all rows around the ring. This new feature retrieves all data of a single wide row, which is one level below the baseline. It helps the use case where the user is only interested in a specific wide row, so the job does not have to retrieve all the rows around the ring.
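
The decision described under "Solution" can be sketched as below. This is a minimal Python illustration, not the code from the attached patches: the function name `build_select`, its parameters, and the dict-based representation of the EQUAL clauses are all assumptions made for the example. The two query shapes are the ones quoted in the issue.

```python
def build_select(table, partition_keys, eq_clauses, page_size=1000):
    """Build the SELECT text for one page.

    Hypothetical sketch: `partition_keys` lists the partition key columns and
    `eq_clauses` maps a column name to the CQL literal it is set equal to.
    """
    where = " AND ".join(f"{col}={lit}" for col, lit in eq_clauses.items())
    if all(key in eq_clauses for key in partition_keys):
        # The whole partition key is pinned by equalities, so the token range
        # restriction is left out; combining both is what gets rejected.
        predicates = where
    else:
        # Otherwise keep the token range restriction used for paging.
        token = "token(" + ",".join(f'"{key}"' for key in partition_keys) + ")"
        token_range = f"{token} > ? AND {token} <= ?"
        predicates = f"{token_range} AND {where}" if where else token_range
    return (f'SELECT * FROM "{table}" WHERE {predicates} '
            f"LIMIT {page_size} ALLOW FILTERING")


# Both partition key columns are restricted, so no token() range is emitted.
print(build_select("data", ["occurday", "seqnumber"],
                   {"occurday": "'A Great Day'", "seqnumber": "1"}))
```

With only `occurday` restricted, the same function falls back to the token-range form, which corresponds to the baseline behaviour of scanning the ring.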