Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9BE91107FD for ; Thu, 13 Mar 2014 13:25:14 +0000 (UTC) Received: (qmail 82697 invoked by uid 500); 13 Mar 2014 13:25:11 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 82103 invoked by uid 500); 13 Mar 2014 13:25:10 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 82011 invoked by uid 99); 13 Mar 2014 13:25:08 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 Mar 2014 13:25:08 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of davemssavage@gmail.com designates 209.85.212.173 as permitted sender) Received: from [209.85.212.173] (HELO mail-wi0-f173.google.com) (209.85.212.173) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 Mar 2014 13:25:03 +0000 Received: by mail-wi0-f173.google.com with SMTP id f8so3882086wiw.6 for ; Thu, 13 Mar 2014 06:24:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=hVNhRBh3ZZc/EBHmgJCYApBZWPYdj9L+BTW68iw23fo=; b=FNKFA5XJVzMqx5nOkgm+SHflyEOCFfqFO4QhY0nFfv4mBxJveSLvNZriB4g8KREsRm cjqXK+QzXTFJ7AqEPqdgBoniTcO0UFmkI4vRtZjlLjpM3xKU4/GcfnY80nCab/zuL/TB teV7W6lJcAaohyzi+NL29ulKCb46DnGIzcpKZ2vrwt60zW13BUP/ZOZ+vJXV6qtEwAHZ hjMjAex8f4kXoHNIbYLQ43+XZAqgRgBIY51G6QTzIy3eb68E3Dvtuo+jhOzcqwlUUB2o u3oBPoSPd4bbJUZEGwlrTGIMHD3kHkVdkdoE536qhlHrY/2MyMfBBYqJGGZtrZf8v0Ud IGxg== MIME-Version: 1.0 X-Received: by 10.180.9.239 with SMTP id d15mr1399271wib.27.1394717081885; Thu, 13 Mar 2014 06:24:41 -0700 (PDT) Received: by 10.216.113.133 with HTTP; Thu, 13 Mar 2014 06:24:41 -0700 (PDT) In-Reply-To: References: Date: Thu, 13 Mar 2014 13:24:41 +0000 Message-ID: Subject: Re: CQL Select Map using an IN relationship From: David Savage To: "user@cassandra.apache.org" Content-Type: multipart/alternative; boundary=001a11c2aaaa3a636f04f47ce0b5 X-Virus-Checked: Checked by ClamAV on apache.org --001a11c2aaaa3a636f04f47ce0b5 Content-Type: text/plain; charset=ISO-8859-1 Hi Peter, Thanks for the help, unfortunately I'm not sure that's the problem, the id is the primary key on the documents table and the timestamp is the primary key on the eventlog table Kind regards, Dave On Thursday, 13 March 2014, Peter Lin wrote: > > it's not clear to me if your "id" column is the KEY or just a regular > column with secondary index. > > queries that have IN on non primary key columns isn't supported yet. not > sure if that answers your question. > > > On Thu, Mar 13, 2014 at 7:12 AM, David Savage wrote: > >> Hi there, >> >> I'm experimenting using cassandra and have run across an error message >> which I need a little more information on. >> >> The use case I'm experimenting with is a series of document updates >> (documents being an arbitrary map of key value pairs), I would like to find >> the latest document updates after a specified time period. I don't want to >> store many copies of the documents (one per update) as the updates are >> often only to single keys in the map so that would involve a lot of >> duplicated data. >> >> The solution I've found that seems to fit best in terms of performance is >> to have two tables. >> >> One that has an event log of timeuuid -> docid and a second that stores >> the documents themselves stored by docid -> map. I then run >> two queries, one to select ids that have changed after a certain time: >> >> SELECT id FROM eventlog WHERE timestamp>=minTimeuuid($minimumTime) >> >> and then a second to select the actual documents themselves >> >> SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7...) >> >> However this then explodes on query with the error message: >> >> "Cannot restrict PRIMARY KEY part id by IN relation as a collection is >> selected by the query" >> >> Detective work lead me to these lines in >> org.apache.cassandra.cql3.statementsSelectStatement: >> >> // We only support IN for the last name and for >> compact storage so far >> // TODO: #3885 allows us to extend to non compact as >> well, but that remains to be done >> if (i != stmt.columnRestrictions.length - 1) >> throw new >> InvalidRequestException(String.format("PRIMARY KEY part %s cannot be >> restricted by IN relation", cname)); >> else if (stmt.selectACollection()) >> throw new >> InvalidRequestException(String.format("Cannot restrict PRIMARY KEY part %s >> by IN relation as a collection is selected by the query", cname)); >> >> It seems like #3885 will allow support for the first IF block above, but >> I don't think it will allow the second, am I correct? >> >> Any pointers on how I can work around this would be greatly appreciated. >> >> Kind regards, >> >> Dave >> > > --001a11c2aaaa3a636f04f47ce0b5 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Hi Peter,

Thanks for the help, unfortun= ately I'm not sure that's the problem, the id is the primary key on= the documents table and the timestamp is the primar= y key on the eventlog table

Kind regards,


Dav= e


On Thursday, 13 March 2014, Peter Lin <woolfel@gmail.com> wrote:<= br>

it's not clear to= me if your "id" column is the KEY or just a regular column with = secondary index.

queries that have IN on non primary key columns isn't support= ed yet. not sure if that answers your question.


On Thu,= Mar 13, 2014 at 7:12 AM, David Savage <davemssavag= e@gmail.com> wrote:
Hi there,

I'm expe= rimenting using cassandra and have run across an error message which I need= a little more information on.

The use case I'm experimenting with is a series of = document updates (documents being an arbitrary map of key value pairs), I w= ould like to find the latest document updates after a specified time period= . I don't want to store many copies of the documents (one per update) a= s the updates are often only to single keys in the map so that would involv= e a lot of duplicated data.

The solution I've found that seems to fit best in t= erms of performance is to have two tables.

One tha= t has an event log of timeuuid -> docid and a second that stores the doc= uments themselves stored by docid -> map<string, string>. I then r= un two queries, one to select ids that have changed after a certain time:

SELECT id FROM eventlog WHERE timestamp>=3DminTimeuu= id($minimumTime)

and then a second to select the a= ctual documents themselves

SELECT id, data FROM do= cuments WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7…)

However this then explodes on query with the erro= r message:

"Cannot restrict PRIMARY KEY part = id by IN relation as a collection is selected by the query"

Detective work lead me to these lines in org.apache.cassandr= a.cql3.statementsSelectStatement:

  &nbs= p;                 // We only suppo= rt IN for the last name and for compact storage so far
                    = // TODO: #3885 allows us to extend to non compact as well, but that remains= to be done
              &nbs= p;     if (i !=3D stmt.columnRestrictions.length - 1)
&= nbsp;                    =   throw new InvalidRequestException(String.format("PRIMARY KEY p= art %s cannot be restricted by IN relation", cname));
                    = else if (stmt.selectACollection())
        &n= bsp;               throw new InvalidRequ= estException(String.format("Cannot restrict PRIMARY KEY part %s by IN = relation as a collection is selected by the query", cname));

It seems like #3885 will allow support for the fi= rst IF block above, but I don't think it will allow the second, am I co= rrect? 

Any pointers on how I can work around= this would be greatly appreciated.

Kind regards,

Dave

--001a11c2aaaa3a636f04f47ce0b5--