Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8123110BCF for ; Thu, 13 Mar 2014 14:40:34 +0000 (UTC) Received: (qmail 24824 invoked by uid 500); 13 Mar 2014 14:40:32 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 24002 invoked by uid 500); 13 Mar 2014 14:40:29 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 23986 invoked by uid 99); 13 Mar 2014 14:40:28 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 Mar 2014 14:40:28 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of davemssavage@gmail.com designates 209.85.212.179 as permitted sender) Received: from [209.85.212.179] (HELO mail-wi0-f179.google.com) (209.85.212.179) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 Mar 2014 14:40:22 +0000 Received: by mail-wi0-f179.google.com with SMTP id f8so1228774wiw.12 for ; Thu, 13 Mar 2014 07:40:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=vXpBiy8P536eUIdNqdxo5GsrMqrOG/lClQ24uzO+4/8=; b=RGbt4Mmskxl+pmvOImF1EGlvHMpcfCmbxJQ/Jq1cxID4rp6epo6hNaKqCZiUcYY8Uy Ehw2a9S4kHYEjDmPPj7x6u8KqhfmelvYRC2PczrtrJmuQo57ioWa8ZGVppvZ6vtIOmSN HXVlyMMQ0KQz189JoFYwlbqsprOt/6593sokuzddgh4D0NHA5EE19cTdeL4GRC1w2sQo CFZ4jt7M56Frbb5kweuZeJVqhpZvhhYRnHypxMVWlPbkoi4Ul8bAKp3n0LwJb5RfYdN2 euWYy9z8MmAm1g5B07xbLJebrU5VNvRoOwrZ7A/wO4Zf8bJBlhTJ4CIpiUFYgpm7medW FhYg== MIME-Version: 1.0 X-Received: by 10.180.19.138 with SMTP id f10mr1931714wie.11.1394721601845; Thu, 13 Mar 2014 07:40:01 -0700 (PDT) Received: by 10.216.113.133 with HTTP; Thu, 13 Mar 2014 07:40:01 -0700 (PDT) In-Reply-To: References: Date: Thu, 13 Mar 2014 14:40:01 +0000 Message-ID: Subject: Re: CQL Select Map using an IN relationship From: David Savage To: "user@cassandra.apache.org" Content-Type: multipart/alternative; boundary=bcaec53d550da3812c04f47ded5f X-Virus-Checked: Checked by ClamAV on apache.org --bcaec53d550da3812c04f47ded5f Content-Type: text/plain; charset=ISO-8859-1 Nope, upgraded to 2.0.5 and still get the same problem, I actually simplified the problem a little in my first post, there's a composite primary key involved as I need to partition ids into groups So the full CQL statements are: CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy', 'replication_factor':3}; CREATE TABLE test.documents (group text,id bigint,data map,PRIMARY KEY (group, id)); INSERT INTO test.documents(id,group,data) VALUES (0,'test',{'count':'0'}); INSERT INTO test.documents(id,group,data) VALUES (1,'test',{'count':'1'}); INSERT INTO test.documents(id,group,data) VALUES (2,'test',{'count':'2'}); SELECT id,data FROM test.documents WHERE group='test' AND id IN (0,1,2); Thanks for your help. Kind regards, /Dave On Thu, Mar 13, 2014 at 2:00 PM, David Savage wrote: > Hmmm that maybe the problem, I'm currently testing with 2.0.2 which got > dragged in by the cassandra unit library I'm using for testing [1] I will > try to fix my build dependencies and retry, thx. > > /Dave > > [1] https://github.com/jsevellec/cassandra-unit > > > On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael > wrote: > >> I have no problem doing this w 2.0.5 - what version of C* are you using? >> Or maybe I don't understand your data model... attach 'creates' if you >> don't mind. >> >> ml >> >> >> On Thu, Mar 13, 2014 at 9:24 AM, David Savage wrote: >> >>> Hi Peter, >>> >>> Thanks for the help, unfortunately I'm not sure that's the problem, the >>> id is the primary key on the documents table and the timestamp is the >>> primary key on the eventlog table >>> >>> Kind regards, >>> >>> >>> Dave >>> >>> On Thursday, 13 March 2014, Peter Lin wrote: >>> >>>> >>>> it's not clear to me if your "id" column is the KEY or just a regular >>>> column with secondary index. >>>> >>>> queries that have IN on non primary key columns isn't supported yet. >>>> not sure if that answers your question. >>>> >>>> >>>> On Thu, Mar 13, 2014 at 7:12 AM, David Savage wrote: >>>> >>>>> Hi there, >>>>> >>>>> I'm experimenting using cassandra and have run across an error message >>>>> which I need a little more information on. >>>>> >>>>> The use case I'm experimenting with is a series of document updates >>>>> (documents being an arbitrary map of key value pairs), I would like to find >>>>> the latest document updates after a specified time period. I don't want to >>>>> store many copies of the documents (one per update) as the updates are >>>>> often only to single keys in the map so that would involve a lot of >>>>> duplicated data. >>>>> >>>>> The solution I've found that seems to fit best in terms of performance >>>>> is to have two tables. >>>>> >>>>> One that has an event log of timeuuid -> docid and a second that >>>>> stores the documents themselves stored by docid -> map. I >>>>> then run two queries, one to select ids that have changed after a certain >>>>> time: >>>>> >>>>> SELECT id FROM eventlog WHERE timestamp>=minTimeuuid($minimumTime) >>>>> >>>>> and then a second to select the actual documents themselves >>>>> >>>>> SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7...) >>>>> >>>>> However this then explodes on query with the error message: >>>>> >>>>> "Cannot restrict PRIMARY KEY part id by IN relation as a collection is >>>>> selected by the query" >>>>> >>>>> Detective work lead me to these lines in >>>>> org.apache.cassandra.cql3.statementsSelectStatement: >>>>> >>>>> // We only support IN for the last name and for >>>>> compact storage so far >>>>> // TODO: #3885 allows us to extend to non compact >>>>> as well, but that remains to be done >>>>> if (i != stmt.columnRestrictions.length - 1) >>>>> throw new >>>>> InvalidRequestException(String.format("PRIMARY KEY part %s cannot be >>>>> restricted by IN relation", cname)); >>>>> else if (stmt.selectACollection()) >>>>> throw new >>>>> InvalidRequestException(String.format("Cannot restrict PRIMARY KEY part %s >>>>> by IN relation as a collection is selected by the query", cname)); >>>>> >>>>> It seems like #3885 will allow support for the first IF block above, >>>>> but I don't think it will allow the second, am I correct? >>>>> >>>>> Any pointers on how I can work around this would be greatly >>>>> appreciated. >>>>> >>>>> Kind regards, >>>>> >>>>> Dave >>>>> >>>> >>>> >> > --bcaec53d550da3812c04f47ded5f Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Nope, upgraded to 2.0.5 and still get the same problem, I = actually simplified the problem a little in my first post, there's a co= mposite primary key involved as I need to partition ids into groups
So the full CQL statements are:

CREATE KEYSPACE test = WITH replication =3D {'class':'SimpleStrategy', 'replic= ation_factor':3};


CREATE TABLE t= est.documents (group text,id bigint,data map<text,text>,PRIMARY KEY (= group, id));


INSERT INTO te= st.documents(id,group,data) VALUES (0,'test',{'count':'= 0'});

INSERT INTO te= st.documents(id,group,data) VALUES (1,'test',{'count':'= 1'});

INSERT INTO te= st.documents(id,group,data) VALUES (2,'test',{'count':'= 2'});


SELECT id,data= FROM test.documents WHERE group=3D'test' AND id IN (0,1,2);


Thanks for your help.


Kind regards,


/Dave<= /p>



On Thu, Mar 13, 2014 at 2:00 PM, David Savage <davemssavage@gmail.c= om> wrote:
Hmmm that maybe the pr= oblem, I'm currently testing with 2.0.2 which got dragged in by the cas= sandra unit library I'm using for testing [1] I will try to fix my buil= d dependencies and retry, thx.

/Dave


On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael <michael.laing@nyt= imes.com> wrote:
I have no problem doing this w 2.0.5 - what version of C* = are you using? Or maybe I don't understand your data model... attach &#= 39;creates' if you don't mind.

ml


On Thu, Mar 13, 2014 at 9:24 AM, David S= avage <davemssavage@gmail.com> wrote:
Hi Peter,

Thanks for the help, unfortun= ately I'm not sure that's the problem, the id is the primary key on= the documents table and the timestamp is the primar= y key on the eventlog table

Kind regards,


Dav= e


On Thursday, 13 March 2014, Peter Lin <woolfel@gmail.com&g= t; wrote:

it's not clear to= me if your "id" column is the KEY or just a regular column with = secondary index.

queries that have IN on non primary key columns isn't support= ed yet. not sure if that answers your question.


On Thu,= Mar 13, 2014 at 7:12 AM, David Savage <davemssavag= e@gmail.com> wrote:
Hi there,

I'm expe= rimenting using cassandra and have run across an error message which I need= a little more information on.

The use case I'm experimenting with is a series of = document updates (documents being an arbitrary map of key value pairs), I w= ould like to find the latest document updates after a specified time period= . I don't want to store many copies of the documents (one per update) a= s the updates are often only to single keys in the map so that would involv= e a lot of duplicated data.

The solution I've found that seems to fit best in t= erms of performance is to have two tables.

One tha= t has an event log of timeuuid -> docid and a second that stores the doc= uments themselves stored by docid -> map<string, string>. I then r= un two queries, one to select ids that have changed after a certain time:

SELECT id FROM eventlog WHERE timestamp>=3DminTimeuu= id($minimumTime)

and then a second to select the a= ctual documents themselves

SELECT id, data FROM do= cuments WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7…)

However this then explodes on query with the erro= r message:

"Cannot restrict PRIMARY KEY part = id by IN relation as a collection is selected by the query"

Detective work lead me to these lines in org.apache.cassandr= a.cql3.statementsSelectStatement:

  &nbs= p;                 // We only suppo= rt IN for the last name and for compact storage so far
                    = // TODO: #3885 allows us to extend to non compact as well, but that remains= to be done
              &nbs= p;     if (i !=3D stmt.columnRestrictions.length - 1)
&= nbsp;                    =   throw new InvalidRequestException(String.format("PRIMARY KEY p= art %s cannot be restricted by IN relation", cname));
                    = else if (stmt.selectACollection())
        &n= bsp;               throw new InvalidRequ= estException(String.format("Cannot restrict PRIMARY KEY part %s by IN = relation as a collection is selected by the query", cname));

It seems like #3885 will allow support for the fi= rst IF block above, but I don't think it will allow the second, am I co= rrect? 

Any pointers on how I can work around= this would be greatly appreciated.

Kind regards,

Dave




--bcaec53d550da3812c04f47ded5f--