incubator-cassandra-user mailing list archives

From David Savage <davemssav...@gmail.com>
Subject Re: CQL Select Map using an IN relationship
Date Thu, 13 Mar 2014 14:00:32 GMT
Hmmm, that may be the problem. I'm currently testing with 2.0.2, which got
dragged in by the cassandra-unit library I'm using for testing [1]. I will
try to fix my build dependencies and retry, thx.

/Dave

[1] https://github.com/jsevellec/cassandra-unit


On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael
<michael.laing@nytimes.com> wrote:

> I have no problem doing this w 2.0.5 - what version of C* are you using?
> Or maybe I don't understand your data model... attach 'creates' if you
> don't mind.
>
> ml
>
>
> On Thu, Mar 13, 2014 at 9:24 AM, David Savage <davemssavage@gmail.com> wrote:
>
>> Hi Peter,
>>
>> Thanks for the help. Unfortunately I'm not sure that's the problem: the
>> id is the primary key on the documents table and the timestamp is the
>> primary key on the eventlog table.
>>
>> Kind regards,
>>
>>
>> Dave
>>
>> On Thursday, 13 March 2014, Peter Lin <woolfel@gmail.com> wrote:
>>
>>>
>>> It's not clear to me if your "id" column is the KEY or just a regular
>>> column with a secondary index.
>>>
>>> Queries that have IN on non-primary-key columns aren't supported yet. Not
>>> sure if that answers your question.
>>>
>>>
>>> On Thu, Mar 13, 2014 at 7:12 AM, David Savage <davemssavage@gmail.com> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I'm experimenting with Cassandra and have run across an error message
>>>> which I need a little more information on.
>>>>
>>>> The use case I'm experimenting with is a series of document updates
>>>> (documents being an arbitrary map of key-value pairs). I would like to find
>>>> the latest document updates after a specified point in time. I don't want to
>>>> store many copies of the documents (one per update), as the updates are
>>>> often only to single keys in the map, so that would involve a lot of
>>>> duplicated data.
>>>>
>>>> The solution I've found that seems to fit best in terms of performance
>>>> is to have two tables.
>>>>
>>>> One holds an event log of timeuuid -> docid, and a second stores
>>>> the documents themselves, keyed by docid -> map<string, string>. I then run
>>>> two queries, one to select ids that have changed after a certain time:
>>>>
>>>> SELECT id FROM eventlog WHERE timestamp>=minTimeuuid($minimumTime)
>>>>
>>>> and then a second to select the actual documents themselves
>>>>
>>>> SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7...)
>>>>
>>>> However, the second query then fails with the error message:
>>>>
>>>> "Cannot restrict PRIMARY KEY part id by IN relation as a collection is
>>>> selected by the query"
>>>>
>>>> Detective work led me to these lines in
>>>> org.apache.cassandra.cql3.statements.SelectStatement:
>>>>
>>>>     // We only support IN for the last name and for compact storage so far
>>>>     // TODO: #3885 allows us to extend to non compact as well, but that remains to be done
>>>>     if (i != stmt.columnRestrictions.length - 1)
>>>>         throw new InvalidRequestException(String.format("PRIMARY KEY part %s cannot be restricted by IN relation", cname));
>>>>     else if (stmt.selectACollection())
>>>>         throw new InvalidRequestException(String.format("Cannot restrict PRIMARY KEY part %s by IN relation as a collection is selected by the query", cname));
>>>>
>>>> It seems like #3885 will allow support for the first IF block above,
>>>> but I don't think it will allow the second, am I correct?
>>>>
>>>> Any pointers on how I can work around this would be greatly appreciated.
>>>>
>>>> Kind regards,
>>>>
>>>> Dave
>>>>
>>>
>>>
>
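
A minimal sketch of one possible client-side workaround for the question in the original message, for versions that reject the IN query when a collection is selected: run one single-key SELECT per id and merge the results. Everything named here is an assumption for illustration. The `PerIdFetch` class, the `fetchAll` helper, and the `fetchOne` callback (which would wrap whatever driver call executes the single-key query) are hypothetical and not part of Cassandra or any driver.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: replaces "WHERE id IN (...)" with one lookup per id.
final class PerIdFetch {

    // Single-key form of the documents query from the thread; the bind
    // marker would be filled in once per id by the caller's driver code.
    static final String SINGLE_ID_QUERY =
            "SELECT id, data FROM documents WHERE id = ?";

    // Calls fetchOne for each distinct id, in encounter order, and collects
    // the non-null results. fetchOne is assumed to return null when no
    // document exists for the given id.
    static <K, V> Map<K, V> fetchAll(Collection<K> ids, Function<K, V> fetchOne) {
        Map<K, V> documents = new LinkedHashMap<>();
        for (K id : ids) {
            if (documents.containsKey(id)) {
                continue; // the event log may list the same id more than once
            }
            V doc = fetchOne.apply(id);
            if (doc != null) {
                documents.put(id, doc);
            }
        }
        return documents;
    }
}
```

This trades the single IN query for one round trip per id, so it is probably only reasonable for small id sets or when the lookups can be issued concurrently. Upgrading the server, as suggested in the reply above, may make it unnecessary.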
