incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Laing, Michael" <michael.la...@nytimes.com>
Subject Re: Versioning in cassandra
Date Tue, 03 Sep 2013 19:15:25 GMT
try the following. -ml

-- put this in <file> and run using 'cqlsh -f <file>

DROP KEYSPACE latest;

CREATE KEYSPACE latest WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor' : 1
};

USE latest;

CREATE TABLE file (
    parentid text, -- row_key, same for each version
    id text, -- column_key, same for each version
    contenttype map<timestamp, text>, -- differs by version, version is the
key to the map
    PRIMARY KEY (parentid, id)
);

update file set contenttype = contenttype + {'2011-03-04':'pdf1'} where
parentid = 'd1' and id = 'f1';
update file set contenttype = contenttype + {'2011-03-05':'pdf2'} where
parentid = 'd1' and id = 'f1';
update file set contenttype = contenttype + {'2011-03-04':'pdf3'} where
parentid = 'd1' and id = 'f2';
update file set contenttype = contenttype + {'2011-03-05':'pdf4'} where
parentid = 'd1' and id = 'f2';

select * from file where parentid = 'd1';

-- returns:

-- parentid | id | contenttype
------------+----+--------------------------------------------------------------------------
--       d1 | f1 | {'2011-03-04 00:00:00-0500': 'pdf1', '2011-03-05
00:00:00-0500': 'pdf2'}
--       d1 | f2 | {'2011-03-04 00:00:00-0500': 'pdf3', '2011-03-05
00:00:00-0500': 'pdf4'}

-- use an app to pop off the latest version from the map

-- map other varying fields using the same technique as used for contenttype



On Tue, Sep 3, 2013 at 2:31 PM, Vivek Mishra <mishra.vivs@gmail.com> wrote:

> create table file(id text , parentid text,contenttype text,version
> timestamp, descr text, name text, PRIMARY KEY(id,version) ) WITH CLUSTERING
> ORDER BY (version DESC);
>
> insert into file (id, parentid, version, contenttype, descr, name) values
> ('f2', 'd1', '2011-03-06', 'pdf', 'f2 file', 'file1');
> insert into file (id, parentid, version, contenttype, descr, name) values
> ('f2', 'd1', '2011-03-05', 'pdf', 'f2 file', 'file1');
> insert into file (id, parentid, version, contenttype, descr, name) values
> ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
> insert into file (id, parentid, version, contenttype, descr, name) values
> ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
> create index on file(parentid);
>
>
> select * from file where id='f1' and parentid='d1' limit 1;
>
> select * from file where parentid='d1' limit 1;
>
>
> Will it work for you?
>
> -Vivek
>
>
>
>
> On Tue, Sep 3, 2013 at 11:29 PM, Vivek Mishra <mishra.vivs@gmail.com>wrote:
>
>> My bad. I did miss out to read "latest version" part.
>>
>> -Vivek
>>
>>
>> On Tue, Sep 3, 2013 at 11:20 PM, dawood abdullah <
>> muhammed.dawood@gmail.com> wrote:
>>
>>> I have tried with both the options creating secondary index and also
>>> tried adding parentid to primary key, but I am getting all the files with
>>> parentid 'yyy', what I want is the latest version of file with the
>>> combination of parentid, fileid. Say below are the records inserted in the
>>> file table:
>>>
>>> insert into file (id, parentid, version, contenttype, description, name)
>>> values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
>>> insert into file (id, parentid, version, contenttype, description, name)
>>> values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
>>> insert into file (id, parentid, version, contenttype, description, name)
>>> values ('f2', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
>>> insert into file (id, parentid, version, contenttype, description, name)
>>> values ('f2', 'd1', '2011-03-06', 'pdf', 'f1 file', 'file1');
>>>
>>> I want to write a query which returns me second and last record and not
>>> the first and third record, because for the first and third record there
>>> exists a latest version, for the combination of id and parentid.
>>>
>>> I am confused If at all this is achievable, please suggest.
>>>
>>> Dawood
>>>
>>>
>>>
>>> On Tue, Sep 3, 2013 at 10:58 PM, Vivek Mishra <mishra.vivs@gmail.com>wrote:
>>>
>>>> create secondary index over parentid.
>>>> OR
>>>> make it part of clustering key
>>>>
>>>> -Vivek
>>>>
>>>>
>>>> On Tue, Sep 3, 2013 at 10:42 PM, dawood abdullah <
>>>> muhammed.dawood@gmail.com> wrote:
>>>>
>>>>> Jan,
>>>>>
>>>>> The solution you gave works spot on, but there is one more requirement
>>>>> I forgot to mention. Following is my table structure
>>>>>
>>>>> CREATE TABLE file (
>>>>>   id text,
>>>>>   contenttype text,
>>>>>   createdby text,
>>>>>   createdtime timestamp,
>>>>>   description text,
>>>>>   name text,
>>>>>   parentid text,
>>>>>   version timestamp,
>>>>>   PRIMARY KEY (id, version)
>>>>>
>>>>> ) WITH CLUSTERING ORDER BY (version DESC);
>>>>>
>>>>>
>>>>> The query (select * from file where id = 'xxx' limit 1;) provided
>>>>> solves the problem of finding the latest version file. But I have one
more
>>>>> requirement of finding all the latest version files having parentid say
>>>>> 'yyy'.
>>>>>
>>>>> Please suggest how can this query be achieved.
>>>>>
>>>>> Dawood
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 3, 2013 at 12:43 AM, dawood abdullah <
>>>>> muhammed.dawood@gmail.com> wrote:
>>>>>
>>>>>> In my case version can be timestamp as well. What do you suggest
>>>>>> version number to be, do you see any problems if I keep version as
counter
>>>>>> / timestamp ?
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 3, 2013 at 12:22 AM, Jan Algermissen <
>>>>>> jan.algermissen@nordsc.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> On 02.09.2013, at 20:44, dawood abdullah <muhammed.dawood@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > Requirement is like I have a column family say File
>>>>>>> >
>>>>>>> > create table file(id text primary key, fname text, version
int,
>>>>>>> mimetype text, content text);
>>>>>>> >
>>>>>>> > Say, I have few records inserted, when I modify an existing
record
>>>>>>> (content is modified) a new version needs to be created. As I
need to have
>>>>>>> provision to revert to back any old version whenever required.
>>>>>>> >
>>>>>>>
>>>>>>> So, can version be a timestamp? Or does it need to be an integer?
>>>>>>>
>>>>>>> In the former case, make use of C*'s ordering like so:
>>>>>>>
>>>>>>> CREATE TABLE file (
>>>>>>>    file_id text,
>>>>>>>    version timestamp,
>>>>>>>    fname text,
>>>>>>>    ....
>>>>>>>    PRIMARY KEY (file_id,version)
>>>>>>> ) WITH CLUSTERING ORDER BY (version DESC);
>>>>>>>
>>>>>>> Get the latest file version with
>>>>>>>
>>>>>>> select * from file where file_id = 'xxx' limit 1;
>>>>>>>
>>>>>>> If it has to be an integer, use counter columns.
>>>>>>>
>>>>>>> Jan
>>>>>>>
>>>>>>>
>>>>>>> > Regards,
>>>>>>> > Dawood
>>>>>>> >
>>>>>>> >
>>>>>>> > On Mon, Sep 2, 2013 at 10:47 PM, Jan Algermissen <
>>>>>>> jan.algermissen@nordsc.com> wrote:
>>>>>>> > Hi Dawood,
>>>>>>> >
>>>>>>> > On 02.09.2013, at 16:36, dawood abdullah <
>>>>>>> muhammed.dawood@gmail.com> wrote:
>>>>>>> >
>>>>>>> > > Hi
>>>>>>> > > I have a requirement of versioning to be done in Cassandra.
>>>>>>> > >
>>>>>>> > > Following is my column family definition
>>>>>>> > >
>>>>>>> > > create table file_details(id text primary key, fname
text,
>>>>>>> version int, mimetype text);
>>>>>>> > >
>>>>>>> > > I have a secondary index created on fname column.
>>>>>>> > >
>>>>>>> > > Whenever I do an insert for the same 'fname', the version
should
>>>>>>> be incremented. And when I retrieve a row with fname it should
return me
>>>>>>> the latest version row.
>>>>>>> > >
>>>>>>> > > Is there a better way to do in Cassandra? Please suggest
what
>>>>>>> approach needs to be taken.
>>>>>>> >
>>>>>>> > Can you explain more about your use case?
>>>>>>> >
>>>>>>> > If the version need not be a small number, but could be
a
>>>>>>> timestamp, you could make use of C*'s ordering feature , have
the database
>>>>>>> set the new version as a timestamp and retrieve the latest one
with a
>>>>>>> simple LIMIT 1 query. (I'll explain more when this is an option
for you).
>>>>>>> >
>>>>>>> > Jan
>>>>>>> >
>>>>>>> > P.S. Me being a REST/HTTP head, an alarm rings when I see
>>>>>>> 'version' next to 'mimetype' :-) What exactly are you versioning
here?
>>>>>>> Maybe we can even change the situation from a functional POV?
>>>>>>> >
>>>>>>> >
>>>>>>> > >
>>>>>>> > > Regards,
>>>>>>> > >
>>>>>>> > > Dawood
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message