Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of muhammed.dawood@gmail.com
 designates 209.85.128.170 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAKgmDnFf5BoGba048mDs9PGGFESNFxQKVwSEexX5jMXbABwNSg@mail.gmail.com>
References: 
 <CANCpebxeb4tiq=Geiw5b2ksLpZF7jvSmdJUCYL+258FNJu9vuQ@mail.gmail.com>
	<DCB38265-0423-48E6-A8EA-5062A3A2695C@nordsc.com>
	<CANCpebwTk=7rH5kzBQb3in+yBVP2VkYZkDSLs6Bn=NKRAP9TBQ@mail.gmail.com>
	<8A136D81-BC95-4605-B0C6-C48659876598@nordsc.com>
	<CANCpebzm-Ki=8pvB_QDvh_pweCke0Trs99fydK8kDsvxUi4HHw@mail.gmail.com>
	<CANCpebw8rhqHgKrXy2O1-vJ3VkjUH4hduRbJ5NXj_fs4DudD7Q@mail.gmail.com>
	<CANJo1uBEyrFQ4QGA-Lxs0FPVT2dZ_Yza_-5RkF-c-p5MCwzwLg@mail.gmail.com>
	<CANCpebxrGqPUXETUjK3PqMD4DaHCH4+eBi31oyBfo3yJSD6wpg@mail.gmail.com>
	<CANJo1uC_Hut7SF8fL-WnekXjBkPz5sKhLXhcwZa9niDRbg5pmQ@mail.gmail.com>
	<CANJo1uBu9m7JjDgBtN1odQr+nqDeDRhSkw9FwME-NpGgv86f0g@mail.gmail.com>
	<CAKgmDnE4mYeXpGUtVBbaAz=tv=W-eRan7TBH+yb6qqzfFoL=Bw@mail.gmail.com>
	<CAKgmDnGuhOS4aTdi+Woj2yfFYRDbE4FF7n6S1dO6snfzASkLsw@mail.gmail.com>
	<CANCpebya7xafnguD0mP2628mC4AH9v8RwjUAkwLQc0X1jTsBUA@mail.gmail.com>
	<CAKgmDnFf5BoGba048mDs9PGGFESNFxQKVwSEexX5jMXbABwNSg@mail.gmail.com>
Date: Wed, 4 Sep 2013 19:53:43 +0530
Message-ID: 
 <CANCpebxPQuX2jMizQcbXdy1ZfOCsshWWsh19bM8FvzUKD=m6MA@mail.gmail.com>
Subject: Re: Versioning in cassandra
From: dawood abdullah <muhammed.dawood@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=047d7b3439707425e904e58f8d41

--047d7b3439707425e904e58f8d41
Content-Type: text/plain; charset=ISO-8859-1

Thanks for the quick response, Michael. It looks like I will have to go with
your map-based solution, since performance is critical for our application
and we do not have enough time to test. I appreciate your help.

Regards,
Dawood


On Wed, Sep 4, 2013 at 7:33 PM, Laing, Michael <michael.laing@nytimes.com> wrote:

> Dawood,
>
> In general that will work. However, it does mean that you 1) read the old
> version, 2) update the row with the new version, and 3) write the archive version.
>
> Step 2 is a problem: what if someone else has updated the old version
> after step 1? Also, at least 3 separate atomic operations are required.
>
> However, these concerns may be mitigated by using Cassandra 2.0's
> lightweight transactions, and they are not a problem if you have only one updater.
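A minimal in-memory Python sketch of the race in step 2 (this is not Cassandra code; `FileStore`, `blind_write`, `conditional_write` and `interleaved` are hypothetical names). Two writers read the same old version; with blind writes, writer A's version is overwritten and never archived. `conditional_write` models the compare-and-set that a Cassandra 2.0 lightweight transaction (`UPDATE ... IF ...`) performs, under the assumption that the losing writer re-reads and retries.

```python
# In-memory simulation, not Cassandra code: read old version, write new
# version, archive old version, performed by two interleaving writers.
from typing import Dict, List, Tuple


class FileStore:
    """Hypothetical stand-in for the 'file' and 'file_archive' tables."""

    def __init__(self, content: str) -> None:
        self.current: Dict[str, str] = {"f1": content}
        self.archive: List[Tuple[str, str]] = []

    def read(self, file_id: str) -> str:
        return self.current[file_id]

    def blind_write(self, file_id: str, old: str, new: str) -> None:
        # Steps 2 and 3 without checking that 'old' is still the current value.
        self.current[file_id] = new
        self.archive.append((file_id, old))

    def conditional_write(self, file_id: str, expected: str, new: str) -> bool:
        # Compare-and-set: apply only if the row still holds what we read.
        if self.current[file_id] != expected:
            return False
        self.current[file_id] = new
        self.archive.append((file_id, expected))
        return True


def interleaved(use_cas: bool) -> FileStore:
    store = FileStore("v1")
    a_old = store.read("f1")  # writer A reads v1
    b_old = store.read("f1")  # writer B also reads v1, before A writes
    if use_cas:
        store.conditional_write("f1", a_old, "vA")
        if not store.conditional_write("f1", b_old, "vB"):
            # B lost the race: re-read the current value and retry once.
            store.conditional_write("f1", store.read("f1"), "vB")
    else:
        store.blind_write("f1", a_old, "vA")
        store.blind_write("f1", b_old, "vB")
    return store


blind = interleaved(use_cas=False)
cas = interleaved(use_cas=True)
print(blind.current["f1"], blind.archive)  # vA was never archived
print(cas.current["f1"], cas.archive)      # v1 and vA are both archived
```

With a single updater the two reads cannot interleave, so the blind sequence is safe, which matches the remark about having only one updater.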
>
> But another problem may be performance. You must test. The solution I
> proposed does not require a read before write and does an atomic append,
> even if multiple maps are being updated. It also defers deletions via TTLs
> and a separate, manageable queue for 'cleanup' of large maps.
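A minimal sketch of why the map approach needs no read before write, assuming a map append behaves like adding one key to a per-row dictionary (it ignores TTLs and Cassandra's timestamp-based conflict resolution; `append_version` and `apply_all` are hypothetical names). Because each append only adds its own entry, the result is the same whichever append arrives first.

```python
# In-memory model, not Cassandra code, of
#   UPDATE file SET contenttype = contenttype + {version: value} WHERE ...
# The writer adds one entry and never reads the existing map first.
from collections import defaultdict
from datetime import datetime
from typing import DefaultDict, Dict, List, Tuple

Key = Tuple[str, str]                      # (parentid, id)
Table = DefaultDict[Key, Dict[datetime, str]]
Op = Tuple[str, str, datetime, str]        # (parentid, id, version, value)


def append_version(table: Table, op: Op) -> None:
    parentid, file_id, version, value = op
    # Only the new entry is written; other entries are neither read nor touched.
    table[(parentid, file_id)][version] = value


def apply_all(ops: List[Op]) -> Table:
    table: Table = defaultdict(dict)
    for op in ops:
        append_version(table, op)
    return table


ops: List[Op] = [
    ("d1", "f1", datetime(2011, 3, 4), "pdf1"),
    ("d1", "f1", datetime(2011, 3, 5), "pdf2"),
]
forward = apply_all(ops)
backward = apply_all(list(reversed(ops)))
print(forward == backward)  # arrival order does not change the result
```

This order independence only holds while version keys are distinct; two writers using the same key would overwrite each other, which is where the extra 'uniqueness' of timeuuid keys matters.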
>
> I think the most important word in my reply is: 'test'.
>
> Cheers,
>
> Michael
>
>
> On Wed, Sep 4, 2013 at 9:05 AM, dawood abdullah <muhammed.dawood@gmail.com> wrote:
>
>> Michael,
>>
>> Your approach solves the problem; thanks for the solution. I was also
>> thinking of another approach, wherein I would create another column family,
>> say file_archive, so that whenever an update is made to the File table, I
>> create a new version in File and move the old version to the new
>> file_archive table. Please let me know whether this second approach is fine.
>>
>> Regards,
>> Dawood
>>
>>
>> On Wed, Sep 4, 2013 at 2:47 AM, Laing, Michael <michael.laing@nytimes.com> wrote:
>>
>>> I use the technique described in my previous message to handle millions
>>> of messages and their versions.
>>>
>>> Actually, I use timeuuids instead of timestamps, as they have more
>>> 'uniqueness'. Also, I index my maps by a timeuuid that is the complement
>>> (based on a future date) of a current timeuuid. Since maps are kept sorted
>>> by key, this means I can just pop off the first entry to get the most recent.
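A minimal sketch of the complement idea, assuming key = FUTURE - t. The exact timeuuid construction is not given in the message, so plain integer millisecond timestamps stand in for the time part of a timeuuid, and `FUTURE`, `complement_key`, `add_version` and `newest_first` are hypothetical names. Sorting keys ascending then lists the newest version first.

```python
# Sketch of complement keys: key = FUTURE - t, so ascending key order (the
# order in which Cassandra keeps map keys) lists the newest version first.
# Integer millisecond timestamps stand in for the time part of a timeuuid.
from typing import Dict, List

FUTURE = 10**15  # assumption: later than any millisecond timestamp in use


def complement_key(ts_ms: int) -> int:
    return FUTURE - ts_ms


def add_version(versions: Dict[int, str], ts_ms: int, value: str) -> None:
    versions[complement_key(ts_ms)] = value


def newest_first(versions: Dict[int, str]) -> List[str]:
    # Sort by key ascending, mirroring how map entries come back.
    return [versions[k] for k in sorted(versions)]


versions: Dict[int, str] = {}
add_version(versions, 1_299_196_800_000, "pdf1")  # 2011-03-04 00:00 UTC
add_version(versions, 1_299_283_200_000, "pdf2")  # 2011-03-05 00:00 UTC
latest = newest_first(versions)[0]  # "pop off the first one"
```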
>>>
>>> The downside of this approach is that Cassandra returns more data than
>>> you need. To mitigate that, I queue a job to examine and correct the
>>> situation if, upon a read, the number of versions for a particular key is
>>> higher than some threshold, e.g. 50. There are many ways to approach this
>>> problem.
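One possible shape of the read-triggered cleanup, sketched in Python under the assumption that versions are keyed by plain timestamps (newest = largest) and that a background worker drains the queue; `THRESHOLD`, `cleanup_queue`, `read_latest` and `trim` are hypothetical names, and TTL-based expiry is left out.

```python
# Sketch of read-triggered cleanup: the read path never deletes anything; it
# only enqueues a job once a row holds more versions than THRESHOLD.
from collections import deque
from typing import Deque, Dict, Tuple

THRESHOLD = 50
Key = Tuple[str, str]                  # (parentid, id)
Store = Dict[Key, Dict[int, str]]      # version timestamp -> value
cleanup_queue: Deque[Key] = deque()


def read_latest(store: Store, key: Key) -> str:
    versions = store[key]
    if len(versions) > THRESHOLD and key not in cleanup_queue:
        cleanup_queue.append(key)      # defer the expensive work
    return versions[max(versions)]     # newest = largest timestamp


def trim(store: Store, key: Key, keep: int) -> None:
    # Background job: drop everything except the `keep` newest versions.
    versions = store[key]
    for ts in sorted(versions, reverse=True)[keep:]:
        del versions[ts]


store: Store = {("d1", "f1"): {ts: f"pdf{ts}" for ts in range(60)}}
latest = read_latest(store, ("d1", "f1"))   # also enqueues ("d1", "f1")
while cleanup_queue:
    trim(store, cleanup_queue.popleft(), keep=10)
```

Keeping the delete work off the read path keeps reads fast; the cost is that oversized rows are only trimmed some time after they are noticed.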
>>>
>>> Our actual implementation goes a level further, as we also have
>>> replicas of versions. This happens because we process important
>>> transactions in parallel and can expect up to 9 replicas of each version.
>>> We journal them all and use them to report latencies in our processing
>>> pipelines as well as for replay when we need to recover application state.
>>>
>>> Regards,
>>>
>>> Michael
>>>
>>>
>>> On Tue, Sep 3, 2013 at 3:15 PM, Laing, Michael <michael.laing@nytimes.com> wrote:
>>>
>>>> Try the following. -ml
>>>>
>>>> -- put this in <file> and run using 'cqlsh -f <file>'
>>>>
>>>> DROP KEYSPACE latest;
>>>>
>>>> CREATE KEYSPACE latest WITH replication = {
>>>>     'class': 'SimpleStrategy',
>>>>     'replication_factor' : 1
>>>> };
>>>>
>>>> USE latest;
>>>>
>>>> CREATE TABLE file (
>>>>     parentid text, -- row_key, same for each version
>>>>     id text, -- column_key, same for each version
>>>>     contenttype map<timestamp, text>, -- differs by version; version is the key to the map
>>>>     PRIMARY KEY (parentid, id)
>>>> );
>>>>
>>>> update file set contenttype = contenttype + {'2011-03-04':'pdf1'} where
>>>> parentid = 'd1' and id = 'f1';
>>>> update file set contenttype = contenttype + {'2011-03-05':'pdf2'} where
>>>> parentid = 'd1' and id = 'f1';
>>>> update file set contenttype = contenttype + {'2011-03-04':'pdf3'} where
>>>> parentid = 'd1' and id = 'f2';
>>>> update file set contenttype = contenttype + {'2011-03-05':'pdf4'} where
>>>> parentid = 'd1' and id = 'f2';
>>>>
>>>> select * from file where parentid = 'd1';
>>>>
>>>> -- returns:
>>>>
>>>> -- parentid | id | contenttype
>>>>
>>>> ------------+----+--------------------------------------------------------------------------
>>>> --       d1 | f1 | {'2011-03-04 00:00:00-0500': 'pdf1', '2011-03-05 00:00:00-0500': 'pdf2'}
>>>> --       d1 | f2 | {'2011-03-04 00:00:00-0500': 'pdf3', '2011-03-05 00:00:00-0500': 'pdf4'}
>>>>
>>>> -- use an app to pop off the latest version from the map
>>>>
>>>> -- map other varying fields using the same technique as used for contenttype
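A minimal app-side sketch of "pop off the latest version from the map", assuming each row arrives as (parentid, id, contenttype) with the map as a plain Python dict keyed by version timestamp (how a given driver hands back maps is an assumption here; `latest_versions` is a hypothetical name). Comparing keys rather than relying on dict order keeps it independent of iteration order.

```python
# App-side sketch: keep only the newest contenttype entry per (parentid, id),
# for rows shaped like the SELECT output above.
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

EST = timezone(timedelta(hours=-5))  # matches the -0500 offset in the output

Row = Tuple[str, str, Dict[datetime, str]]


def latest_versions(rows: List[Row]) -> Dict[Tuple[str, str], Tuple[datetime, str]]:
    result = {}
    for parentid, file_id, contenttype in rows:
        newest = max(contenttype)  # largest timestamp = latest version
        result[(parentid, file_id)] = (newest, contenttype[newest])
    return result


rows: List[Row] = [
    ("d1", "f1", {datetime(2011, 3, 4, tzinfo=EST): "pdf1",
                  datetime(2011, 3, 5, tzinfo=EST): "pdf2"}),
    ("d1", "f2", {datetime(2011, 3, 4, tzinfo=EST): "pdf3",
                  datetime(2011, 3, 5, tzinfo=EST): "pdf4"}),
]
latest = latest_versions(rows)
```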
>>>>
>>>>
>>>>
>>>> On Tue, Sep 3, 2013 at 2:31 PM, Vivek Mishra <mishra.vivs@gmail.com> wrote:
>>>>
>>>>> create table file(id text, parentid text, contenttype text, version
>>>>> timestamp, descr text, name text, PRIMARY KEY(id, version)) WITH CLUSTERING
>>>>> ORDER BY (version DESC);
>>>>>
>>>>> insert into file (id, parentid, version, contenttype, descr, name)
>>>>> values ('f2', 'd1', '2011-03-06', 'pdf', 'f2 file', 'file1');
>>>>> insert into file (id, parentid, version, contenttype, descr, name)
>>>>> values ('f2', 'd1', '2011-03-05', 'pdf', 'f2 file', 'file1');
>>>>> insert into file (id, parentid, version, contenttype, descr, name)
>>>>> values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
>>>>> insert into file (id, parentid, version, contenttype, descr, name)
>>>>> values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
>>>>> create index on file(parentid);
>>>>>
>>>>>
>>>>> select * from file where id='f1' and parentid='d1' limit 1;
>>>>>
>>>>> select * from file where parentid='d1' limit 1;
>>>>>
>>>>>
>>>>> Will it work for you?
>>>>>
>>>>> -Vivek
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 3, 2013 at 11:29 PM, Vivek Mishra <mishra.vivs@gmail.com> wrote:
>>>>>
>>>>>> My bad. I missed the "latest version" part.
>>>>>>
>>>>>> -Vivek
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 3, 2013 at 11:20 PM, dawood abdullah <muhammed.dawood@gmail.com> wrote:
>>>>>>
>>>>>>> I have tried both options, creating a secondary index and adding
>>>>>>> parentid to the primary key, but I get all the files with parentid
>>>>>>> 'yyy'. What I want is the latest version of each file for the
>>>>>>> combination of parentid and file id. Say the records below are inserted
>>>>>>> into the file table:
>>>>>>>
>>>>>>> insert into file (id, parentid, version, contenttype, description,
>>>>>>> name) values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
>>>>>>> insert into file (id, parentid, version, contenttype, description,
>>>>>>> name) values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
>>>>>>> insert into file (id, parentid, version, contenttype, description,
>>>>>>> name) values ('f2', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
>>>>>>> insert into file (id, parentid, version, contenttype, description,
>>>>>>> name) values ('f2', 'd1', '2011-03-06', 'pdf', 'f1 file', 'file1');
>>>>>>>
>>>>>>> I want to write a query that returns the second and last records but
>>>>>>> not the first and third, because for the first and third records a later
>>>>>>> version exists for the same combination of id and parentid.
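A minimal client-side sketch of "latest version per (id, parentid)" over the four example rows, assuming the application fetches the rows for one parentid (for example via the secondary index) and does the grouping itself; `FileRow` and `latest_per_file` are hypothetical names, and only the columns needed for the comparison are modelled.

```python
# Client-side sketch: keep the newest row per id among rows with a given
# parentid. Done in the application, not by a single CQL query.
from datetime import date
from typing import Dict, List, NamedTuple


class FileRow(NamedTuple):
    id: str
    parentid: str
    version: date
    contenttype: str


def latest_per_file(rows: List[FileRow], parentid: str) -> List[FileRow]:
    newest: Dict[str, FileRow] = {}
    for row in rows:
        if row.parentid != parentid:
            continue
        current = newest.get(row.id)
        if current is None or row.version > current.version:
            newest[row.id] = row
    return sorted(newest.values(), key=lambda r: r.id)


rows = [
    FileRow("f1", "d1", date(2011, 3, 4), "pdf"),
    FileRow("f1", "d1", date(2011, 3, 5), "pdf"),
    FileRow("f2", "d1", date(2011, 3, 5), "pdf"),
    FileRow("f2", "d1", date(2011, 3, 6), "pdf"),
]
result = latest_per_file(rows, "d1")  # the second and last records
```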
>>>>>>>
>>>>>>> I am not sure whether this is achievable at all; please advise.
>>>>>>>
>>>>>>> Dawood
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 3, 2013 at 10:58 PM, Vivek Mishra <mishra.vivs@gmail.com> wrote:
>>>>>>>
>>>>>>>> Create a secondary index on parentid
>>>>>>>> OR
>>>>>>>> make it part of the clustering key.
>>>>>>>>
>>>>>>>> -Vivek
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Sep 3, 2013 at 10:42 PM, dawood abdullah <muhammed.dawood@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Jan,
>>>>>>>>>
>>>>>>>>> The solution you gave works spot on, but there is one more
>>>>>>>>> requirement I forgot to mention. Following is my table structure:
>>>>>>>>>
>>>>>>>>> CREATE TABLE file (
>>>>>>>>>   id text,
>>>>>>>>>   contenttype text,
>>>>>>>>>   createdby text,
>>>>>>>>>   createdtime timestamp,
>>>>>>>>>   description text,
>>>>>>>>>   name text,
>>>>>>>>>   parentid text,
>>>>>>>>>   version timestamp,
>>>>>>>>>   PRIMARY KEY (id, version)
>>>>>>>>>
>>>>>>>>> ) WITH CLUSTERING ORDER BY (version DESC);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The query you provided (select * from file where id = 'xxx' limit 1;)
>>>>>>>>> solves the problem of finding the latest version of a file. But I have one
>>>>>>>>> more requirement: finding the latest versions of all files with a given
>>>>>>>>> parentid, say 'yyy'.
>>>>>>>>>
>>>>>>>>> Please suggest how this query can be achieved.
>>>>>>>>>
>>>>>>>>> Dawood
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 3, 2013 at 12:43 AM, dawood abdullah <muhammed.dawood@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> In my case the version can be a timestamp as well. What do you suggest
>>>>>>>>>> the version should be? Do you see any problems if I keep the version as a
>>>>>>>>>> counter or a timestamp?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 3, 2013 at 12:22 AM, Jan Algermissen <jan.algermissen@nordsc.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 02.09.2013, at 20:44, dawood abdullah <muhammed.dawood@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> > The requirement is: I have a column family, say File:
>>>>>>>>>>> >
>>>>>>>>>>> > create table file(id text primary key, fname text, version int, mimetype text, content text);
>>>>>>>>>>> >
>>>>>>>>>>> > Say I have a few records inserted. When I modify an existing record
>>>>>>>>>>> > (its content is modified), a new version needs to be created, as I need
>>>>>>>>>>> > to be able to revert to any old version whenever required.
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>>> So, can version be a timestamp? Or does it need to be an integer?
>>>>>>>>>>>
>>>>>>>>>>> In the former case, make use of C*'s ordering like so:
>>>>>>>>>>>
>>>>>>>>>>> CREATE TABLE file (
>>>>>>>>>>>    file_id text,
>>>>>>>>>>>    version timestamp,
>>>>>>>>>>>    fname text,
>>>>>>>>>>>    ....
>>>>>>>>>>>    PRIMARY KEY (file_id,version)
>>>>>>>>>>> ) WITH CLUSTERING ORDER BY (version DESC);
>>>>>>>>>>>
>>>>>>>>>>> Get the latest file version with
>>>>>>>>>>>
>>>>>>>>>>> select * from file where file_id = 'xxx' limit 1;
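A minimal in-memory model of why `LIMIT 1` returns the latest version under `CLUSTERING ORDER BY (version DESC)`: rows within one file_id partition are kept newest-first, so the first row is the latest. This is a Python sketch, not Cassandra code; `insert`, `select_limit` and the sample file names are hypothetical.

```python
# Model of one table with PRIMARY KEY (file_id, version) and
# CLUSTERING ORDER BY (version DESC): rows in a partition are kept
# newest-first, so "LIMIT 1" returns the latest version.
from datetime import datetime
from typing import Dict, List, Tuple

Row = Tuple[datetime, str]          # (version, fname)
Table = Dict[str, List[Row]]        # file_id -> rows in clustering order


def insert(table: Table, file_id: str, version: datetime, fname: str) -> None:
    # Same (file_id, version) replaces the old row, as a CQL insert (upsert) does.
    rows = [r for r in table.get(file_id, []) if r[0] != version]
    rows.append((version, fname))
    rows.sort(key=lambda r: r[0], reverse=True)   # version DESC
    table[file_id] = rows


def select_limit(table: Table, file_id: str, limit: int) -> List[Row]:
    return table.get(file_id, [])[:limit]


table: Table = {}
insert(table, "xxx", datetime(2013, 9, 2, 10, 0), "draft.txt")
insert(table, "xxx", datetime(2013, 9, 2, 12, 0), "final.txt")
insert(table, "xxx", datetime(2013, 9, 2, 11, 0), "review.txt")
print(select_limit(table, "xxx", 1))  # the 12:00 row
```

The upsert behaviour also means two updates carrying the same version timestamp collapse into a single version, which is worth keeping in mind when choosing between timestamps and counters.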
>>>>>>>>>>>
>>>>>>>>>>> If it has to be an integer, use counter columns.
>>>>>>>>>>>
>>>>>>>>>>> Jan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> > Regards,
>>>>>>>>>>> > Dawood
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > On Mon, Sep 2, 2013 at 10:47 PM, Jan Algermissen <jan.algermissen@nordsc.com> wrote:
>>>>>>>>>>> > Hi Dawood,
>>>>>>>>>>> >
>>>>>>>>>>> > On 02.09.2013, at 16:36, dawood abdullah <muhammed.dawood@gmail.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > > Hi,
>>>>>>>>>>> > > I need to implement versioning in Cassandra.
>>>>>>>>>>> > >
>>>>>>>>>>> > > Following is my column family definition:
>>>>>>>>>>> > >
>>>>>>>>>>> > > create table file_details(id text primary key, fname text, version int, mimetype text);
>>>>>>>>>>> > >
>>>>>>>>>>> > > I have a secondary index created on the fname column.
>>>>>>>>>>> > >
>>>>>>>>>>> > > Whenever I do an insert for the same 'fname', the version
>>>>>>>>>>> > > should be incremented. And when I retrieve a row by fname, it should
>>>>>>>>>>> > > return the latest version of the row.
>>>>>>>>>>> > >
>>>>>>>>>>> > > Is there a better way to do this in Cassandra? Please suggest
>>>>>>>>>>> > > what approach should be taken.
>>>>>>>>>>> >
>>>>>>>>>>> > Can you explain more about your use case?
>>>>>>>>>>> >
>>>>>>>>>>> > If the version need not be a small number but could be a
>>>>>>>>>>> > timestamp, you could make use of C*'s ordering feature, have the database
>>>>>>>>>>> > set the new version as a timestamp, and retrieve the latest one with a
>>>>>>>>>>> > simple LIMIT 1 query. (I'll explain more if this is an option for you.)
>>>>>>>>>>> >
>>>>>>>>>>> > Jan
>>>>>>>>>>> >
>>>>>>>>>>> > P.S. Being a REST/HTTP head, I hear an alarm ring when I see
>>>>>>>>>>> > 'version' next to 'mimetype' :-) What exactly are you versioning here?
>>>>>>>>>>> > Maybe we can even change the situation from a functional POV?
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > >
>>>>>>>>>>> > > Regards,
>>>>>>>>>>> > >
>>>>>>>>>>> > > Dawood

--047d7b3439707425e904e58f8d41--