incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jimmy Lin <y2klyf+w...@gmail.com>
Subject Re: data model question : finding out the n most recent changes items
Date Thu, 11 Jul 2013 08:10:36 GMT
Thanks for the suggestion.
I don't care the history of the update time to a file, BUT I do want to
ordered by it.

Reason for that is, without that, and if I have 10k+ file belongs to a
user, I have to fetch all the last modified time of all these 10k+ file and
sort through them in my application and only return the top N. Kind of
expensive.

I would like to see if it is possible to rely on Cassandra native storage
to achieve this.


CREATE TABLE user_file (
user_id uuid,
file_id timeuuid,

last_modified_time timestamp,

PRIMARY KEY(user_id, file_id)
);


select * from user_file where user_id=user1_uuid order by
last_modified_time limit 10

Above CQL would be invalid, because last_modified_time is not part of the
compound key, and is not allowed to used for order by purpose.




On Thu, Jul 11, 2013 at 12:51 AM, Lohith Samaga M <Lohith.Samaga@mphasis.com
> wrote:

> **
>
> Hi,
> Do you need to store the history of updates to a file?
> If this is not required, then you can make the userid and file id as the
> row key. You need to simply update the modified_date timestamp. There will
> be only one row per file per user.
>
> Thanks and Regards
> M. Lohith Samaga
>
>
>
>
>
>
> -----Original Message-----
> From: y2klyf@gmail.com on behalf of Jimmy Lin
> Sent: Thu 11-Jul-13 13:09
> To: user@cassandra.apache.org
> Subject: Re: data model question : finding out the n most recent changes
> items
>
> what I mean is, I really just want the last modified date instead of series
> of timestamp and still able to sort or order by it.
> (maybe I should rephrase my question as how to sort or order by last
> modified column in a row)
>
> CREATE TABLE user_file (
>         user_id uuid,
>         modified_date timestamp,
>         file_id timeuuid,
>         PRIMARY KEY(user_id, modified_date)
> );
>
> e.g user1 update file A 3 times in a row, and update file B, then update
> file A again.
> insert into user_file values(user1_uuid, date1, file_a_uuid);
> insert into user_file values(user1_uuid, date2, file_a_uuid);
> insert into user_file values(user1_uuid, date3, file_a_uuid);
> insert into user_file values(user1_uuid, date4, file_b_uuid);
> insert into user_file values(user1_uuid, date5, file_a_uuid);
>
> #trying to get top 3 most recent changed files
> select * from user_file where user_id=user1_uuid limit 3
>
> using CQL, I will get 3 rows back(all file a)
> (user1_uuid, date1, file_a_uuid);
> (user1_uuid, date2, file_a_uuid);
> (user1_uuid, date3, file_a_uuid);
>
> what I want is (file a AND file b)
> user1_uuid, date1, file_a_uuid
> user1_uuid, date4, file_b_uuid
>
> So how do I order by/sort by last modified column in a row?
>
> thanks
>
>
>
>
> On Thu, Jul 11, 2013 at 12:00 AM, aaron morton <aaron@thelastpickle.com
> >wrote:
>
> > What you described this sounds like the most appropriate:
> >
> > CREATE TABLE user_file (
> >         user_id uuid,
> >         modified_date timestamp,
> >         file_id timeuuid,
> >         PRIMARY KEY(user_id, modified_date)
> > );
> >
> > If you normally need more information about  the file then either store
> > that as additional fields or pack the data using something like JSON or
> > Protobuf.
> >
> > > my return list may still not accurate because a single directory could
> > have lot of modification changes. I basically end up pulling out series
> of
> > modification timestamp for the same directory.
> > Not sure I understand the problem.
> >
> > Cheers
> >
> >
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Consultant
> > New Zealand
> >
> > @aaronmorton
> > http://www.thelastpickle.com
> >
> > On 10/07/2013, at 6:51 PM, Jimmy Lin <y2klyf+work@gmail.com> wrote:
> >
> > > I have an application that need to find out the n most recent modified
> > files for a given user id. I started out few tables but still couldn't
> get
> > what i want, I hope someone get point to some right direction...
> > >
> > > See my tables below.
> > >
> > > #1 won't work, because file_id's timeuuid contains creation time, not
> > the modification time.
> > >
> > > #2 won't work, because i can't order by a non primary key
> > column(modified_date)
> > > #3,#4 although i can now get  a time series of modification time of
> each
> > file belongs to a user, my return list may still not accurate because a
> > single directory could have lot of modification changes. I basically end
> up
> > pulling out series of modification timestamp for the same directory.
> > >
> > > Any suggestion?
> > >
> > > Thanks
> > >
> > >
> > >
> > > #1
> > >
> > > CREATE TABLE user_file (
> > >
> > > user_id uuid,
> > >
> > > file_id timeuuid,
> > >
> > > PRIMARY KEY(user_id, file_id)
> > >
> > > );
> > >
> > >
> > >
> > > #2
> > >
> > > CREATE TABLE user_file (
> > >
> > > user_id uuid,
> > >
> > > file_id timeuuid,
> > >
> > > modified_date timestamp,
> > >
> > > PRIMARY KEY(user_id, file_id)
> > >
> > > );
> > >
> > >
> > >
> > > #3
> > >
> > > CREATE TABLE user_file (
> > >
> > > user_id uuid,
> > >
> > > file_id timeuuid,
> > >
> > > modified_date timestamp,
> > >
> > > PRIMARY KEY(user_id, file_id, modified_date)
> > >
> > > );
> > >
> > >
> > >
> > > #4
> > >
> > > CREATE TABLE user_file (
> > >
> > > user_id uuid,
> > >
> > > modified_date timestamp,
> > >
> > > file_id timeuuid,
> > >
> > > PRIMARY KEY(user_id, modified_date, file_id)
> > >
> > > );
> > >
> > >
> > >
> > >
> > >
> >
> >
>
>
> Information transmitted by this e-mail is proprietary to MphasiS, its
> associated companies and/ or its customers and is intended
> for use only by the individual or entity to which it is addressed, and may
> contain information that is privileged, confidential or
> exempt from disclosure under applicable law. If you are not the intended
> recipient or it appears that this mail has been forwarded
> to you without proper authority, you are notified that any use or
> dissemination of this information in any manner is strictly
> prohibited. In such cases, please notify us immediately at
> mailmaster@mphasis.com and delete this mail from your records.
>

Mime
View raw message