incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eric Stevens <migh...@gmail.com>
Subject Re: data model question : finding out the n most recent changes items
Date Thu, 11 Jul 2013 12:55:13 GMT
I think there is not an extremely simple solution to your problem.  You
will probably need to use multiple tables to get the view you need. One
keyed just by file UUID, which tracks some basic metadata about the file
including the last modified time.  Another as a materialized view of the
most recently modified files.

When a user updates the file, you'd need to read the current last_modified
time for that file and delete its value out of the most_recently_modified
table before inserting it back in, and updating the last_modified on the
files table.

This is a little bit fragile because it depends on reading then modifying
based on that result - and that's a typical antipattern for
eventually-consistent databases.

You might consider using a column on the user table using the List type
which keeps track of the most recently modified files for that user, treat
it like a queue and pop off the oldest ones each file write. This still
ends up being read-then-write, but presumably it is less prone to race
conditions because the user is not modifying many files at the same moment
in time, while many users could be modifying the same file at the same
moment.  So it still falls under the antipattern, but at least the failures
will be less likely.


On Thu, Jul 11, 2013 at 3:39 AM, Jimmy Lin <y2klyf+work@gmail.com> wrote:

> what I mean is, I really just want the last modified date instead of
> series of timestamp and still able to sort or order by it.
> (maybe I should rephrase my question as how to sort or order by last
> modified column in a row)
>
> CREATE TABLE user_file (
>         user_id uuid,
>         modified_date timestamp,
>         file_id timeuuid,
>         PRIMARY KEY(user_id, modified_date)
> );
>
> e.g user1 update file A 3 times in a row, and update file B, then update
> file A again.
> insert into user_file values(user1_uuid, date1, file_a_uuid);
> insert into user_file values(user1_uuid, date2, file_a_uuid);
> insert into user_file values(user1_uuid, date3, file_a_uuid);
> insert into user_file values(user1_uuid, date4, file_b_uuid);
> insert into user_file values(user1_uuid, date5, file_a_uuid);
>
> #trying to get top 3 most recent changed files
> select * from user_file where user_id=user1_uuid limit 3
>
> using CQL, I will get 3 rows back(all file a)
> (user1_uuid, date1, file_a_uuid);
> (user1_uuid, date2, file_a_uuid);
> (user1_uuid, date3, file_a_uuid);
>
> what I want is (file a AND file b)
> user1_uuid, date1, file_a_uuid
> user1_uuid, date4, file_b_uuid
>
> So how do I order by/sort by last modified column in a row?
>
> thanks
>
>
>
>
> On Thu, Jul 11, 2013 at 12:00 AM, aaron morton <aaron@thelastpickle.com>wrote:
>
>> What you described this sounds like the most appropriate:
>>
>> CREATE TABLE user_file (
>>         user_id uuid,
>>         modified_date timestamp,
>>         file_id timeuuid,
>>         PRIMARY KEY(user_id, modified_date)
>> );
>>
>> If you normally need more information about  the file then either store
>> that as additional fields or pack the data using something like JSON or
>> Protobuf.
>>
>> > my return list may still not accurate because a single directory could
>> have lot of modification changes. I basically end up pulling out series of
>> modification timestamp for the same directory.
>> Not sure I understand the problem.
>>
>> Cheers
>>
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 10/07/2013, at 6:51 PM, Jimmy Lin <y2klyf+work@gmail.com> wrote:
>>
>> > I have an application that need to find out the n most recent modified
>> files for a given user id. I started out few tables but still couldn't get
>> what i want, I hope someone get point to some right direction...
>> >
>> > See my tables below.
>> >
>> > #1 won't work, because file_id's timeuuid contains creation time, not
>> the modification time.
>> >
>> > #2 won't work, because i can't order by a non primary key
>> column(modified_date)
>> > #3,#4 although i can now get  a time series of modification time of
>> each file belongs to a user, my return list may still not accurate because
>> a single directory could have lot of modification changes. I basically end
>> up pulling out series of modification timestamp for the same directory.
>> >
>> > Any suggestion?
>> >
>> > Thanks
>> >
>> >
>> >
>> > #1
>> >
>> > CREATE TABLE user_file (
>> >
>> > user_id uuid,
>> >
>> > file_id timeuuid,
>> >
>> > PRIMARY KEY(user_id, file_id)
>> >
>> > );
>> >
>> >
>> >
>> > #2
>> >
>> > CREATE TABLE user_file (
>> >
>> > user_id uuid,
>> >
>> > file_id timeuuid,
>> >
>> > modified_date timestamp,
>> >
>> > PRIMARY KEY(user_id, file_id)
>> >
>> > );
>> >
>> >
>> >
>> > #3
>> >
>> > CREATE TABLE user_file (
>> >
>> > user_id uuid,
>> >
>> > file_id timeuuid,
>> >
>> > modified_date timestamp,
>> >
>> > PRIMARY KEY(user_id, file_id, modified_date)
>> >
>> > );
>> >
>> >
>> >
>> > #4
>> >
>> > CREATE TABLE user_file (
>> >
>> > user_id uuid,
>> >
>> > modified_date timestamp,
>> >
>> > file_id timeuuid,
>> >
>> > PRIMARY KEY(user_id, modified_date, file_id)
>> >
>> > );
>> >
>> >
>> >
>> >
>> >
>>
>>
>

Mime
View raw message