cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Roth <benjamin.r...@jaumo.com>
Subject Re: 答复: 答复: A difficult data model with C*
Date Fri, 11 Nov 2016 07:24:27 GMT
This is the reason why One would like to use an mv for it. An mv Handels
this. It adds a clustering Key while preserving uniqueness of the original
pk.

Am 11.11.2016 02:33 schrieb "Gang Liu" <gangban.lau@gmail.com>:

> I guess orignal design is keep one record for one video per user. maybe
> their app will report many play records when user watching one video.
> So there will be many records when change primary key to (user_name,
> last_time). Also
> SELECT * FROM user_views WHERE user_name = ? LIMIT 10
> without group by video_id can't fit business requirement.
>
> regards,
> Gang
>
>
> On Thu, Nov 10, 2016 at 6:47 PM, Carlos Alonso <info@mrcalonso.com> wrote:
>
>> What about having something like
>>
>> CREATE TABLE user_views (
>>   user_name text,
>>   video_id text,
>>   position int,
>>   last_time timestamp,
>>   PRIMARY KEY(user_name, last_time)
>> ) WITH CLUSTERING ORDER BY (last_time DESC);
>>
>> Where you insert a record everytime a user watches a video and then
>> having a batch task (every night maybe?) that deletes the extra rows that
>> are not needed anymore.
>> The query pattern for this is quite efficient as something like SELECT *
>> FROM user_views WHERE user_name = ? LIMIT 10;
>>
>> Regards
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 10 November 2016 at 09:19, Vladimir Yudovin <vladyu@winguzone.com>
>> wrote:
>>
>>> >Do you mean the oldest one should be removed when a new play is added?
>>> Sure. As you described the issue "the last ten items may be adequate for
>>> the business"
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>>> CassandraLaunch your cluster in minutes.*
>>>
>>>
>>> ---- On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
>>> <diamond.ben@outlook.com <diamond.ben@outlook.com>>* wrote ----
>>>
>>> The solution maybe work. However, the play list will grow over time and
>>> somebody maybe has ten thousands that will slow down the query and sort .
>>> Do you mean the oldest one should be removed when a new play is added?
>>>
>>> BTW, the version is 2.1.16 in our live system.
>>>
>>>
>>> BRs,
>>>
>>> BEN
>>> ------------------------------
>>>
>>> *发件人:* Vladimir Yudovin <vladyu@winguzone.com>
>>> *发送时间:* 2016年11月9日 18:11:26
>>> *收件人:* user
>>> *主题:* Re: 答复: A difficult data model with C*
>>>
>>> You are welcome! )
>>>
>>> >recent ten movies watched by the user within 30 days.
>>> In this case you can't use PRIMARY KEY (user_name, video_id), as
>>> video_id is demanded to fetch row, so all this stuff may be
>>>
>>> CREATE TYPE play (video_id text, position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
>>> LIST<frozen<play>>);
>>>
>>> You can easily retrieve play list for specific user by his ID. Instead
>>> of LIST you can use MAP, I don't think that for ten entries it matters.
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>> ---- On Tue, 08 Nov 2016 22:29:48 -0500*ben ben
>>> <diamond.ben@outlook.com <diamond.ben@outlook.com>>* wrote ----
>>>
>>> Hi Vladimir Yudovin,
>>>
>>>
>>>     Thank you very much for your detailed explaining. Maybe I didn't
>>> describe the requirement clearly. The use cases should be:
>>>
>>> 1. a user login our app.
>>>
>>> 2. show the recent ten movies watched by the user within 30 days.
>>>
>>> 3. the user can click any one of the ten movie and continue to watch
>>> from the last position she/he did. BTW, a movie can be watched several
>>> times by a user and the last positon is needed indeed.
>>>
>>> BRs,
>>>
>>> BEN
>>> ------------------------------
>>>
>>> *发件人:* Vladimir Yudovin <vladyu@winguzone.com>
>>> *发送时间:* 2016年11月8日 22:35:48
>>> *收件人:* user
>>> *主题:* Re: A difficult data model with C*
>>>
>>> Hi Ben,
>>>
>>> if need very limited number of positions (as you said ten) may be you
>>> can store them in LIST of UDT? Or just as JSON string?
>>> So you'll have one row per each pair user-video.
>>>
>>> It can be something like this:
>>>
>>> CREATE TYPE play (position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text, video_id text, review
>>> LIST<frozen<play>>, PRIMARY KEY (user_name, video_id));
>>>
>>> UPDATE recent set review = review + [(1234,12345)] where user_name='some
>>> user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,123456)] where
>>> user_name='some user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,1234567)] where
>>> user_name='some user' AND video_id='great video';
>>>
>>> You can delete the oldest entry by index:
>>> DELETE review[0] FROM recent WHERE user_name='some user' AND
>>> video_id='great video';
>>>
>>> or by value, if you know the oldest entry:
>>>
>>> UPDATE recent SET review = review - [(1234,12345)]  WHERE
>>> user_name='some user' AND video_id='great video';
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>> ---- On Mon, 07 Nov 2016 21:54:08 -0500*ben ben
>>> <diamond.ben@outlook.com <diamond.ben@outlook.com>>* wrote ----
>>>
>>>
>>> Hi guys,
>>>
>>> We are maintaining a system for an on-line video service. ALL users'
>>> viewing records of every movie are stored in C*. So she/he can continue to
>>> enjoy the movie from the last point next time. The table is designed as
>>> below:
>>> CREATE TABLE recent (
>>> user_name text,
>>> vedio_id text,
>>> position int,
>>> last_time timestamp,
>>> PRIMARY KEY (user_name, vedio_id)
>>> )
>>>
>>> It worked well before. However, the records increase every day and the
>>> last ten items may be adequate for the business. The current model use
>>> vedio_id as cluster key to keep a row for a movie, but as you know, the
>>> business prefer to order by the last_time desc. If we use last_time as
>>> cluster key, there will be many records for a singe movie and the recent
>>> one is actually desired. So how to model that? Do you have any suggestions?
>>> Thanks!
>>>
>>>
>>> BRs,
>>> BEN
>>>
>>>
>>>
>>>
>>>
>>>
>>
>

Mime
View raw message