cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Modeling contact list, plain table or List<UDT>
Date Tue, 12 Jan 2016 13:33:30 GMT
--> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND
contactname=yyyy ?
Answer : Because a contact name can be duplicated.   Or should I force
unique contact names?

In this case, add contactid as extra clustering column to guarantee unicity
for your contact. The delete query becomes:

DELETE FROM user_contact WHERE userid=xxx AND contactname=yyyy AND
contactid=zzz

Normally, from the front-end (web apps or smartphone client) if you have
the contactname, you SURELY also have the contactid information.
Consequently, you can issue the above DELETE statement without having to
read-before-delete, am I wrong ?



On Tue, Jan 12, 2016 at 2:02 PM, I PVP <ipvp@hotmail.com> wrote:

> --> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND
> contactname=yyyy ?
> Answer : Because a contact name can be duplicated.   Or should I force
> unique contact names?
>
> Overall , the challenge seems to be addressed , with some trade of on the
> "ordering by contact name”.
>
> If, at the base table, the clustering column is the objectid(timeuuid)
> instead of the contactname , the DELETE will be based on userid = ? and
> objectid = ?.
> This works fine. Generic SELECTs will also work fine on the base table.
>
> The MV will serve SELECTs  targeting/filtering the favorite contacts.
>
> Like this:
>
> CREATE TABLE communication.user_contact (
> userid int,
> objectid timeuuid,
> contactid int,
> contactname text,
> createdat timeuuid,
> favoriteat timestamp,
> isfavorite boolean,
> PRIMARY KEY (userid, objectid)
> );
>
>
> CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
> SELECT userid, isfavorite, objectid, contactid, contactname, createdat,
> favoriteat
> FROM user_contact
> WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND objectid IS NOT
> NULL
> PRIMARY KEY ( ( userid, isfavorite ), objectid )
> WITH CLUSTERING ORDER BY ( objectid DESC ) ;
>
>
> Unfortunately  this approach forces the model to cluster by
> objected(timeuuid) just to satisfy the need to DELETE a specific contact
> row,  and by doing that it wastes a opportunity on the MV, because all the
> PKs from the base table need to be at the MV and  it is not possible to set
> the MV with with more than 1 non-PK column from the base table as the MV
> PK.  But still working fine.
>
>
> That is my first Cassandra use case and the guidance provided by  you guys
> pretty important.
>
> Thanks very much for the answers, questions and suggestions.
>
>
> --
> IPVP
>
>
> From: DuyHai Doan <doanduyhai@gmail.com> <doanduyhai@gmail.com>
> Reply: user@cassandra.apache.org <user@cassandra.apache.org>>
> <user@cassandra.apache.org>
> Date: January 12, 2016 at 10:27:45 AM
> To: user@cassandra.apache.org <user@cassandra.apache.org>>
> <user@cassandra.apache.org>
> Cc: Jack Krupansky <jack.krupansky@gmail.com>> <jack.krupansky@gmail.com>
>
> Subject:  Re: Modeling contact list, plain table or List
>
> 1)SELECT all rows from user_contact excluding the one  that the user wants
> to get rid of.
> 2) DELETE all the user_contact rows  for that particular user .
> 3) INSERT  the result of 1).
>
> --> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND
> contactname=yyyy ?
>
> The Materialized View will be automagically updated by Cassandra with a
> query similar to DELETE FROM user_contact_by_favorite WHERE userid=xxx AND
> is_favorite=zzz AND contactname=yyy
>
> On Mon, Jan 11, 2016 at 10:40 PM, Jonathan Haddad <jon@jonhaddad.com>
> wrote:
>
>> In general I advise people avoid lists and use Maps or Sets instead.
>>
>> Using this data model, for instance, it's easy to remove a specific
>> Address from a user:
>>
>> CREATE TYPE address (
>>   street text,
>>   city text,
>>   zip_code int,
>> );
>>
>> CREATE TABLE user (
>>     user_id int primary key,
>>     addresses map<text, frozen <address>>
>> );
>>
>> When I want to remove one of the addresses from a user, I can do this:
>>
>> cqlsh:test> delete addresses['home'] from user where user_id =  1;
>>
>>
>> Hope that helps,
>> Jon
>>
>>
>> On Mon, Jan 11, 2016 at 1:20 PM I PVP <ipvp@hotmail.com> wrote:
>>
>>> Well…the way it is now  it is not possible to delete a specific contact
>>> row from the base table at all. Because a DELETE statement only works with
>>>  PK in the WHERE  clausule. Non PK columns cannot be in the DELETE WHERE
>>> clausule.
>>> https://docs.datastax.com/en/cql/3.3/cql/cql_reference/delete_r.html
>>>
>>> The way it is now  It is only possible to delete the entire contact list
>>>  for that specific user.
>>> Looks like will need to:
>>> 1)SELECT all rows from user_contact excluding the one  that the user
>>> wants to get rid of.
>>> 2) DELETE all the user_contact rows  for that particular user .
>>> 3) INSERT  the result of 1).
>>>
>>> Is that the proper way to achieve it or Am I missing some point in the
>>> modeling that would allow to delete a specific contact row  and still able
>>> to comply with the select requirements?
>>>
>>> Thanks
>>> --
>>> IPVP
>>>
>>>
>>> From: Jack Krupansky <jack.krupansky@gmail.com>
>>> <jack.krupansky@gmail.com>
>>> Reply: user@cassandra.apache.org <user@cassandra.apache.org>>
>>> <user@cassandra.apache.org>
>>> Date: January 11, 2016 at 7:00:04 PM
>>>
>>> To: user@cassandra.apache.org <user@cassandra.apache.org>>
>>> <user@cassandra.apache.org>
>>> Subject:  Re: Modeling contact list, plain table or List
>>>
>>> That's the beauty of MV - Cassandra automatically updates the MVs when
>>> the base table changes, including deletions, which is why all of the PK
>>> columns from the base table needed to be in the MV PK.
>>>
>>> -- Jack Krupansky
>>>
>>> On Mon, Jan 11, 2016 at 3:41 PM, I PVP <ipvp@hotmail.com> wrote:
>>>
>>>> The below table and materialized view will solve the SELECT
>>>> requirements of my current application .
>>>> The challenge now is when the user decides to DELETE one specific
>>>> contact from his contact list. I could add the objectid to a composite
>>>> partition key together with the userid. But that would make the SELECT
>>>> inviable.
>>>>
>>>>  Any ideas/suggestions?
>>>>
>>>>
>>>> CREATE TABLE communication.user_contact (
>>>> userid int,
>>>> contactname text,
>>>> contactid int,
>>>> createdat timeuuid,
>>>> favoriteat timestamp,
>>>> isfavorite boolean,
>>>> objectid timeuuid,
>>>> PRIMARY KEY (userid, contactname)
>>>> ) WITH CLUSTERING ORDER BY ( contactname DESC )
>>>>
>>>>
>>>> CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
>>>> SELECT userid, isfavorite, contactname, contactid, createdat,
>>>> favoriteat, objectid
>>>> FROM user_contact
>>>> WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND contactname IS
>>>> NOT NULL
>>>> PRIMARY KEY ( ( userid, isfavorite ), contactname )
>>>> WITH CLUSTERING ORDER BY ( contactname DESC )
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> IPVP
>>>>
>>>>
>>>> From: DuyHai Doan <doanduyhai@gmail.com> <doanduyhai@gmail.com>
>>>> Reply: user@cassandra.apache.org <user@cassandra.apache.org>>
>>>> <user@cassandra.apache.org>
>>>> Date: January 11, 2016 at 11:14:10 AM
>>>>
>>>> To: user@cassandra.apache.org <user@cassandra.apache.org>>
>>>> <user@cassandra.apache.org>
>>>> Subject:  Re: Modeling contact list, plain table or List
>>>>
>>>> In the current iteration of materialized view, it is still not possible
>>>> to have WHERE clause other than IS NOT NULL so is_favourite IS TRUE
>>>> won't work.
>>>>
>>>> Still there is a JIRA created to support this feature :
>>>> https://issues.apache.org/jira/browse/CASSANDRA-10368
>>>>
>>>> About cardinality of favorite vs non-favorites, it doesn't matter in this
>>>> case because the OP said "Less then one hundred contacts by user is
>>>> the normal."
>>>>
>>>> So even if all contacts are stuck in one unique favorite state, the
>>>> materialized view partition for one user is at most 100. Even for extreme
>>>> edge case with users having 10 000 contacts, it's still a manageable
>>>> partition size for C*.
>>>>
>>>> But I agree it is important to know before-hand the
>>>> favorite/non-favorite update frequency since it will impact the write
>>>> throughput on the MV.
>>>>
>>>> For more details on materialized view impl and performance:
>>>> http://www.doanduyhai.com/blog/?p=1930
>>>>
>>>> On Mon, Jan 11, 2016 at 1:36 PM, Jack Krupansky <
>>>> jack.krupansky@gmail.com> wrote:
>>>>
>>>>> The new Materialized View feature is just an automated way of creating
>>>>> and maintaining what people used to call a "query table", which is the
>>>>> traditional Cassandra data modeling technique for performing queries
on on
>>>>> than the primary key for a table - you store the same columns in different
>>>>> tables using different columns for the primary key.
>>>>>
>>>>> One also needs to be careful to include all columns of the original
>>>>> primary key in each MV primary key - in addition to whatever column(s)
are
>>>>> to be used for indexing in each MV (so that Cassandra can find the old
row
>>>>> when it needs to update the MV when the base table row changes, such
as on
>>>>> a deletion.)
>>>>>
>>>>> But before creating MVs, you first need to answer questions about how
>>>>> the app needs to query the data. Even with MV, conceptualizing queries
>>>>> needs to precede data modeling.
>>>>>
>>>>> For example, what is the cardinality of favorites vs. non-favorites,
>>>>> does the app even need to query by favorates, as opposed to querying
all
>>>>> contacts and retrieving is_favorite as simply a non-key column value,
>>>>> whether favorites need to be retrieved separately from non-favorites,
the
>>>>> frequency and latency requirements for query by favorite status, etc.
Once
>>>>> these questions are answered, decisions can be made about data modeling.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Mon, Jan 11, 2016 at 5:13 AM, Carlos Alonso <info@mrcalonso.com>
>>>>> wrote:
>>>>>
>>>>>> I have never used Materialized Views so maybe this suggestion is
not
>>>>>> possible, but in this case, wouldn't it make sense to define the
>>>>>> materialized view as
>>>>>>
>>>>>> is_favourite IS TRUE
>>>>>> instead of
>>>>>> is_favourite IS NOT NULL?
>>>>>>
>>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>>> <https://twitter.com/calonso>
>>>>>>
>>>>>> On 10 January 2016 at 09:59, DuyHai Doan <doanduyhai@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Try this
>>>>>>>
>>>>>>> CREATE TABLE communication.user_contact_list (
>>>>>>>   user_id uuid,
>>>>>>>   contact_id uuid,
>>>>>>>   contact_name text,
>>>>>>>   created_at timeuuid,
>>>>>>>   is_favorite boolean,
>>>>>>>   favorite_at timestamp,
>>>>>>>   PRIMARY KEY (user_id, contact_name, contact_id)
>>>>>>> );
>>>>>>>
>>>>>>> CREATE MATERIALIZED VIEW communication.user_favorite_contact_list
>>>>>>> AS SELECT * FROM communication.user_contact_list
>>>>>>> WHERE user_id IS NOT NULL AND contact_name IS NOT NULL
>>>>>>> AND contact_id IS NOT NULL AND is_favorite IS NOT NULL
>>>>>>> PRIMARY KEY(user_id, is_favorite, contact_name, contact_id)
>>>>>>>
>>>>>>> If the flag is_favorite is not updated very often the write perf
hit
>>>>>>> due to materialized view is acceptable.
>>>>>>>
>>>>>>> On Sat, Jan 9, 2016 at 11:57 PM, Isaac P. <ipvp@hotmail.com>
wrote:
>>>>>>>
>>>>>>>> Jack/ Michael,
>>>>>>>>
>>>>>>>> Thanks for answering.
>>>>>>>>
>>>>>>>> How big?: Less then one hundred contacts by user is the normal.
>>>>>>>>
>>>>>>>> Update requirements: The UPDATE requirements are all around
 each
>>>>>>>> user “favoriting/unfavoriting” the contacts . Deleting
is not very frequent.
>>>>>>>>
>>>>>>>> Does that mean that in C* 3.02 , for this use case to work,
the
>>>>>>>> contact name  must be part of a  composite partition key
in order to allow
>>>>>>>> sorting by contact_name like this ? :
>>>>>>>>
>>>>>>>> CREATE TABLE communication.user_contact_list (
>>>>>>>> user_id uuid,
>>>>>>>> contact_name text,
>>>>>>>> is_favorite boolean,
>>>>>>>> contact_id uuid,
>>>>>>>> created_at timeuuid,
>>>>>>>> favorite_at timestamp,
>>>>>>>> PRIMARY KEY ((user_id, contact_name), is_favorite)
>>>>>>>> )  WITH CLUSTERING ORDER BY (contact_name ASC);
>>>>>>>>
>>>>>>>> Query: Select * from user_contact_list where user_id = :userid
and
>>>>>>>> is_favorite = true order by contact_name asc;
>>>>>>>>
>>>>>>>> Looks like each contact as a row/clustering key will be the
way to
>>>>>>>> go.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> IPVP
>>>>>>>>
>>>>>>>>
>>>>>>>> From: Laing, Michael <michael.laing@nytimes.com>
>>>>>>>> <michael.laing@nytimes.com>
>>>>>>>> Reply: user@cassandra.apache.org <user@cassandra.apache.org>>
>>>>>>>> <user@cassandra.apache.org>
>>>>>>>> Date: January 9, 2016 at 11:51:27 AM
>>>>>>>> To: user@cassandra.apache.org <user@cassandra.apache.org>>
>>>>>>>> <user@cassandra.apache.org>
>>>>>>>> Subject:  Re: Modeling contact list, plain table or List
>>>>>>>>
>>>>>>>> Note that in C* 3.02 the second query is invalid:
>>>>>>>>
>>>>>>>> cqlsh> Select * from communication.user_contact_list where
user_id
>>>>>>>> = 98f50f00-b6d5-11e5-afec-6003089bf572 and is_favorite =
true order
>>>>>>>> by contact_name asc;
>>>>>>>>
>>>>>>>> *InvalidRequest: code=2200 [Invalid query] message="PRIMARY
KEY
>>>>>>>> column "is_favorite" cannot be restricted as preceding column
>>>>>>>> "contact_name" is not restricted"*
>>>>>>>>
>>>>>>>> On Fri, Jan 8, 2016 at 6:50 PM, Jack Krupansky <
>>>>>>>> jack.krupansky@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> How big is each contact list expected to be? Dozens?
Hundreds?
>>>>>>>>> Thousands? If just dozens, a simple list column would
seem sufficient. If
>>>>>>>>> thousands, the row (not partition) would get kind of
bloated.
>>>>>>>>>
>>>>>>>>> What requirements do you have for updating? If updating
contacts
>>>>>>>>> and lots of contacts, I think I'd prefer each contact
as a row/clustering
>>>>>>>>> key. Nice to be able to do selective queries to return
slices of the
>>>>>>>>> clustering key values, which is not so easy if they are
all just a single
>>>>>>>>> list column.
>>>>>>>>>
>>>>>>>>> -- Jack Krupansky
>>>>>>>>>
>>>>>>>>> On Fri, Jan 8, 2016 at 6:31 PM, Isaac P. <ipvp@hotmail.com>
wrote:
>>>>>>>>>
>>>>>>>>>> Hi everyone
>>>>>>>>>>
>>>>>>>>>> What would perform better while modeling a simple
user contact
>>>>>>>>>> list  that will be used mainly to select the recipients
for/from/to
>>>>>>>>>> messages ?
>>>>>>>>>>
>>>>>>>>>> a) Individual rows to each (user, contact) pair so
a select would
>>>>>>>>>> fetch all the rows  to retrieve all the contacts
from a given user.
>>>>>>>>>>
>>>>>>>>>> or
>>>>>>>>>>
>>>>>>>>>> b) A single row for each user containing the List<Contact>
 UDT.
>>>>>>>>>>
>>>>>>>>>> Aside of the basic CRUD, the queries will be the
following ones:
>>>>>>>>>>
>>>>>>>>>> Select * from user_contact_list where user_id = :userid
order by
>>>>>>>>>> contact_name asc
>>>>>>>>>>
>>>>>>>>>> Select * from user_contact_list where user_id = :userid
and
>>>>>>>>>> is_favorite = true order by contact_name asc
>>>>>>>>>>
>>>>>>>>>> After reading this
>>>>>>>>>> https://docs.datastax.com/en/cql/3.0/cql/ddl/ddl_compound_keys_c.html
>>>>>>>>>>  the table is looking like this:
>>>>>>>>>>
>>>>>>>>>> CREATE TABLE communication.user_contact_list (
>>>>>>>>>> user_id uuid,
>>>>>>>>>> contact_id uuid,
>>>>>>>>>> contact_name text,
>>>>>>>>>> created_at timeuuid,
>>>>>>>>>> is_favorite boolean,
>>>>>>>>>> favorite_at timestamp,
>>>>>>>>>> PRIMARY KEY (user_id, contact_name, is_favorite)
>>>>>>>>>> );
>>>>>>>>>>
>>>>>>>>>> Any guidance will be appreciated.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> IPVP
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>

Mime
View raw message