cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Modeling contact list, plain table or List<UDT>
Date Tue, 12 Jan 2016 12:11:08 GMT
1)SELECT all rows from user_contact excluding the one  that the user wants
to get rid of.
2) DELETE all the user_contact rows  for that particular user .
3) INSERT  the result of 1).

--> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND
contactname=yyyy ?

The Materialized View will be automagically updated by Cassandra with a
query similar to DELETE FROM user_contact_by_favorite WHERE userid=xxx AND
is_favorite=zzz AND contactname=yyy

On Mon, Jan 11, 2016 at 10:40 PM, Jonathan Haddad <jon@jonhaddad.com> wrote:

> In general I advise people avoid lists and use Maps or Sets instead.
>
> Using this data model, for instance, it's easy to remove a specific
> Address from a user:
>
> CREATE TYPE address (
>   street text,
>   city text,
>   zip_code int,
> );
>
> CREATE TABLE user (
>     user_id int primary key,
>     addresses map<text, frozen <address>>
> );
>
> When I want to remove one of the addresses from a user, I can do this:
>
> cqlsh:test> delete addresses['home'] from user where user_id =  1;
>
>
> Hope that helps,
> Jon
>
>
> On Mon, Jan 11, 2016 at 1:20 PM I PVP <ipvp@hotmail.com> wrote:
>
>> Well…the way it is now  it is not possible to delete a specific contact
>> row from the base table at all. Because a DELETE statement only works with
>>  PK in the WHERE  clausule. Non PK columns cannot be in the DELETE WHERE
>> clausule.
>> https://docs.datastax.com/en/cql/3.3/cql/cql_reference/delete_r.html
>>
>> The way it is now  It is only possible to delete the entire contact list
>>  for that specific user.
>> Looks like will need to:
>> 1)SELECT all rows from user_contact excluding the one  that the user
>> wants to get rid of.
>> 2) DELETE all the user_contact rows  for that particular user .
>> 3) INSERT  the result of 1).
>>
>> Is that the proper way to achieve it or Am I missing some point in the
>> modeling that would allow to delete a specific contact row  and still able
>> to comply with the select requirements?
>>
>> Thanks
>> --
>> IPVP
>>
>>
>> From: Jack Krupansky <jack.krupansky@gmail.com>
>> <jack.krupansky@gmail.com>
>> Reply: user@cassandra.apache.org <user@cassandra.apache.org>>
>> <user@cassandra.apache.org>
>> Date: January 11, 2016 at 7:00:04 PM
>>
>> To: user@cassandra.apache.org <user@cassandra.apache.org>>
>> <user@cassandra.apache.org>
>> Subject:  Re: Modeling contact list, plain table or List
>>
>> That's the beauty of MV - Cassandra automatically updates the MVs when
>> the base table changes, including deletions, which is why all of the PK
>> columns from the base table needed to be in the MV PK.
>>
>> -- Jack Krupansky
>>
>> On Mon, Jan 11, 2016 at 3:41 PM, I PVP <ipvp@hotmail.com> wrote:
>>
>>> The below table and materialized view will solve the SELECT requirements
>>> of my current application .
>>> The challenge now is when the user decides to DELETE one specific
>>> contact from his contact list. I could add the objectid to a composite
>>> partition key together with the userid. But that would make the SELECT
>>> inviable.
>>>
>>>  Any ideas/suggestions?
>>>
>>>
>>> CREATE TABLE communication.user_contact (
>>> userid int,
>>> contactname text,
>>> contactid int,
>>> createdat timeuuid,
>>> favoriteat timestamp,
>>> isfavorite boolean,
>>> objectid timeuuid,
>>> PRIMARY KEY (userid, contactname)
>>> ) WITH CLUSTERING ORDER BY ( contactname DESC )
>>>
>>>
>>> CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
>>> SELECT userid, isfavorite, contactname, contactid, createdat,
>>> favoriteat, objectid
>>> FROM user_contact
>>> WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND contactname IS
>>> NOT NULL
>>> PRIMARY KEY ( ( userid, isfavorite ), contactname )
>>> WITH CLUSTERING ORDER BY ( contactname DESC )
>>>
>>> Thanks
>>>
>>> --
>>> IPVP
>>>
>>>
>>> From: DuyHai Doan <doanduyhai@gmail.com> <doanduyhai@gmail.com>
>>> Reply: user@cassandra.apache.org <user@cassandra.apache.org>>
>>> <user@cassandra.apache.org>
>>> Date: January 11, 2016 at 11:14:10 AM
>>>
>>> To: user@cassandra.apache.org <user@cassandra.apache.org>>
>>> <user@cassandra.apache.org>
>>> Subject:  Re: Modeling contact list, plain table or List
>>>
>>> In the current iteration of materialized view, it is still not possible
>>> to have WHERE clause other than IS NOT NULL so is_favourite IS TRUE
>>> won't work.
>>>
>>> Still there is a JIRA created to support this feature :
>>> https://issues.apache.org/jira/browse/CASSANDRA-10368
>>>
>>> About cardinality of favorite vs non-favorites, it doesn't matter in this
>>> case because the OP said "Less then one hundred contacts by user is the
>>> normal."
>>>
>>> So even if all contacts are stuck in one unique favorite state, the
>>> materialized view partition for one user is at most 100. Even for extreme
>>> edge case with users having 10 000 contacts, it's still a manageable
>>> partition size for C*.
>>>
>>> But I agree it is important to know before-hand the
>>> favorite/non-favorite update frequency since it will impact the write
>>> throughput on the MV.
>>>
>>> For more details on materialized view impl and performance:
>>> http://www.doanduyhai.com/blog/?p=1930
>>>
>>> On Mon, Jan 11, 2016 at 1:36 PM, Jack Krupansky <
>>> jack.krupansky@gmail.com> wrote:
>>>
>>>> The new Materialized View feature is just an automated way of creating
>>>> and maintaining what people used to call a "query table", which is the
>>>> traditional Cassandra data modeling technique for performing queries on on
>>>> than the primary key for a table - you store the same columns in different
>>>> tables using different columns for the primary key.
>>>>
>>>> One also needs to be careful to include all columns of the original
>>>> primary key in each MV primary key - in addition to whatever column(s) are
>>>> to be used for indexing in each MV (so that Cassandra can find the old row
>>>> when it needs to update the MV when the base table row changes, such as on
>>>> a deletion.)
>>>>
>>>> But before creating MVs, you first need to answer questions about how
>>>> the app needs to query the data. Even with MV, conceptualizing queries
>>>> needs to precede data modeling.
>>>>
>>>> For example, what is the cardinality of favorites vs. non-favorites,
>>>> does the app even need to query by favorates, as opposed to querying all
>>>> contacts and retrieving is_favorite as simply a non-key column value,
>>>> whether favorites need to be retrieved separately from non-favorites, the
>>>> frequency and latency requirements for query by favorite status, etc. Once
>>>> these questions are answered, decisions can be made about data modeling.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Mon, Jan 11, 2016 at 5:13 AM, Carlos Alonso <info@mrcalonso.com>
>>>> wrote:
>>>>
>>>>> I have never used Materialized Views so maybe this suggestion is not
>>>>> possible, but in this case, wouldn't it make sense to define the
>>>>> materialized view as
>>>>>
>>>>> is_favourite IS TRUE
>>>>> instead of
>>>>> is_favourite IS NOT NULL?
>>>>>
>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>> <https://twitter.com/calonso>
>>>>>
>>>>> On 10 January 2016 at 09:59, DuyHai Doan <doanduyhai@gmail.com>
wrote:
>>>>>
>>>>>> Try this
>>>>>>
>>>>>> CREATE TABLE communication.user_contact_list (
>>>>>>   user_id uuid,
>>>>>>   contact_id uuid,
>>>>>>   contact_name text,
>>>>>>   created_at timeuuid,
>>>>>>   is_favorite boolean,
>>>>>>   favorite_at timestamp,
>>>>>>   PRIMARY KEY (user_id, contact_name, contact_id)
>>>>>> );
>>>>>>
>>>>>> CREATE MATERIALIZED VIEW communication.user_favorite_contact_list
>>>>>> AS SELECT * FROM communication.user_contact_list
>>>>>> WHERE user_id IS NOT NULL AND contact_name IS NOT NULL
>>>>>> AND contact_id IS NOT NULL AND is_favorite IS NOT NULL
>>>>>> PRIMARY KEY(user_id, is_favorite, contact_name, contact_id)
>>>>>>
>>>>>> If the flag is_favorite is not updated very often the write perf
hit
>>>>>> due to materialized view is acceptable.
>>>>>>
>>>>>> On Sat, Jan 9, 2016 at 11:57 PM, Isaac P. <ipvp@hotmail.com>
wrote:
>>>>>>
>>>>>>> Jack/ Michael,
>>>>>>>
>>>>>>> Thanks for answering.
>>>>>>>
>>>>>>> How big?: Less then one hundred contacts by user is the normal.
>>>>>>>
>>>>>>> Update requirements: The UPDATE requirements are all around 
each
>>>>>>> user “favoriting/unfavoriting” the contacts . Deleting is
not very frequent.
>>>>>>>
>>>>>>> Does that mean that in C* 3.02 , for this use case to work, the
>>>>>>> contact name  must be part of a  composite partition key in order
to allow
>>>>>>> sorting by contact_name like this ? :
>>>>>>>
>>>>>>> CREATE TABLE communication.user_contact_list (
>>>>>>> user_id uuid,
>>>>>>> contact_name text,
>>>>>>> is_favorite boolean,
>>>>>>> contact_id uuid,
>>>>>>> created_at timeuuid,
>>>>>>> favorite_at timestamp,
>>>>>>> PRIMARY KEY ((user_id, contact_name), is_favorite)
>>>>>>> )  WITH CLUSTERING ORDER BY (contact_name ASC);
>>>>>>>
>>>>>>> Query: Select * from user_contact_list where user_id = :userid
and
>>>>>>> is_favorite = true order by contact_name asc;
>>>>>>>
>>>>>>> Looks like each contact as a row/clustering key will be the way
to
>>>>>>> go.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> IPVP
>>>>>>>
>>>>>>>
>>>>>>> From: Laing, Michael <michael.laing@nytimes.com>
>>>>>>> <michael.laing@nytimes.com>
>>>>>>> Reply: user@cassandra.apache.org <user@cassandra.apache.org>>
>>>>>>> <user@cassandra.apache.org>
>>>>>>> Date: January 9, 2016 at 11:51:27 AM
>>>>>>> To: user@cassandra.apache.org <user@cassandra.apache.org>>
>>>>>>> <user@cassandra.apache.org>
>>>>>>> Subject:  Re: Modeling contact list, plain table or List
>>>>>>>
>>>>>>> Note that in C* 3.02 the second query is invalid:
>>>>>>>
>>>>>>> cqlsh> Select * from communication.user_contact_list where
user_id =
>>>>>>> 98f50f00-b6d5-11e5-afec-6003089bf572 and is_favorite = true order
>>>>>>> by contact_name asc;
>>>>>>>
>>>>>>> *InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY
>>>>>>> column "is_favorite" cannot be restricted as preceding column
>>>>>>> "contact_name" is not restricted"*
>>>>>>>
>>>>>>> On Fri, Jan 8, 2016 at 6:50 PM, Jack Krupansky <
>>>>>>> jack.krupansky@gmail.com> wrote:
>>>>>>>
>>>>>>>> How big is each contact list expected to be? Dozens? Hundreds?
>>>>>>>> Thousands? If just dozens, a simple list column would seem
sufficient. If
>>>>>>>> thousands, the row (not partition) would get kind of bloated.
>>>>>>>>
>>>>>>>> What requirements do you have for updating? If updating contacts
>>>>>>>> and lots of contacts, I think I'd prefer each contact as
a row/clustering
>>>>>>>> key. Nice to be able to do selective queries to return slices
of the
>>>>>>>> clustering key values, which is not so easy if they are all
just a single
>>>>>>>> list column.
>>>>>>>>
>>>>>>>> -- Jack Krupansky
>>>>>>>>
>>>>>>>> On Fri, Jan 8, 2016 at 6:31 PM, Isaac P. <ipvp@hotmail.com>
wrote:
>>>>>>>>
>>>>>>>>> Hi everyone
>>>>>>>>>
>>>>>>>>> What would perform better while modeling a simple user
contact
>>>>>>>>> list  that will be used mainly to select the recipients
for/from/to
>>>>>>>>> messages ?
>>>>>>>>>
>>>>>>>>> a) Individual rows to each (user, contact) pair so a
select would
>>>>>>>>> fetch all the rows  to retrieve all the contacts from
a given user.
>>>>>>>>>
>>>>>>>>> or
>>>>>>>>>
>>>>>>>>> b) A single row for each user containing the List<Contact>
 UDT.
>>>>>>>>>
>>>>>>>>> Aside of the basic CRUD, the queries will be the following
ones:
>>>>>>>>>
>>>>>>>>> Select * from user_contact_list where user_id = :userid
order by
>>>>>>>>> contact_name asc
>>>>>>>>>
>>>>>>>>> Select * from user_contact_list where user_id = :userid
and
>>>>>>>>> is_favorite = true order by contact_name asc
>>>>>>>>>
>>>>>>>>> After reading this
>>>>>>>>> https://docs.datastax.com/en/cql/3.0/cql/ddl/ddl_compound_keys_c.html
>>>>>>>>>  the table is looking like this:
>>>>>>>>>
>>>>>>>>> CREATE TABLE communication.user_contact_list (
>>>>>>>>> user_id uuid,
>>>>>>>>> contact_id uuid,
>>>>>>>>> contact_name text,
>>>>>>>>> created_at timeuuid,
>>>>>>>>> is_favorite boolean,
>>>>>>>>> favorite_at timestamp,
>>>>>>>>> PRIMARY KEY (user_id, contact_name, is_favorite)
>>>>>>>>> );
>>>>>>>>>
>>>>>>>>> Any guidance will be appreciated.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> IPVP
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

Mime
View raw message