cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Modeling contact list, plain table or List<UDT>
Date Mon, 11 Jan 2016 21:38:02 GMT
You specify a userid and contactname to delete a single contact row in the
base table. The MV is then updated accordingly.

-- Jack Krupansky

On Mon, Jan 11, 2016 at 4:20 PM, I PVP <ipvp@hotmail.com> wrote:

> Well…the way it is now  it is not possible to delete a specific contact
> row from the base table at all. Because a DELETE statement only works with
>  PK in the WHERE  clausule. Non PK columns cannot be in the DELETE WHERE
> clausule.
> https://docs.datastax.com/en/cql/3.3/cql/cql_reference/delete_r.html
>
> The way it is now  It is only possible to delete the entire contact list
>  for that specific user.
> Looks like will need to:
> 1)SELECT all rows from user_contact excluding the one  that the user wants
> to get rid of.
> 2) DELETE all the user_contact rows  for that particular user .
> 3) INSERT  the result of 1).
>
> Is that the proper way to achieve it or Am I missing some point in the
> modeling that would allow to delete a specific contact row  and still able
> to comply with the select requirements?
>
> Thanks
> --
> IPVP
>
>
> From: Jack Krupansky <jack.krupansky@gmail.com> <jack.krupansky@gmail.com>
> Reply: user@cassandra.apache.org <user@cassandra.apache.org>>
> <user@cassandra.apache.org>
> Date: January 11, 2016 at 7:00:04 PM
>
> To: user@cassandra.apache.org <user@cassandra.apache.org>>
> <user@cassandra.apache.org>
> Subject:  Re: Modeling contact list, plain table or List
>
> That's the beauty of MV - Cassandra automatically updates the MVs when the
> base table changes, including deletions, which is why all of the PK columns
> from the base table needed to be in the MV PK.
>
> -- Jack Krupansky
>
> On Mon, Jan 11, 2016 at 3:41 PM, I PVP <ipvp@hotmail.com> wrote:
>
>> The below table and materialized view will solve the SELECT requirements
>> of my current application .
>> The challenge now is when the user decides to DELETE one specific contact
>> from his contact list. I could add the objectid to a composite partition
>> key together with the userid. But that would make the SELECT inviable.
>>
>>  Any ideas/suggestions?
>>
>>
>> CREATE TABLE communication.user_contact (
>> userid int,
>> contactname text,
>> contactid int,
>> createdat timeuuid,
>> favoriteat timestamp,
>> isfavorite boolean,
>> objectid timeuuid,
>> PRIMARY KEY (userid, contactname)
>> ) WITH CLUSTERING ORDER BY ( contactname DESC )
>>
>>
>> CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
>> SELECT userid, isfavorite, contactname, contactid, createdat, favoriteat,
>> objectid
>> FROM user_contact
>> WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND contactname IS
>> NOT NULL
>> PRIMARY KEY ( ( userid, isfavorite ), contactname )
>> WITH CLUSTERING ORDER BY ( contactname DESC )
>>
>> Thanks
>>
>> --
>> IPVP
>>
>>
>> From: DuyHai Doan <doanduyhai@gmail.com> <doanduyhai@gmail.com>
>> Reply: user@cassandra.apache.org <user@cassandra.apache.org>>
>> <user@cassandra.apache.org>
>> Date: January 11, 2016 at 11:14:10 AM
>>
>> To: user@cassandra.apache.org <user@cassandra.apache.org>>
>> <user@cassandra.apache.org>
>> Subject:  Re: Modeling contact list, plain table or List
>>
>> In the current iteration of materialized view, it is still not possible
>> to have WHERE clause other than IS NOT NULL so is_favourite IS TRUE
>> won't work.
>>
>> Still there is a JIRA created to support this feature :
>> https://issues.apache.org/jira/browse/CASSANDRA-10368
>>
>> About cardinality of favorite vs non-favorites, it doesn't matter in this
>> case because the OP said "Less then one hundred contacts by user is the
>> normal."
>>
>> So even if all contacts are stuck in one unique favorite state, the
>> materialized view partition for one user is at most 100. Even for extreme
>> edge case with users having 10 000 contacts, it's still a manageable
>> partition size for C*.
>>
>> But I agree it is important to know before-hand the favorite/non-favorite
>> update frequency since it will impact the write throughput on the MV.
>>
>> For more details on materialized view impl and performance:
>> http://www.doanduyhai.com/blog/?p=1930
>>
>> On Mon, Jan 11, 2016 at 1:36 PM, Jack Krupansky <jack.krupansky@gmail.com
>> > wrote:
>>
>>> The new Materialized View feature is just an automated way of creating
>>> and maintaining what people used to call a "query table", which is the
>>> traditional Cassandra data modeling technique for performing queries on on
>>> than the primary key for a table - you store the same columns in different
>>> tables using different columns for the primary key.
>>>
>>> One also needs to be careful to include all columns of the original
>>> primary key in each MV primary key - in addition to whatever column(s) are
>>> to be used for indexing in each MV (so that Cassandra can find the old row
>>> when it needs to update the MV when the base table row changes, such as on
>>> a deletion.)
>>>
>>> But before creating MVs, you first need to answer questions about how
>>> the app needs to query the data. Even with MV, conceptualizing queries
>>> needs to precede data modeling.
>>>
>>> For example, what is the cardinality of favorites vs. non-favorites,
>>> does the app even need to query by favorates, as opposed to querying all
>>> contacts and retrieving is_favorite as simply a non-key column value,
>>> whether favorites need to be retrieved separately from non-favorites, the
>>> frequency and latency requirements for query by favorite status, etc. Once
>>> these questions are answered, decisions can be made about data modeling.
>>>
>>> -- Jack Krupansky
>>>
>>> On Mon, Jan 11, 2016 at 5:13 AM, Carlos Alonso <info@mrcalonso.com>
>>> wrote:
>>>
>>>> I have never used Materialized Views so maybe this suggestion is not
>>>> possible, but in this case, wouldn't it make sense to define the
>>>> materialized view as
>>>>
>>>> is_favourite IS TRUE
>>>> instead of
>>>> is_favourite IS NOT NULL?
>>>>
>>>> Carlos Alonso | Software Engineer | @calonso
>>>> <https://twitter.com/calonso>
>>>>
>>>> On 10 January 2016 at 09:59, DuyHai Doan <doanduyhai@gmail.com> wrote:
>>>>
>>>>> Try this
>>>>>
>>>>> CREATE TABLE communication.user_contact_list (
>>>>>   user_id uuid,
>>>>>   contact_id uuid,
>>>>>   contact_name text,
>>>>>   created_at timeuuid,
>>>>>   is_favorite boolean,
>>>>>   favorite_at timestamp,
>>>>>   PRIMARY KEY (user_id, contact_name, contact_id)
>>>>> );
>>>>>
>>>>> CREATE MATERIALIZED VIEW communication.user_favorite_contact_list
>>>>> AS SELECT * FROM communication.user_contact_list
>>>>> WHERE user_id IS NOT NULL AND contact_name IS NOT NULL
>>>>> AND contact_id IS NOT NULL AND is_favorite IS NOT NULL
>>>>> PRIMARY KEY(user_id, is_favorite, contact_name, contact_id)
>>>>>
>>>>> If the flag is_favorite is not updated very often the write perf hit
>>>>> due to materialized view is acceptable.
>>>>>
>>>>> On Sat, Jan 9, 2016 at 11:57 PM, Isaac P. <ipvp@hotmail.com> wrote:
>>>>>
>>>>>> Jack/ Michael,
>>>>>>
>>>>>> Thanks for answering.
>>>>>>
>>>>>> How big?: Less then one hundred contacts by user is the normal.
>>>>>>
>>>>>> Update requirements: The UPDATE requirements are all around  each
>>>>>> user “favoriting/unfavoriting” the contacts . Deleting is not
very frequent.
>>>>>>
>>>>>> Does that mean that in C* 3.02 , for this use case to work, the
>>>>>> contact name  must be part of a  composite partition key in order
to allow
>>>>>> sorting by contact_name like this ? :
>>>>>>
>>>>>> CREATE TABLE communication.user_contact_list (
>>>>>> user_id uuid,
>>>>>> contact_name text,
>>>>>> is_favorite boolean,
>>>>>> contact_id uuid,
>>>>>> created_at timeuuid,
>>>>>> favorite_at timestamp,
>>>>>> PRIMARY KEY ((user_id, contact_name), is_favorite)
>>>>>> )  WITH CLUSTERING ORDER BY (contact_name ASC);
>>>>>>
>>>>>> Query: Select * from user_contact_list where user_id = :userid and
>>>>>> is_favorite = true order by contact_name asc;
>>>>>>
>>>>>> Looks like each contact as a row/clustering key will be the way to
go.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> IPVP
>>>>>>
>>>>>>
>>>>>> From: Laing, Michael <michael.laing@nytimes.com>
>>>>>> <michael.laing@nytimes.com>
>>>>>> Reply: user@cassandra.apache.org <user@cassandra.apache.org>>
>>>>>> <user@cassandra.apache.org>
>>>>>> Date: January 9, 2016 at 11:51:27 AM
>>>>>> To: user@cassandra.apache.org <user@cassandra.apache.org>>
>>>>>> <user@cassandra.apache.org>
>>>>>> Subject:  Re: Modeling contact list, plain table or List
>>>>>>
>>>>>> Note that in C* 3.02 the second query is invalid:
>>>>>>
>>>>>> cqlsh> Select * from communication.user_contact_list where user_id
=
>>>>>> 98f50f00-b6d5-11e5-afec-6003089bf572 and is_favorite = true order
>>>>>> by contact_name asc;
>>>>>>
>>>>>> *InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY
>>>>>> column "is_favorite" cannot be restricted as preceding column
>>>>>> "contact_name" is not restricted"*
>>>>>>
>>>>>> On Fri, Jan 8, 2016 at 6:50 PM, Jack Krupansky <
>>>>>> jack.krupansky@gmail.com> wrote:
>>>>>>
>>>>>>> How big is each contact list expected to be? Dozens? Hundreds?
>>>>>>> Thousands? If just dozens, a simple list column would seem sufficient.
If
>>>>>>> thousands, the row (not partition) would get kind of bloated.
>>>>>>>
>>>>>>> What requirements do you have for updating? If updating contacts
and
>>>>>>> lots of contacts, I think I'd prefer each contact as a row/clustering
key.
>>>>>>> Nice to be able to do selective queries to return slices of the
clustering
>>>>>>> key values, which is not so easy if they are all just a single
list column.
>>>>>>>
>>>>>>> -- Jack Krupansky
>>>>>>>
>>>>>>> On Fri, Jan 8, 2016 at 6:31 PM, Isaac P. <ipvp@hotmail.com>
wrote:
>>>>>>>
>>>>>>>> Hi everyone
>>>>>>>>
>>>>>>>> What would perform better while modeling a simple user contact
list
>>>>>>>>  that will be used mainly to select the recipients for/from/to
messages ?
>>>>>>>>
>>>>>>>> a) Individual rows to each (user, contact) pair so a select
would
>>>>>>>> fetch all the rows  to retrieve all the contacts from a given
user.
>>>>>>>>
>>>>>>>> or
>>>>>>>>
>>>>>>>> b) A single row for each user containing the List<Contact>
 UDT.
>>>>>>>>
>>>>>>>> Aside of the basic CRUD, the queries will be the following
ones:
>>>>>>>>
>>>>>>>> Select * from user_contact_list where user_id = :userid order
by
>>>>>>>> contact_name asc
>>>>>>>>
>>>>>>>> Select * from user_contact_list where user_id = :userid and
>>>>>>>> is_favorite = true order by contact_name asc
>>>>>>>>
>>>>>>>> After reading this
>>>>>>>> https://docs.datastax.com/en/cql/3.0/cql/ddl/ddl_compound_keys_c.html
>>>>>>>>  the table is looking like this:
>>>>>>>>
>>>>>>>> CREATE TABLE communication.user_contact_list (
>>>>>>>> user_id uuid,
>>>>>>>> contact_id uuid,
>>>>>>>> contact_name text,
>>>>>>>> created_at timeuuid,
>>>>>>>> is_favorite boolean,
>>>>>>>> favorite_at timestamp,
>>>>>>>> PRIMARY KEY (user_id, contact_name, is_favorite)
>>>>>>>> );
>>>>>>>>
>>>>>>>> Any guidance will be appreciated.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> --
>>>>>>>> IPVP
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message