cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From " Brian Hess (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT
Date Tue, 07 Jul 2015 14:02:05 GMT


 Brian Hess commented on CASSANDRA-9415:

Wouldn't the preparing of the statement handle that?

> Implicit use of Materialized Views on SELECT
> --------------------------------------------
>                 Key: CASSANDRA-9415
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter:  Brian Hess
>              Labels: ponies
> CASSANDRA-6477 introduces Materialized Views.  This greatly simplifies the write path
for the best-practice of "query tables".  But it does not simplify the read path as much as
our users want/need.
> We suggest to folks to create multiple copies of their base table optimized for certain
queries - hence "query table".  For example, we may have a USER table with two type of queries:
lookup by userid and lookup by email address.  We would recommend creating 2 tables USER_BY_USERID
and USER_BY_EMAIL.  Both would have the exact same schema, with the same PRIMARY KEY columns,
but different PARTITION KEY - the first would be USERID and the second would be EMAIL.
> One complicating thing with this approach is that the application now needs to know that
when it INSERT/UPDATE/DELETEs from the base table it needs to INSERT/UPDATE/DELETE from all
of the query tables as well.  CASSANDRA-6477 covers this nicely.
> However, the other side of the coin is that the application needs to know which query
table to leverage based on the selection criteria.  Using the example above, if the query
has a predicate such as "WHERE userid = 'bhess'", then USERS_BY_USERID is the better table
to use.  Similarly, when the predicate is "WHERE email = 'bhess@company.whatever'", USERS_BY_EMAIL
is appropriate.
> On INSERT/UPDATE/DELETE, Materialized Views essentially give a single "name" to the collection
of tables.  You do operations just on the base table.  It is very attractive for the SELECT
side as well.  It would be very good to allow an application to simply do "SELECT * FROM users
WHERE userid = 'bhess'" and have that query implicitly leverage the USERS_BY_USERID materialized
> For additional use cases, especially analytics use cases like in Spark, this allows the
Spark code to simply push down the query without having to know about all of the MVs that
have been set up.  The system will route the query appropriately.  And if additional MVs are
necessary to make a query run better/faster, then those MVs can be set up and Spark will implicitly
leverage them.

This message was sent by Atlassian JIRA

View raw message