cassandra-user mailing list archives

From Bamoqi <bam...@gmail.com>
Subject Re: [Typo correction] Is it good for performance to put rows that are of different types but are always queried together in the same table partition?
Date Tue, 12 Jan 2016 18:27:27 GMT
I oversimplified the original example. In the real model I cannot simply 
merge the row types. Suppose the two tables are:
     create table t1(
         p int,
         q1 int,
         c1 int,
         primary key( p, q1 )
     );
     create table t2(
         p int,
         q2 uuid,
         c2 text,
         primary key( p, q2 )
     );

Merging the tables will be slightly ugly and waste some storage in the 
clustering columns:
     create table t(
         p int,
         rowtype tinyint, // t1 or t2
         q1 int, q2 uuid, // depending on rowtype, either q1 or q2 is unused
                          // (but still needs a placeholder: key columns cannot be null)
         c1 int, c2 text, // depending on rowtype, either c1 or c2 is null
         primary key( p, rowtype, q1, q2 )
     );
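
A minimal sketch of what writing to the merged table could look like, offered as an illustration rather than a tested recipe. Because Cassandra rejects null for primary key columns, the unused one of q1/q2 has to carry a placeholder. The rowtype codes (1 and 2), the placeholder values (0 and the nil UUID), and the helper names are all assumptions, not part of the schema above. Each insert statement lists only the regular column its row type uses, so the other one is simply left unset.

```python
import uuid

# Hypothetical rowtype codes and placeholder values (assumptions for illustration).
ROWTYPE_T1 = 1
ROWTYPE_T2 = 2
UNUSED_Q1 = 0                   # placeholder for q1 on t2-style rows
UNUSED_Q2 = uuid.UUID(int=0)    # nil UUID, placeholder for q2 on t1-style rows

# One statement per row type; the unused regular column is omitted entirely.
INSERT_T1 = "INSERT INTO t (p, rowtype, q1, q2, c1) VALUES (?, ?, ?, ?, ?)"
INSERT_T2 = "INSERT INTO t (p, rowtype, q1, q2, c2) VALUES (?, ?, ?, ?, ?)"


def t1_params(p, q1, c1):
    """Bind values for a t1-style row; q2 gets the placeholder."""
    return (p, ROWTYPE_T1, q1, UNUSED_Q2, c1)


def t2_params(p, q2, c2):
    """Bind values for a t2-style row; q1 gets the placeholder."""
    return (p, ROWTYPE_T2, UNUSED_Q1, q2, c2)
```

The tuples would be passed, together with the matching statement, to whatever driver is in use.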

Nevertheless, putting them into one table seems faster, since we need only 
one query to get both types and get better cache locality. Am I correct?
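
To make the single-query approach concrete, here is a rough sketch of the client side: one SELECT on the partition key, then a switch on rowtype in application code. Rows are modeled as plain dicts so the sketch runs without a driver; the rowtype codes 1 and 2 and the function name split_by_rowtype are assumptions for illustration.

```python
# A single partition read returns both row types.
SELECT_CQL = "SELECT rowtype, q1, q2, c1, c2 FROM t WHERE p = ?"

# Hypothetical rowtype codes (assumptions, not from the schema).
ROWTYPE_T1 = 1
ROWTYPE_T2 = 2


def split_by_rowtype(rows):
    """Separate merged-table rows into (q1, c1) and (q2, c2) pairs.

    `rows` is any iterable of dict-like rows keyed by column name.
    """
    t1_rows, t2_rows = [], []
    for row in rows:
        if row["rowtype"] == ROWTYPE_T1:
            t1_rows.append((row["q1"], row["c1"]))
        elif row["rowtype"] == ROWTYPE_T2:
            t2_rows.append((row["q2"], row["c2"]))
        else:
            raise ValueError(f"unexpected rowtype: {row['rowtype']!r}")
    return t1_rows, t2_rows
```

Since rowtype is the first clustering column, rows of the same type arrive contiguously within the partition, so the loop above is a single pass over already-grouped data.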


On Saturday, January 09, 2016 06:47 AM, Jack Krupansky wrote:
> A simple denormalization is probably all that is called for - just 
> merge the two tables into one (their union.) No need for this row type.
>
>
> -- Jack Krupansky
>
> On Fri, Jan 8, 2016 at 9:30 AM, Jeff Jirsa <jeff.jirsa@crowdstrike.com> wrote:
>
>     You’ll see better performance using a slice (which is effectively
>     what will happen if you put them into the same table and use
>     query-1table-b), as each node will only need to merge
>     cells/results once. It may not be twice as fast, but it’ll be fast
>     enough to make it worthwhile.
>
>
>
>     On 1/8/16, 12:13 AM, "Bamoqi" <bamoqi@gmail.com> wrote:
>
>     >[Correction of the original message which contains typos in code.]
>     >
>     >Is it good for performance to put rows that are of different types but
>     >are always queried together in the same table partition?
>     >
>     >My question is whether doing so will result in better
>     >memory/disk cache locality.
>     >
>     >Suppose I need to query 2 different types of rows for a frequent
>     >user request. I can use 2 tables or 1 table:
>     >
>     >2 tables:
>     >
>     >   create table t1(
>     >     partitionkey int primary key,
>     >     col1 int, col2 int, ...
>     >   )
>     >   create table t2(
>     >     partitionkey int primary key,
>     >     col3 int, col4 int, ...
>     >   )
>     >
>     >query-2table:
>     >   select col1,col2 from t1 where partitionkey = ?
>     >   select col3,col4 from t2 where partitionkey = ?
>     >
>     >1 table:
>     >
>     >   create table t(
>     >     partitionkey int,
>     >     rowtype tinyint,
>     >     col1 int, col2 int, ...
>     >     col3 int, col4 int, ...
>     >     primary key( partitionkey, rowtype )
>     >   )
>     >
>     >query-1table-a:
>     >   select col1,col2 from t where partitionkey = ? and rowtype = 1
>     >   select col3,col4 from t where partitionkey = ? and rowtype = 2
>     >
>     >or alternatively, query-1table-b:
>     >   select rowtype,col1,col2,col3,col4 from t where partitionkey = ?
>     >   // Unused columns are `null`. Switch on `rowtype` in the app code
>     >
>     >Is there a significant performance difference among query-2table,
>     >query-1table-a, and query-1table-b?
>     >Is the cassandra client/coordinator smart enough to direct subsequent
>     >queries of the same (table, partitionkey) to the same node so they can
>     >reuse a cached page?
>     >
>     >Regards & Thanks
>
>

