cassandra-user mailing list archives

From Carlos Alonso <i...@mrcalonso.com>
Subject Re: [Typo correction] Is it good for performance to put rows that are of different types but are always queried together in the same table partition?
Date Tue, 12 Jan 2016 20:05:58 GMT
Why can't you have something like this?

CREATE TABLE t (
  p INT,
  q1 INT,
  q2 UUID,
  c1 INT,
  c2 TEXT,
  PRIMARY KEY (p, q1, q2)
)

Sounds like the simplest solution.
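
For the single-query variant discussed in the quoted messages below (one SELECT per partition, then a switch on `rowtype` in the app code), the client-side step might look roughly like the following. This is only a sketch under assumptions: the `Row` tuple, the rowtype codes 1 and 2, the sentinel values for the unused clustering column, and the helper name `split_by_rowtype` are all made up for illustration, and the `sample` list stands in for whatever a driver would return for `select ... from t where p = ?`.

```python
from collections import namedtuple

# Hypothetical row shape mirroring the merged table quoted below:
# primary key (p, rowtype, q1, q2), regular columns c1 and c2.
Row = namedtuple("Row", ["p", "rowtype", "q1", "q2", "c1", "c2"])

# Assumed rowtype codes; the thread does not fix concrete values.
T1, T2 = 1, 2

# Assumed sentinels: primary key columns cannot be null, so the
# unused clustering column has to hold some placeholder value.
UNUSED_Q1 = 0
UNUSED_Q2 = "00000000-0000-0000-0000-000000000000"


def split_by_rowtype(rows):
    """Split the rows of one partition back into the two logical row types."""
    t1_rows, t2_rows = [], []
    for row in rows:
        if row.rowtype == T1:
            t1_rows.append((row.q1, row.c1))
        elif row.rowtype == T2:
            t2_rows.append((row.q2, row.c2))
        else:
            raise ValueError("unexpected rowtype: %r" % (row.rowtype,))
    return t1_rows, t2_rows


# Stand-in for the result of a single-partition query on p = 7.
sample = [
    Row(7, T1, 10, UNUSED_Q2, 100, None),
    Row(7, T1, 11, UNUSED_Q2, 101, None),
    Row(7, T2, UNUSED_Q1, "6f1c0a52-0000-4000-8000-000000000001", None, "abc"),
]

t1_rows, t2_rows = split_by_rowtype(sample)
```

Because `rowtype` is the first clustering column in that layout, the rows of each type arrive grouped together, so the loop above is a single pass over one result set.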

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 12 January 2016 at 18:27, Bamoqi <bamoqi@gmail.com> wrote:

> I over-simplified the original example. In the real model I cannot just
> merge the row types. Suppose
>     create table t1(
>         p int,
>         q1 int,
>         c1 int,
>         primary key( p, q1 )
>     )
>     create table t2(
>         p int,
>         q2 uuid,
>         c2 text,
>         primary key( p, q2 )
>     )
>
> Merging the tables will be slightly ugly and waste some storage in the
> clustering columns:
>     create table t(
>         p int,
>         rowtype tinyint, // t1 or t2
>         q1 int, q2 uuid, // depending on rowtype, either q1 or q2 is unused
>         c1 int, c2 text, // depending on rowtype, either c1 or c2 is null
>         primary key( p, rowtype, q1, q2)
>     )
>
> Nevertheless, putting them into one table seems faster as we only need one
> query to get both types, and have better cache locality. Am I correct?
>
>
> On Saturday, January 09, 2016 06:47 AM, Jack Krupansky wrote:
>
> A simple denormalization is probably all that is called for - just merge
> the two tables into one (their union.) No need for this row type.
>
>
> -- Jack Krupansky
>
> On Fri, Jan 8, 2016 at 9:30 AM, Jeff Jirsa <jeff.jirsa@crowdstrike.com>
> wrote:
>
>> You’ll see better performance using a slice (which is effectively what
>> will happen if you put them into the same table and use query-1table-b), as
>> each node will only need to merge cells/results once. It may not be twice
>> as fast, but it’ll be fast enough to make it worthwhile.
>>
>>
>>
>> On 1/8/16, 12:13 AM, "Bamoqi" <bamoqi@gmail.com>
>> wrote:
>>
>> >[Correction of the original message which contains typos in code.]
>> >
>> >Is it good for performance to put rows that are of different types but
>> >are always queried together in the same table partition?
>> >
>> >My consideration is whether doing so will result in better
>> >memory/disk cache locality.
>> >
>> >Suppose I need to query for 2 different types of rows for a frequent
>> >user request, I can use 2 tables or 1 table:
>> >
>> >2 tables:
>> >
>> >   create table t1(
>> >     partitionkey int primary key,
>> >     col1 int, col2 int, ...
>> >   )
>> >   create table t2(
>> >     partitionkey int primary key,
>> >     col3 int, col4 int, ...
>> >   )
>> >
>> >query-2table:
>> >   select col1,col2 from t1 where partitionkey = ?
>> >   select col3,col4 from t2 where partitionkey = ?
>> >
>> >1 table:
>> >
>> >   create table t(
>> >     partitionkey int,
>> >     rowtype tinyint,
>> >     col1 int, col2 int, ...
>> >     col3 int, col4 int, ...
>> >     primary key( partitionkey, rowtype )
>> >   )
>> >
>> >query-1table-a:
>> >   select col1,col2 from t where partitionkey = ? and rowtype = 1
>> >   select col3,col4 from t where partitionkey = ? and rowtype = 2
>> >
>> >or alternatively, query-1table-b:
>> >   select rowtype,col1,col2,col3,col4 from t where partitionkey = ?
>> >   // Unused columns are `null`. Switch on `rowtype` in the app code
>> >
>> >Is there a significant performance difference among query-2table,
>> >query-1table-a, and query-1table-b?
>> >Is the cassandra client/coordinator smart enough to direct subsequent
>> >queries of the same (table, partitionkey) to the same node so they can
>> >reuse a cached page?
>> >
>> >Regards & Thanks
>>
>
>
>
