The extra parentheses indicate that the three columns together constitute the
“partition key”. Without them, only the first column of the primary key would be
the partition key, and the remaining columns would be clustering columns. The
partition key determines which rows are stored contiguously on a single node of
the cluster. As written, each of your rows has a distinct partition key, so any
two rows might or might not end up on the same node. With Jens’ approach, all
rows with the same message_source_id would be part of the same partition (with
the same partition key) and stored contiguously on the same node. Since you only
have 30,000 rows, it probably doesn’t matter which way you go. Organize your
data based on how it is logically structured and how you wish to access it.
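
To make the difference concrete, here is a rough sketch of the two definitions
side by side. The table names, the value column, and all column types are
made up for illustration, and the singular column names traffic_data_type and
integration_period are guesses based on this thread, so adjust them to your
actual schema.

```cql
-- Hypothetical schema: table names, types, and the value column are assumptions.

-- Double parentheses: all three columns form the partition key,
-- so every distinct combination is its own partition.
CREATE TABLE traffic_by_full_key (
    message_source_id  int,
    traffic_data_type  text,
    integration_period int,
    value              double,
    PRIMARY KEY ((message_source_id, traffic_data_type, integration_period))
);

-- Single parentheses: message_source_id alone is the partition key;
-- the other two columns are clustering columns that order rows
-- within each partition.
CREATE TABLE traffic_by_source (
    message_source_id  int,
    traffic_data_type  text,
    integration_period int,
    value              double,
    PRIMARY KEY (message_source_id, traffic_data_type, integration_period)
);
```

With the first form, a SELECT generally has to supply all three columns with
equality conditions. With the second form, you can also read everything for one
message_source_id in a single query.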
Sent: Tuesday, July 1, 2014 8:24 AM
Subject: Re: Primary key question
Thanks for the tip, but I never need to query the traffic_data_types and
integration_periods for a single message_source, so I will keep the double
bracket notation for now.