cassandra-user mailing list archives

From "Peer, Oded" <Oded.P...@rsa.com>
Subject RE: Example Data Modelling
Date Tue, 07 Jul 2015 06:02:38 GMT
The data model suggested isn’t optimal for the “end of month” query you want to run since
you are not querying by partition key.
The query would look like “select EmpID, FN, LN, basic from salaries where month = 1”
which requires filtering and has unpredictable performance.
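
As a rough sketch of the difference, assuming the salaries table proposed further down in
the thread (PRIMARY KEY(EmpID, month)): the first statement below is served from a single
partition, while the second has no partition key restriction and is normally rejected unless
ALLOW FILTERING is appended. The EmpID value 'E001' is a made-up example.

-- Reads one partition: the partition key (EmpID) is restricted.
SELECT EmpID, FN, LN, basic FROM salaries WHERE EmpID = 'E001' AND month = 1;

-- No partition key restriction: Cassandra has to filter across all partitions.
SELECT EmpID, FN, LN, basic FROM salaries WHERE month = 1 ALLOW FILTERING;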

For this type of query to be fast, you can use the “month” column as the partition key
and “EmpID” as the clustering column.
This approach also has drawbacks:
1. This data model creates a wide row. Depending on the number of employees, this partition
might be very large. You should limit partition sizes to 25MB.
2. Distributing data according to month means that only a small number of nodes will hold
all of the salary data for a specific month, which might cause hotspots on those nodes.
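
A minimal sketch of that alternative layout, assuming a hypothetical table name
salaries_by_month and the same columns as the table proposed further down in the thread:

CREATE TABLE salaries_by_month (
  month int,
  EmpID varchar,
  FN varchar,
  LN varchar,
  Phone varchar,
  Address varchar,
  basic int,
  flexible_allowance float,
  -- month is the partition key, EmpID the clustering column
  PRIMARY KEY (month, EmpID)
);

-- The end-of-month query now restricts the partition key, so no filtering is needed.
SELECT EmpID, FN, LN, basic FROM salaries_by_month WHERE month = 1;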

Choose the approach that works best for you.


From: Carlos Alonso [mailto:info@mrcalonso.com]
Sent: Monday, July 06, 2015 7:04 PM
To: user@cassandra.apache.org
Subject: Re: Example Data Modelling

Hi Srinivasa,

I think you're right. In Cassandra you should favor denormalisation where in an RDBMS you
would have a relationship like this.

I'd suggest a cf like this:
CREATE TABLE salaries (
  EmpID varchar,
  FN varchar,
  LN varchar,
  Phone varchar,
  Address varchar,
  month int,
  basic int,
  flexible_allowance float,
  -- EmpID is the partition key, month the clustering column
  PRIMARY KEY (EmpID, month)
);

That way the salaries will be partitioned by EmpID and clustered by month, which I guess is
the natural sorting you want.
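
To illustrate what the denormalisation means in practice, here is a small sketch with
made-up values: the personal details are simply written again with each month's row.

-- January salary row for a hypothetical employee 'E001'
INSERT INTO salaries (EmpID, FN, LN, Phone, Address, month, basic, flexible_allowance)
VALUES ('E001', 'John', 'Doe', '555-0100', '1 Example Street', 1, 5000, 1500.0);

-- February row: same personal details, new salary figures
INSERT INTO salaries (EmpID, FN, LN, Phone, Address, month, basic, flexible_allowance)
VALUES ('E001', 'John', 'Doe', '555-0100', '1 Example Street', 2, 5000, 1750.0);

-- All salary rows for that employee, sorted by month
SELECT month, basic, flexible_allowance FROM salaries WHERE EmpID = 'E001';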

Hope it helps,
Cheers!

Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>

On 6 July 2015 at 13:01, Srinivasa T N <seenutn@gmail.com> wrote:
Hi,
   I have a basic doubt: I have an RDBMS with the following two tables:

   Emp - EmpID, FN, LN, Phone, Address
   Sal - Month, Empid, Basic, Flexible Allowance

   My use case is to print the salary slip at the end of each month, and the slip contains
the employee's name and other details.

   Now, if I want to have the same in Cassandra, I will have a single cf with the employee's
personal details and salary details.  Is this the right approach?  Should we have the employee
personal details duplicated each month?

Regards,
Seenu.
