cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Oleg Ivanov <>
Subject Correct data model for Cassandra
Date Wed, 28 Apr 2010 14:32:52 GMT

our company has a huge table in a relational database which keeps statistics
of some financional operations.
It looks like the following:
SERVER_ID - server, which served the transaction
ACCOUNT_FROM - account1
ACCOUNT_TO - account2
HOUR - time range for this statistics row (from 0 minutes to 59 minutes of
the hour)
SERVICE_ID - a service
GATEWAY_ID - a gateway
TRANSACTIONS_COUNT - transactions count per the hour
AMOUNT -  $ per the hour

There are different reports, which use data from the table - examples are:
show the amount per account, show the transactions count per server and so
on.. Only TRANSACTIONS_COUNT and AMOUNT columns is actually "data columns",
others act like a "filter" to group the results.
The table is huge and some queries run more than 2 minutes - this is
unacceptable, we need almost real time responses. Moreover, the cost of
database servers' support grows and grows. So there is an idea to move
statistical data of this kind to the Cassandra.

The question is - what is the correct data model for the Cassandra?

I'm thinking of creating a SuperColumn consisted of all columns described
above and map an auto generated key (ROW_ID) to it. Then create all indexes
like SERVER_ID -> ROW_ID, ACCOUNT_FROM -> ROW_ID and so on. But it looks a
little bit weird.

How to do things right? )

Thanks in advance,

View raw message