cassandra-user mailing list archives

From Trevor Francis <>
Subject Single Vs. Multiple Keyspaces
Date Wed, 18 Apr 2012 16:33:47 GMT
We are launching a data-intensive application that will store upwards of 50 million 150-byte
records per day per user. We have identified Cassandra as our database technology and Flume
as the tool we will use to load the data from log files into the database.
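
For a sense of scale, here is a rough back-of-the-envelope sketch of the per-user volume. It assumes exactly 150 bytes per record and ignores replication, compression, and any storage overhead Cassandra adds, so treat the numbers as a lower bound on raw input only.

```python
# Rough per-user data volume, using the figures quoted above.
# Assumption: every record is exactly 150 bytes; storage overhead,
# replication, and compression are deliberately ignored.
RECORDS_PER_DAY = 50_000_000
BYTES_PER_RECORD = 150
SECONDS_PER_DAY = 86_400

bytes_per_day = RECORDS_PER_DAY * BYTES_PER_RECORD
gb_per_day = bytes_per_day / 1e9                      # decimal gigabytes
records_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY  # average, not peak

print(f"{gb_per_day:.1f} GB/day, {records_per_second:.0f} records/s")
# prints "7.5 GB/day, 579 records/s"
```

This is an average write rate; real log traffic is usually bursty, so peak rates would be higher.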

Each user is given their own server instance, but the schema of the data for each user will
be the same.

We will be performing real-time analysis on this information as part of our application and
are considering the advantages and disadvantages of all users sharing the same keyspace. All
data will be treated the same as far as replication factor is concerned; the only difference
is that we won't be displaying one user's info to another user. Users will be
compartmentalized, and one user's data will never affect or be compared against another user's.

Think of it this way: each user has their own Apache server, that server emits 50 million
records per day, and each user will only be analyzing the data for their particular
server, not anyone else's. The log formats are exactly the same.

My experience lies in relational databases and not in key-value stores like Cassandra. In
the MySQL world, we would put each user in their own database to avoid lock contention
and to make queries faster.

If we don't write the data into separate keyspaces, I assume we will have to add an additional
field to our records to identify the user that owns each record. How does a single large
keyspace affect query speed and related performance concerns?
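
To make the shared-keyspace option concrete, here is a minimal sketch of what tagging each record with its owner might look like. It is plain Python with no Cassandra driver involved, and the names `LogRecord`, `user_id`, `record_id`, `row_key`, and `records_for` are hypothetical, chosen only for illustration; it is not a recommended schema.

```python
from typing import List, NamedTuple


class LogRecord(NamedTuple):
    user_id: str    # hypothetical owner identifier (the "additional field")
    record_id: int  # hypothetical per-user sequence number
    payload: bytes  # the roughly 150-byte log line


def row_key(record: LogRecord) -> str:
    """Build a key that embeds the owner, so keys from different users never collide."""
    return f"{record.user_id}:{record.record_id}"


def records_for(user_id: str, records: List[LogRecord]) -> List[LogRecord]:
    """Return only the records owned by one user (in-memory stand-in for a query)."""
    return [r for r in records if r.user_id == user_id]


records = [
    LogRecord("alice", 1, b"GET /index.html 200"),
    LogRecord("bob", 1, b"GET /login 302"),
    LogRecord("alice", 2, b"GET /about 200"),
]

print([row_key(r) for r in records_for("alice", records)])
```

With separate keyspaces the `user_id` field would be implied by the keyspace name instead of being stored in every record or key.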

Trevor Francis
