cassandra-user mailing list archives

From Paulo Gabriel Poiati
Subject Real-time Web Analysis tool using Cassandra. Doubts...
Date Tue, 11 May 2010 18:52:53 GMT
Hi all.

I'm thinking about implementing a real-time web analysis (WA) tool using
Cassandra as my storage, but I have some questions first.

I'm considering Cassandra because of its excellent write performance,
horizontal scalability and tunable consistency levels.

- First of all, my initial idea is to have two CFs: one for raw client
requests (roughly 10 million or more per day) and another for metrics
aggregated over fixed time intervals such as 1 min, 5 min and 15 min. Is this
a good approach?

- Is it a good idea to use the OrderPreservingPartitioner to keep my requests
in order in the raw data CF, or is the overhead too big?

- Initially the cluster will contain only three nodes. Is that a problem (too
few, maybe)?

- I think the best way to do the aggregation is with a Hadoop MapReduce job.
Is that right? Are there other options to consider?

- Is Cassandra really suitable for this? Maybe HBase would be a better fit in
this case?
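
To make the aggregation question concrete, here is a rough sketch of the
bucketing logic only, in plain Python rather than as a Hadoop job. It assumes
each raw request is reduced to a (Unix timestamp in seconds, URL) pair and
that the metric is a simple request count; the names INTERVALS, bucket_start
and aggregate are made up for illustration and nothing here touches Cassandra.

```python
from collections import Counter

# Assumed bucket sizes in seconds: 1 min, 5 min, 15 min.
INTERVALS = (60, 300, 900)


def bucket_start(timestamp, interval):
    """Round a Unix timestamp (in seconds) down to the start of its interval."""
    return timestamp - (timestamp % interval)


def aggregate(requests):
    """Count requests per (interval, bucket start, url).

    `requests` is an iterable of (unix_timestamp, url) pairs, standing in
    for rows read from the raw data CF.
    """
    counts = Counter()
    for timestamp, url in requests:
        for interval in INTERVALS:
            counts[(interval, bucket_start(timestamp, interval), url)] += 1
    return counts
```

Each key of the returned counter would correspond to one entry in the
aggregated CF, for example (60, 1273603920, "/home") for a one-minute bucket.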

If there is anything else you think I should be aware of, please let me know.

Paulo Poiati.
