incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Real-time Web Analysis tool using Cassandra. Doubts...
Date Wed, 12 May 2010 14:37:11 GMT
On Tue, May 11, 2010 at 1:52 PM, Paulo Gabriel Poiati
<paulogpoiati@gmail.com> wrote:
> - First of all, my first thought is to have two CFs: one for raw client
> requests (~10 million+ per day) and another for metrics aggregated over
> some defined time interval like 1min, 5min, 15min... Is this a good approach?

Sure.
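
For illustration only, here is a rough sketch of the bucketing such an aggregate CF implies: each raw request timestamp is rounded down to the start of its 1, 5, and 15 minute window and counted there. The names (`INTERVALS`, `bucket_start`, `aggregate`) are made up for this example and are not part of Cassandra; how the counts are actually stored is left open.

```python
from collections import Counter

# Interval widths in seconds, taken from the intervals named in the question.
# The dictionary itself is a hypothetical helper, not a Cassandra setting.
INTERVALS = {"1min": 60, "5min": 300, "15min": 900}


def bucket_start(ts, width):
    """Round a Unix timestamp (seconds) down to the start of its interval."""
    return ts - (ts % width)


def aggregate(timestamps):
    """Count requests per (interval name, bucket start timestamp)."""
    counts = Counter()
    for ts in timestamps:
        for name, width in INTERVALS.items():
            counts[(name, bucket_start(ts, width))] += 1
    return counts
```

Each `(interval name, bucket start)` pair could then serve as a row or column key in the aggregate CF, but that mapping is an assumption, not something settled in this thread.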

> - Is it a good idea to use an OrderPreservingPartitioner, to maintain the
> order of my requests in the raw data CF? Or is the overhead too big?

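The problem with OPP isn't overhead (it is lower-overhead than RP, the
RandomPartitioner) but its tendency to create hotspots when data is
written sequentially: consecutive keys fall in the same key range, so
they all land on the same node.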
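
A toy illustration of the hotspot effect, under heavy assumptions: three nodes, made-up range boundaries, and a hash-modulo placement standing in for RP (the real RandomPartitioner assigns MD5 token ranges, not a modulo). It only shows why time-ordered keys pile onto one node when key order is preserved.

```python
import hashlib
from bisect import bisect_left

# Hypothetical upper bounds of the key ranges owned by three nodes.
BOUNDARIES = ["2010-04", "2010-08", "2011"]


def opp_node(key):
    """Order-preserving placement: a key goes to the node owning its range."""
    # Keys past the last boundary wrap around to node 0, like a ring.
    return bisect_left(BOUNDARIES, key) % len(BOUNDARIES)


def hashed_node(key):
    """Simplified stand-in for hash-based placement (MD5, then modulo)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % len(BOUNDARIES)


# One hour of sequentially written, timestamp-like row keys.
keys = [f"2010-05-12T14:{m:02d}:{s:02d}" for m in range(60) for s in range(60)]

opp_nodes = {opp_node(k) for k in keys}        # a single node takes every write
hashed_nodes = {hashed_node(k) for k in keys}  # writes spread across the nodes
```

With these keys every order-preserving write goes to one node, while the hashed placement spreads the same writes over all three.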

> - Initially the cluster will contain only three nodes. Is that a problem
> (too few, maybe)?

You'll have to do some load testing to see.

> - I think the best way to do the aggregation job is through a Hadoop
> MapReduce job. Right? Is there any other way to consider?

Map/Reduce is usually better than rolling your own because it
parallelizes for you.
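
To make the "parallelizes for you" point concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases for a per-minute request count. This is not the Hadoop API; the function names and the `(timestamp, url)` record shape are assumptions for the example. The map calls are independent per input split, which is what lets a framework farm them out to different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (minute bucket, 1) for one raw request record (timestamp, url)."""
    ts, _url = record
    yield (ts - ts % 60, 1)


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts emitted for one minute bucket."""
    return key, sum(values)


def run(splits):
    """Run all phases sequentially; a real framework runs splits in parallel."""
    mapped = chain.from_iterable(
        map_phase(record) for split in splits for record in split
    )
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

Rolling your own would mean writing the scheduling, shuffling, and failure handling around these functions yourself.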

> - Is Cassandra really suitable for this? Maybe HBase is better in this case?

Nothing here makes me think "Cassandra is a poor choice."

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
