incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: creating and dropping columnfamilies as a usecase
Date Thu, 21 Oct 2010 17:59:02 GMT
AFAIK that's not really what the dynamic schema functions are intended for.

You may run into problems: the caches are per CF, and each CF has a high memory overhead
(roughly 3 * memtable size in MB), so your memory usage will jump around.
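
For reference, the create/drop calls would look something like this from a client. This is only a rough sketch using pycassa against a 0.7 node; the "Metrics" keyspace and the hourly CF naming scheme are placeholders of mine, not anything from Utku's setup:

    from datetime import datetime
    from pycassa.system_manager import SystemManager, UTF8_TYPE

    # Connect to the Thrift port of any 0.7 node.
    sys_mgr = SystemManager('localhost:9160')

    # Hourly CF name, e.g. "events_2010102117" (placeholder naming scheme).
    cf_name = 'events_' + datetime.utcnow().strftime('%Y%m%d%H')

    # At the top of the hour: create the CF (a cluster-wide schema change).
    sys_mgr.create_column_family('Metrics', cf_name, comparator_type=UTF8_TYPE)

    # ... write into it, run the hourly Hadoop job against it ...

    # Afterwards: drop it (another cluster-wide schema change).
    sys_mgr.drop_column_family('Metrics', cf_name)
    sys_mgr.close()

Every create/drop is a schema migration that has to propagate through the whole cluster, which is part of why doing this every hour is heavier than it looks.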

Cloudkick gathers a lot of metrics; this may help: http://wiki.apache.org/cassandra/ArchitectureCommitLog

If you want to use Hadoop for the analysis, and the data really can be thrown away, then I
would consider using Hadoop by itself. Take a look at Flume from Cloudera to stream data
into HDFS: http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/

Hope that helps. 
Aaron

On 22 Oct 2010, at 04:12, Utku Can Topçu wrote:

> Hi All,
> 
> In the project I'm currently working on, I have a use case for analyzing rows on an hourly basis.
> 
> Since the 0.7.x branch supports creating and dropping column families on the fly,
> my proposed use case is:
> 
> * Create a CF at the very beginning of every hour
> * At the end of the 1-hour period, analyze the data stored in the CF with Hadoop
> * Drop the CF afterwards.
> 
> Can you foresee any problems in continuously creating and dropping column families?
> 
> Regards,
> Utku
> 
> 

