cassandra-user mailing list archives

From Chris Gerken <>
Subject Re: General questions about Cassandra
Date Fri, 17 Feb 2012 16:07:08 GMT

That's a good idea, but you have to be careful not to preclude the use of dynamic column families
(e.g. CFs with time-series-like schemas), which is what Cassandra is best at. The right approach
is to build your own "ORM"/persistence layer (or generate one with some tools) that hides
the API differences between static and dynamic CFs. Once you're there, Hadoop and Pig both
come very close to what you're asking for.
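
As a minimal sketch of such a persistence layer, the following models both layouts in memory and makes no real Cassandra calls. The class names (Login, StaticLoginStore, DynamicLoginStore), the integer timestamp, and the "ip,protocol" column value encoding are all assumptions made for illustration; the fields come from the log format quoted below. The point is only that callers see the same save/find interface whichever layout backs it.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Login:
    """One normalized log entry (hypothetical type for this sketch)."""
    username: str
    timestamp: int  # assumed: seconds since the epoch
    ip: str
    protocol: str


class StaticLoginStore:
    """Static-CF style: one row per login with a fixed set of column names."""

    def __init__(self):
        self._rows = {}  # row key -> {column name: value}

    def save(self, login):
        key = f"{login.username}:{login.timestamp}"
        self._rows[key] = {
            "username": login.username,
            "timestamp": login.timestamp,
            "ip": login.ip,
            "protocol": login.protocol,
        }

    def find(self, username, start, end):
        # Has to look at every row and filter on column values.
        hits = [
            Login(**cols)
            for cols in self._rows.values()
            if cols["username"] == username and start <= cols["timestamp"] <= end
        ]
        return sorted(hits, key=lambda login: login.timestamp)


class DynamicLoginStore:
    """Dynamic-CF style: one wide row per user, column name = timestamp."""

    def __init__(self):
        self._rows = {}  # username -> list of (timestamp, "ip,protocol"), kept sorted

    def save(self, login):
        row = self._rows.setdefault(login.username, [])
        entry = (login.timestamp, f"{login.ip},{login.protocol}")
        row.insert(bisect_left(row, entry), entry)

    def find(self, username, start, end):
        # A time range becomes a contiguous slice of the sorted columns.
        row = self._rows.get(username, [])
        lo = bisect_left(row, (start,))
        hi = bisect_left(row, (end + 1,))
        return [Login(username, ts, *value.split(",")) for ts, value in row[lo:hi]]
```

Either store can be swapped in behind the same two methods, so code that searches by user and time range does not need to know which column family layout is in use.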

In other words, you should be asking for a means to apply a Java method to selected objects
(not rows) that are persisted in a Cassandra column family.
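
To make the "apply a method to selected objects" idea concrete, here is a rough, in-memory sketch. It assumes the persisted objects are already available as plain Python dicts grouped into partitions (standing in for the nodes of a cluster); the function name apply_to_selected and its parameters are hypothetical, and threads only simulate the per-node parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_to_selected(partitions, selector, method):
    """Apply `method` to every object matching `selector`, one task per partition."""

    def run(partition):
        # Each task filters and transforms only its own partition's objects.
        return [method(obj) for obj in partition if selector(obj)]

    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        per_partition = list(pool.map(run, partitions))

    # Flatten the per-partition results, preserving partition order.
    return [result for chunk in per_partition for result in chunk]
```

A caller would pass a selector such as `lambda o: o["user"] == "alice"` and any function to run on each match; the selection and the method both operate on objects, not on raw rows.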


- Chris
Chris Gerken

On Feb 17, 2012, at 9:35 AM, Don Smith wrote:

> Are there plans to build some sort of map-reduce framework into Cassandra and CQL?
> It seems that users should be able to apply a Java method to selected rows in parallel
> on the distributed Cassandra JVMs. I believe Solandra uses such an integration.
> Don
> ________________________________________
> From: Alessio Cecchi []
> Sent: Friday, February 17, 2012 4:42 AM
> To:
> Subject: General questions about Cassandra
> Hi,
> we have developed software that stores logs from mail servers in MySQL,
> but for huge environments we are developing a version that stores this
> data in HBase. Raw logs are first normalized once a day, so the output
> looks like this:
> username,date of login, IP Address, protocol
> username,date of login, IP Address, protocol
> username,date of login, IP Address, protocol
> [...]
> and then inserted into the database.
> As I was saying, for huge installations (from 1 to 10 million logins
> per day, kept for 12 months) we are working with HBase, but I would also
> consider Cassandra.
> The advantage of HBase is MapReduce, which makes searching the logs very
> fast by running the "query" concurrently on multiple hosts.
> Queries will be launched from a web interface (there will be few requests per
> day) and the search keys are user and time range.
> But Cassandra seems less complex to manage and simpler to run, so I want
> to evaluate it instead of HBase.
> My question is: can Cassandra also split a "query" over the cluster like
> MapReduce? From what I read online, Cassandra seems fast at inserting data but
> slower than HBase at "querying". Is that really so?
> We do not want to install Hadoop on top of Cassandra.
> Any suggestion is welcome :-)
> --
> Alessio Cecchi is:
> @ ILS ->
> on LinkedIn ->
> Assistenza Sistemi GNU/Linux ->
> @ PLUG ->  former President, now senator for life,
> @ LOLUG ->  Member
