cassandra-user mailing list archives

From Don Smith <>
Subject RE: General questions about Cassandra
Date Fri, 17 Feb 2012 15:35:25 GMT
Are there plans to build some sort of map-reduce framework into Cassandra and CQL? It
seems that users should be able to apply a Java method to selected rows in parallel on the
distributed Cassandra JVMs. I believe Solandra uses such an integration.

From: Alessio Cecchi []
Sent: Friday, February 17, 2012 4:42 AM
Subject: General questions about Cassandra


We have developed software that stores logs from mail servers in MySQL,
but for huge environments we are developing a version that stores this
data in HBase. Once a day, the raw logs are first normalized, so the
output looks like this:

username,date of login, IP Address, protocol
username,date of login, IP Address, protocol
username,date of login, IP Address, protocol

and then inserted into the database.
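
A minimal sketch of how one normalized line could be parsed before insertion. The field names, the ISO 8601 timestamp format, and the sample values are assumptions; the message only gives the field order.

```python
from datetime import datetime
from typing import NamedTuple


class LoginRecord(NamedTuple):
    username: str
    login_time: datetime
    ip_address: str
    protocol: str


def parse_line(line: str) -> LoginRecord:
    """Parse one normalized line: username,date of login,IP address,protocol."""
    # Fields may have spaces around the commas, so strip each one.
    username, login, ip, protocol = (field.strip() for field in line.split(","))
    # Assumption: the date of login is stored in ISO 8601 form.
    return LoginRecord(username, datetime.fromisoformat(login), ip, protocol)


# Hypothetical sample line, not taken from real logs.
record = parse_line("alice,2012-02-17T04:42:00, 192.0.2.10, imap")
print(record.username, record.login_time.isoformat(), record.protocol)
```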

As I was saying, for huge installations (from 1 to 10 million logins
per day, kept for 12 months) we are working with HBase, but I would also
consider Cassandra.
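
To put those figures in perspective, here is a rough count of the rows that would be retained at any time. It assumes one row per login and treats 12 months as 365 days; both are simplifications, not details from the message.

```python
DAYS_RETAINED = 365  # assumption: 12 months treated as 365 days


def retained_rows(logins_per_day: int, days: int = DAYS_RETAINED) -> int:
    """Rows kept at steady state, assuming one row per login."""
    return logins_per_day * days


low = retained_rows(1_000_000)
high = retained_rows(10_000_000)
print(f"{low:,} to {high:,} rows")  # prints "365,000,000 to 3,650,000,000 rows"
```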

The advantage of HBase is MapReduce which makes searching the logs very
fast by splitting the "query" concurrently on multiple hosts.
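
The idea of splitting one query across hosts can be sketched as a scatter-gather: each host scans only its own share of the data, and the partial results are merged. The sketch below simulates the hosts with in-memory lists and a thread pool. The shard contents and function names are hypothetical, and none of this is HBase or Cassandra API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands in for the rows held by one host.
SHARDS = [
    [("alice", "2012-02-16", "imap"), ("bob", "2012-02-16", "pop3")],
    [("alice", "2012-02-17", "smtp"), ("carol", "2012-02-17", "imap")],
    [("bob", "2012-02-17", "imap")],
]


def scan_shard(shard, username):
    """'Map' step: filter one shard locally."""
    return [row for row in shard if row[0] == username]


def scatter_gather(username):
    """Scan all shards concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(scan_shard, SHARDS, [username] * len(SHARDS)))
    merged = [row for part in partials for row in part]
    # 'Reduce' step: order the merged rows by date.
    return sorted(merged, key=lambda row: row[1])


print(scatter_gather("alice"))
```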

Queries will be launched from a web interface (there will be few requests
per day) and the search keys are user and time range.
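
Because the only search keys are user and time range, one possible layout groups records by user and keeps each user's records sorted by time, so that a query becomes a lookup plus a range slice. The sketch models this in plain Python with `bisect`; the class and method names are hypothetical and nothing here is a database API.

```python
import bisect
from collections import defaultdict
from datetime import datetime


class LoginIndex:
    """In-memory model: one time-sorted list of logins per user."""

    def __init__(self):
        self._by_user = defaultdict(list)

    def add(self, username, login_time, ip_address, protocol):
        # insort keeps each user's list ordered by login_time.
        bisect.insort(self._by_user[username], (login_time, ip_address, protocol))

    def query(self, username, start, end):
        """Return logins for username with start <= login_time < end."""
        rows = self._by_user.get(username, [])
        # A 1-tuple sorts before any longer tuple with the same first element,
        # so both bounds land exactly at the first row of that timestamp.
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


# Hypothetical sample data.
index = LoginIndex()
index.add("alice", datetime(2012, 2, 16, 9, 0), "192.0.2.10", "imap")
index.add("alice", datetime(2012, 2, 17, 4, 42), "192.0.2.10", "smtp")
index.add("bob", datetime(2012, 2, 17, 8, 0), "192.0.2.20", "pop3")
hits = index.query("alice", datetime(2012, 2, 17), datetime(2012, 2, 18))
print(hits)
```

Only the requested user's list is touched, and the two binary searches avoid scanning logins outside the time range.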

But Cassandra seems less complex to manage and simpler to run, so I want
to evaluate it instead of HBase.

My question is: can Cassandra also split a "query" over the cluster like
MapReduce? From what I have read online, Cassandra seems fast at inserting
data but slower than HBase at querying it. Is that really so?

We do not want to install Hadoop over Cassandra.

Any suggestion is welcome :-)

Alessio Cecchi is:
@ ILS ->
on LinkedIn ->
GNU/Linux Systems Support ->
@ PLUG ->  former President, now senator for life,
@ LOLUG ->  Member
