cassandra-user mailing list archives

From Jeremy Hanna <>
Subject Re: 1000's of column families
Date Tue, 02 Oct 2012 16:06:38 GMT
Another option that may or may not work for you is the support in Cassandra 1.1+ for using a secondary
index as an input to your mapreduce job.  What you might do is add a field to the column family
that represents which virtual column family each row is part of.  Then when doing mapreduce
jobs, you could use that field as the secondary index limiter.  Secondary index mapreduce
is not as efficient, since it first gets all of the matching keys and then does multigets to fetch
the data the job needs.  However, it's another option for avoiding a scan of the whole
column family.
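To make the trade-off concrete, here is a minimal conceptual sketch (plain Python, not the actual Cassandra Hadoop API) of the "virtual column family" pattern described above. The field name `vcf` and the row data are hypothetical; the point is the two-phase access pattern, where a job scoped to one virtual CF first reads keys from the index and then multigets the rows, rather than scanning the whole column family:

```python
# One wide "physical" column family: row key -> columns.
# The hypothetical 'vcf' column tags each row with its virtual column family.
column_family = {
    "u1": {"vcf": "orders", "total": "12"},
    "u2": {"vcf": "clicks", "page": "/home"},
    "u3": {"vcf": "orders", "total": "7"},
}

# Secondary index on the 'vcf' column: value -> set of row keys.
vcf_index = {}
for key, cols in column_family.items():
    vcf_index.setdefault(cols["vcf"], set()).add(key)

def rows_for_virtual_cf(vcf_name):
    """Phase 1: fetch the matching keys from the secondary index.
    Phase 2: multiget the full rows for those keys.
    Two round trips instead of one sequential scan -- less efficient
    per row, but it never touches rows outside the virtual CF."""
    keys = vcf_index.get(vcf_name, set())          # index lookup
    return {k: column_family[k] for k in keys}     # multiget

orders = rows_for_virtual_cf("orders")             # only u1 and u3
```

In the real thing, the index lookup and multiget happen inside the Hadoop input format against Cassandra's secondary index, so the cost is the extra key-fetch round trip rather than Python dict lookups, but the shape of the access pattern is the same.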

On Oct 2, 2012, at 10:09 AM, Ben Hood <> wrote:

> On Tue, Oct 2, 2012 at 3:37 PM, Brian O'Neill <> wrote:
>> Exactly.
> So you're back to the deliberation between using multiple CFs
> (potentially with some known working upper bound*) or feeding your map
> reduce in some other way (as you decided to do with Storm). In my
> particular scenario I'd like to be able to do a combination of some
> batch processing on top of less frequently changing data (hence why I
> was looking at Hadoop) and some real time analytics.
> Cheers,
> Ben
> (*) Not sure whether this applies to an individual keyspace or an
> entire cluster.
