cassandra-user mailing list archives

From "Hiller, Dean" <>
Subject Re: 1000's of CF's. virtual CFs possible Map/Reduce SOLUTION...
Date Tue, 02 Oct 2012 13:33:08 GMT
Well, I think I know the direction we may follow so we can
1. Have Virtual CF's
2. Be able to map/reduce ONE Virtual CF

Well, not map/reduce exactly, but really, really close. We use PlayOrm with
its partitioning, so I am now thinking what we will do is have a compute
grid where each node does a findAll query into the
partitions it is responsible for. In this way, I think we can have 1000's of
virtual CF's inside ONE CF, and then PlayOrm does its query and retrieves
the rows for that partition of one virtual CF.
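
To make the idea concrete, here is a rough sketch in Python of how I picture the layout. It is NOT PlayOrm's API. SingleCF, put and find_all are made-up names, and a dict stands in for the one physical CF. Row keys are prefixed with the virtual CF name, and a small index per (virtual CF, partition) lets a query touch only the rows it wants.

```python
class SingleCF:
    """In-memory stand-in for ONE physical CF (hypothetical, not PlayOrm)."""

    def __init__(self):
        self.rows = {}    # prefixed row key -> columns
        self.index = {}   # (virtual_cf, partition) -> list of row keys

    def put(self, virtual_cf, partition, key, columns):
        # Prefix the key with the virtual CF name so many virtual CFs
        # can share one physical CF without key collisions.
        row_key = f"{virtual_cf}:{key}"
        if row_key not in self.rows:
            self.index.setdefault((virtual_cf, partition), []).append(row_key)
        self.rows[row_key] = columns

    def find_all(self, virtual_cf, partition):
        # Read only the rows of one partition of one virtual CF;
        # rows of every other virtual CF are never touched.
        keys = self.index.get((virtual_cf, partition), [])
        return [self.rows[k] for k in keys]


cf = SingleCF()
cf.put("sensorA", "2012-10", "r1", {"value": 3})
cf.put("sensorA", "2012-10", "r2", {"value": 4})
cf.put("sensorA", "2012-09", "r3", {"value": 5})
cf.put("sensorB", "2012-10", "r1", {"value": 100})

rows = cf.find_all("sensorA", "2012-10")
total = sum(r["value"] for r in rows)
print(len(rows), total)
```

The point is that the cost of find_all depends on the size of the one partition, not on how many other virtual CFs live in the same physical CF.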

Anyone know of a compute grid we can dish out work to? That would be my
only missing piece (well, that and the PlayOrm virtual CF feature, but I
can probably add that within a week, though I am on vacation this Thursday
to Monday).
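
Until a real grid turns up, the shape of the job can be sketched with a local thread pool standing in for the grid nodes. Everything here is an assumption for illustration: PARTITIONS is fake data for one virtual CF, and assign, node_job and run are hypothetical helpers, not part of any grid product or of PlayOrm.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: partition name -> row values of ONE virtual CF.
PARTITIONS = {
    "p0": [1, 2, 3],
    "p1": [4, 5],
    "p2": [6],
    "p3": [7, 8, 9, 10],
}


def assign(partitions, node_count):
    """Round-robin the partition names onto node_count nodes."""
    names = sorted(partitions)
    return [names[i::node_count] for i in range(node_count)]


def node_job(owned):
    """Work done on one node: read each owned partition, reduce locally."""
    return sum(sum(PARTITIONS[name]) for name in owned)


def run(node_count):
    """Dish out partitions to the workers and combine their partial results."""
    work = assign(PARTITIONS, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(node_job, work))
    return sum(partials)


result = run(node_count=2)
print(result)
```

In a real deployment node_job would run the findAll query against the partitions that node owns, and the final sum would be whatever reduce step the job needs.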


On 10/2/12 6:35 AM, "Hiller, Dean" <> wrote:

>So basically, with moving towards the 1000's of CF's all being put in one
>CF, our performance is going to tank on map/reduce, correct?  I mean, from
>what I remember we could do map/reduce on a single CF, but by stuffing
>1000's of virtual CF's into one CF, our map/reduce will have to read in
>all 999 virtual CF's rows that we don't want just to map/reduce the ONE
>we do want.  Map/reduce is VERY VERY SLOW when reading in 1000 times more rows :( :(.
>Is this correct?  This really sounds like highly undesirable behavior.
>There needs to be a way for people with 1000's of CF's to also run
>map/reduce on any one CF.  Doing Map/reduce on 1000 times the number of
>rows will be 1000 times slower... and of course, we will most likely get up
>to 20,000 tables from my most recent projections... our last test load, we
>ended up with 8k+ CF's.  Since I kept two other keyspaces, cassandra
>started getting really REALLY slow when we got up to 15k+ CF's in the
>system though I didn't look into why.
>I don't mind having 1000's of virtual CF's in ONE CF, BUT I need to
>map/reduce "just" the virtual CF!!!!!  Ugh.
>On 10/1/12 3:38 PM, "Ben Hood" <> wrote:
>>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill <> wrote:
>>> It's just a convenient way of prefixing:
>>So given that it is possible to use a CF per tenant, should we assume
>>that at sufficient scale there is less overhead to prefix
>>keys than there is to manage multiple CFs?
