cassandra-user mailing list archives

From Ben Hood <0x6e6...@gmail.com>
Subject Re: 1000's of column families
Date Tue, 02 Oct 2012 13:00:24 GMT
Dean,

On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
> Ben,
>   to address your question, read my last post but to summarize, yes, there
> is less overhead in memory to prefix keys than manage multiple Cfs EXCEPT
> when doing map/reduce.  Doing map/reduce, you will now have HUGE overhead
> in reading a whole slew of rows you don't care about as you can't
> map/reduce a single virtual CF but must map/reduce the whole CF wasting
> TONS of resources.

That's a good point that I hadn't considered, especially as I'd like
to run MR jobs against these CFs.

Is this limitation inherent in the way Cassandra is modelled as input
for Hadoop, or could you write a custom slice query that feeds only
one particular key prefix into Hadoop?
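For illustration, the mapper-side version of such a filter is simple enough — the class and method names below are made up for the sketch, not an actual Cassandra/Hadoop API. The catch is that it only discards rows after they have been read, which is exactly the overhead Dean describes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical mapper-side filter for "virtual CFs" keyed by prefix.
// Rows whose keys don't start with "<virtualCf>:" are skipped in the
// mapper — but they are still read off disk and shipped to the mapper
// first, so this saves no I/O.
public class PrefixFilter {
    private final byte[] prefix;

    public PrefixFilter(String virtualCf) {
        // Assumes keys are laid out as "<virtualCf>:<rest-of-key>".
        this.prefix = (virtualCf + ":").getBytes(StandardCharsets.UTF_8);
    }

    /** True if the row key begins with the virtual-CF prefix. */
    public boolean matches(ByteBuffer rowKey) {
        if (rowKey.remaining() < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (rowKey.get(rowKey.position() + i) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Pruning server-side instead would need the prefix to map to a contiguous key range, and under RandomPartitioner keys are placed by hash, so rows sharing a prefix are scattered across the ring — which is presumably why the whole CF ends up being scanned.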

Cheers,

Ben
