cassandra-user mailing list archives

From Brian O'Neill <boneil...@gmail.com>
Subject Re: 1000's of column families
Date Tue, 02 Oct 2012 13:32:14 GMT

Agreed. 

Do we know yet what the overhead is for each column family?  What is the
limit?
If you have a SINGLE keyspace with 20,000+ CFs, what happens?  Anyone know?
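A back-of-envelope calculation makes the question concrete. The per-CF figure below is an illustrative assumption, not a measured Cassandra number: each open column family carries some fixed heap cost (memtable arena, stats, schema metadata), and even a small per-CF cost multiplies badly at 20,000 CFs.

```python
# Rough heap estimate for holding many column families open.
# ASSUMPTION: the 1 MB per-CF figure is illustrative, not a measured
# Cassandra number; real overhead depends on version and settings.
PER_CF_OVERHEAD_MB = 1

def estimated_heap_mb(num_column_families, per_cf_mb=PER_CF_OVERHEAD_MB):
    """Fixed heap cost (MB) of num_column_families open CFs."""
    return num_column_families * per_cf_mb

print(estimated_heap_mb(20000))  # prints 20000, i.e. ~20 GB at 1 MB each
```

Even at an assumed 1 MB apiece, 20,000 CFs would pin roughly 20 GB of heap before storing a single row, which suggests why nobody runs keyspaces that wide.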

-brian


---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com

This information transmitted in this email message is for the intended
recipient only and may contain confidential and/or privileged material. If
you received this email in error and are not the intended recipient, or
the person responsible to deliver it to the intended recipient, please
contact the sender at the email above and delete this email and any
attachments and destroy any copies thereof. Any review, retransmission,
dissemination, copying or other use of, or taking any action in reliance
upon, this information by persons or entities other than the intended
recipient is strictly prohibited.

On 10/2/12 9:28 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:

>Thanks for the idea, but please keep thinking on it...
>
>That is 100% what we don't want, since the partitioned data resides on the
>same node. I want to map/reduce the column families and leverage the
>parallel disks.
>
>:( :(
>
>I am sure others would want to do the same... We almost need a feature of
>virtual column families, and "column family" should really be called
>ReplicationGroup or something, where replication is configured for all
>CFs in that group.
>
>ANYONE have any other ideas???
>
>Dean
>
>On 10/2/12 7:20 AM, "Brian O'Neill" <boneill42@gmail.com> wrote:
>
>>
>>Without putting too much thought into it...
>>
>>Given the underlying architecture, I think you could/would have to write
>>your own partitioner, which would partition based on the prefix/virtual
>>keyspace.  
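The partition-on-prefix idea quoted here can be sketched in a few lines. This is a standalone illustration, not Cassandra's actual IPartitioner API, and the colon-delimited key format is hypothetical: the token is derived from the virtual-keyspace prefix alone, so every row of one virtual CF hashes to the same position on the ring.

```python
import hashlib

# Sketch of a prefix-based partitioner: derive the ring token from the
# virtual-keyspace prefix only, so all rows sharing a prefix colocate.
# ASSUMPTION: standalone illustration, not Cassandra's IPartitioner
# interface; the "prefix:rowkey" format is hypothetical.

def prefix_token(row_key: str) -> int:
    prefix = row_key.split(":", 1)[0]     # virtual keyspace / CF name
    digest = hashlib.md5(prefix.encode()).hexdigest()
    return int(digest, 16)                # position on the token ring

# Every row of virtual CF "tenant1" gets the identical token, so the
# whole virtual CF lands on one replica set.
assert prefix_token("tenant1:row1") == prefix_token("tenant1:row2")
```

Note the trade-off this exposes: partitioning by prefix pins each virtual CF to a single replica set, which is exactly the colocation Dean pushes back on in his reply, since it defeats spreading a map/reduce job across the cluster's disks.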
>>
>>-brian
>>
>>On 10/2/12 9:00 AM, "Ben Hood" <0x6e6562@gmail.com> wrote:
>>
>>>Dean,
>>>
>>>On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean <Dean.Hiller@nrel.gov>
>>>wrote:
>>>> Ben,
>>>>   to address your question, read my last post, but to summarize: yes,
>>>> there is less overhead in memory to prefix keys than to manage
>>>> multiple CFs, EXCEPT when doing map/reduce.  Doing map/reduce, you
>>>> will now have HUGE overhead in reading a whole slew of rows you don't
>>>> care about, since you can't map/reduce a single virtual CF but must
>>>> map/reduce the whole CF, wasting TONS of resources.
>>>
>>>That's a good point that I hadn't considered beforehand, especially as
>>>I'd like to run MR jobs against these CFs.
>>>
>>>Is this limitation inherent in the way that Cassandra is modelled as
>>>input for Hadoop or could you write a custom slice query to only feed
>>>in one particular prefix into Hadoop?
>>>
>>>Cheers,
>>>
>>>Ben
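On Ben's question: a slice query restricts which columns come back, not which rows, so restricting rows to one prefix at input time would need a key-range scan, which only behaves usefully under an order-preserving partitioner. Under the random partitioner the keys of one prefix are scattered across the ring, leaving the fallback Dean describes: read every row and discard the ones outside the target prefix. A minimal sketch, with plain functions standing in for the Hadoop mapper API:

```python
# Fallback when input can't be restricted to one prefix: the mapper
# reads every row and drops the ones outside the target virtual CF.
# ASSUMPTION: plain functions stand in for the Hadoop mapper API;
# "tenant1:" is a hypothetical virtual-CF prefix.

TARGET_PREFIX = "tenant1:"

def map_row(row_key, columns):
    """Emit only rows belonging to the target virtual CF."""
    if not row_key.startswith(TARGET_PREFIX):
        return []                  # wasted read: row was fetched, then skipped
    return [(row_key, columns)]

rows = [("tenant1:a", {"v": 1}), ("tenant2:b", {"v": 2}), ("tenant1:c", {"v": 3})]
kept = [out for key, cols in rows for out in map_row(key, cols)]
# Only the two tenant1 rows survive; tenant2's row was still read from disk.
```

This is the "HUGE overhead" Dean describes: the filtering is cheap, but every discarded row was still pulled off disk and through the input format first.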
>>
>>
>