incubator-cassandra-user mailing list archives

From Jeremiah Jordan <jeremiah.jor...@morningstar.com>
Subject Re: Very large rows VS small rows
Date Thu, 29 Sep 2011 19:38:25 GMT
So I need to read what I write before hitting send.  Should have been, 
"If A works for YOUR use case." and "Wide rows DON'T spread across nodes 
well."

On 09/29/2011 02:34 PM, Jeremiah Jordan wrote:
> If A works for our use case, it is a much better option.  A given row 
> has to be read in full to return data from it.  There used to be a 
> limitation that a row had to fit in memory; there is now code to page 
> through the data, so that is no longer a hard limit, but rows that 
> don't fit in memory are still very slow to use.  Also wide rows 
> spread across nodes.  You should also consider more nodes in your 
> cluster.  From our experience, nodes perform better when they are 
> only managing a few hundred GB each.  Pretty sure that 10TB+ of data 
> (100's * 100GB) will not perform very well on a 3 node cluster, 
> especially if you plan to have RF=3, making it 10TB+ per node (a 
> back-of-the-envelope version of this arithmetic appears at the end 
> of the thread).
>
> -Jeremiah
>
> On 09/29/2011 12:20 PM, M Vieira wrote:
>> What would be the best approach?
>> A) millions of ~2KB rows, where each row could have ~6 columns
>> B) hundreds of ~100GB rows, where each row could have ~1 million 
>> columns (a sketch of both layouts appears at the end of the thread)
>>
>> Considerations:
>> Most entries will be searched for (read+write) at least once a day 
>> but no more than 3 times a day.
>> Cheap hardware across the cluster of 3 nodes, each with 16GB mem 
>> (heap = 8GB)
>>
>> Any input would be appreciated.
>> M.
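
For concreteness, here is a minimal sketch of the two layouts using 
pycassa (a Python Thrift client for Cassandra).  The keyspace and 
column family names are hypothetical, and the column families would 
have to exist before running this:

    import pycassa

    # Connect to a hypothetical keyspace named 'Demo'.
    pool = pycassa.ConnectionPool('Demo')

    # Option A: millions of small rows, ~6 columns each.
    # One row per entry; each read touches a single ~2KB row.
    small = pycassa.ColumnFamily(pool, 'EntriesByKey')
    small.insert('entry_00000001', {'field1': 'v1', 'field2': 'v2'})

    # Option B: hundreds of wide rows, ~1 million columns each.
    # Entries are packed into one row keyed by a coarse bucket, so a
    # whole bucket lives on one replica set and cannot spread across
    # nodes; once it outgrows memory it becomes very slow to read.
    wide = pycassa.ColumnFamily(pool, 'EntriesByBucket')
    wide.insert('bucket_0001', {'entry_00000001:field1': 'v1'})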

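And a back-of-the-envelope version of the sizing arithmetic from the 
reply above, using the thread's own numbers (taking "hundreds" of rows 
to mean 100 at the low end):

    # Rough per-node storage for option B on the proposed cluster.
    row_size_gb = 100    # one wide row under option B
    row_count = 100      # low end of "hundreds" of 100GB rows
    rf = 3               # replication factor mentioned in the reply
    nodes = 3            # cluster size from the question

    raw_tb = row_size_gb * row_count / 1000.0  # ~10 TB of raw data
    total_tb = raw_tb * rf                     # ~30 TB with replicas
    per_node_tb = total_tb / nodes             # ~10 TB stored per node
    print(raw_tb, total_tb, per_node_tb)       # 10.0 30.0 10.0

That puts each node two orders of magnitude above the few-hundred-GB 
figure quoted in the reply.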