incubator-cassandra-user mailing list archives

From Jeremiah Jordan <jeremiah.jor...@morningstar.com>
Subject Re: Very large rows VS small rows
Date Thu, 29 Sep 2011 19:34:13 GMT
If A works for your use case, it is a much better option.  A given row 
has to be read in full to return data from it.  There used to be a 
limitation that a row had to fit in memory; there is now code to page 
through the data, so while that is no longer a hard limit, rows that 
don't fit in memory are still very slow to use.  Also, wide rows don't 
spread across nodes: a single row lives entirely on its replicas, so 
option B concentrates load on a few machines.
You should also consider more nodes in your cluster.  In our 
experience, nodes perform better when they are only managing a few 
hundred GB each.  Pretty sure that 10TB+ of data (hundreds of rows * 
100GB) will not perform very well on a 3-node cluster, especially if 
you plan to have RF=3, making it 10TB+ per node.
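
Back-of-the-envelope, in Python, using the numbers above (the 300GB 
per-node target is just our rough rule of thumb, not a hard limit):

    rows = 100            # "hundreds" of ~100GB rows -- low end
    row_size_gb = 100
    rf = 3                # replication factor
    nodes = 3             # proposed cluster size

    raw_tb = rows * row_size_gb / 1000.0   # ~10TB of raw data
    per_node_tb = raw_tb * rf / nodes      # RF=3 on 3 nodes: every node
    print(per_node_tb)                     # stores a full copy, ~10TB

    # At ~300GB per node, the same data set wants on the order of
    # raw_tb * rf * 1000 / 300 ~= 100 nodes.
    print(raw_tb * rf * 1000 / 300)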

-Jeremiah

On 09/29/2011 12:20 PM, M Vieira wrote:
> What would be the best approach?
> A) millions of ~2KB rows, where each row could have ~6 columns
> B) hundreds of ~100GB rows, where each row could have ~1 million columns
>
> Considerations:
> Most entries will be accessed (read+write) at least once a day but 
> no more than 3 times a day.
> Cheap hardware across the cluster of 3 nodes, each with 16GB mem 
> (heap = 8GB)
>
> Any input would be appreciated
> M.
