incubator-cassandra-user mailing list archives

From Yatong Zhang <>
Subject Re: Really need some advices on large data considerations
Date Thu, 15 May 2014 09:55:34 GMT
Hi Michael, thanks for the reply,

> I would RAID0 all those data drives, personally, and give up managing them
> separately. They are on multiple PCIe controllers, one drive per channel,
> right?

RAID 0 is a simple way to go, but a single disk failure can take the whole
volume down, so I'm afraid RAID 0 won't be our choice.
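To put a number on that concern: RAID 0 has no redundancy, so the volume survives only if every member drive survives. Assuming drives fail independently, each with probability p over the same period, the chance of losing an n-drive volume is 1 - (1 - p)^n. A small Python sketch (the 3% per-drive figure is a hypothetical example, not a measured rate):

```python
def raid0_loss_probability(n_drives: int, p_drive: float) -> float:
    """Chance that a RAID 0 volume is lost, assuming each of its n drives
    fails independently with probability p_drive over the same period.
    RAID 0 has no redundancy: the volume survives only if every drive survives."""
    if n_drives < 1 or not 0.0 <= p_drive <= 1.0:
        raise ValueError("need n_drives >= 1 and 0 <= p_drive <= 1")
    return 1.0 - (1.0 - p_drive) ** n_drives


# Hypothetical per-drive failure probability of 3% for the period considered.
for n in (1, 4, 10):
    print(f"{n:2d} drives: {raid0_loss_probability(n, 0.03):.1%} chance of losing the volume")
```

With those assumed numbers, a 10-drive stripe has roughly a 26% chance of being lost, versus 3% for any single drive.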

> I would highly suggest re-thinking about how you want to set up your data
> model and re-plan your cluster appropriately,

Our data is large but our model is simple: most operations are reads by
key, and we never update the data (we only delete periodically). Thanks to
Cassandra's Dynamo-style architecture, serving this much 'static' data is
not a problem. What concerns me is the 'dynamic' part: compactions, adding
and removing nodes, data rebalancing, and the like.
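On the rebalancing side, Dynamo-style partitioning places keys on a hash ring, so a joining node only takes over slices of existing token ranges instead of reshuffling everything. The toy Python model below measures how many keys move when a fifth node joins four; MD5 tokens and 64 virtual nodes per host are assumptions for illustration, and this is not Cassandra's actual code.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # 128-bit MD5 token, similar in spirit to Cassandra's RandomPartitioner
    # (the default Murmur3Partitioner hashes differently).
    return int.from_bytes(hashlib.md5(value.encode()).digest(), "big")


class TokenRing:
    """Toy Dynamo-style ring: each node owns `vnodes` tokens, and a key
    belongs to the node holding the first token at or after the key's token."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.ring = []    # sorted (token, node) pairs
        self.tokens = []  # just the tokens, for bisect
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        self.ring.extend((token(f"{node}#{i}"), node) for i in range(self.vnodes))
        self.ring.sort()
        self.tokens = [t for t, _ in self.ring]

    def owner(self, key):
        # Wrap around to the first token when the key hashes past the last one.
        idx = bisect.bisect_left(self.tokens, token(key)) % len(self.tokens)
        return self.ring[idx][1]


def fraction_moved(existing, new_node, n_keys=20_000):
    """Share of keys whose owner changes when `new_node` joins the ring."""
    keys = [f"key-{i}" for i in range(n_keys)]
    ring = TokenRing(existing)
    before = {k: ring.owner(k) for k in keys}
    ring.add_node(new_node)
    moved = [k for k in keys if ring.owner(k) != before[k]]
    # New tokens only split existing ranges, so every moved key lands on the new node.
    assert all(ring.owner(k) == new_node for k in moved)
    return len(moved) / n_keys


print(f"{fraction_moved(['node-a', 'node-b', 'node-c', 'node-d'], 'node-e'):.1%} of keys moved")
```

With balanced ownership the new node should take roughly a fifth of the keys, and no data moves between the four existing nodes; the exact figure depends on the hash and the vnode count.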

What we care about most are scalability and fail-over, and Cassandra looks
splendid for both: linear scalability, decentralization, auto-partitioning,
and auto-recovery. That's why we chose it.
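The fail-over side rests on replication: each key is stored on several distinct nodes, so losing one node still leaves copies to read from. A toy Python sketch of SimpleStrategy-style placement (walk the ring clockwise and take the first RF distinct nodes); replication factor 3, MD5 tokens, 16 virtual nodes per host and the key "image:12345" are all illustrative assumptions.

```python
import bisect
import hashlib


def token(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest(), "big")


def build_ring(nodes, vnodes=16):
    """Sorted (token, node) pairs; several tokens per node (virtual nodes)."""
    return sorted((token(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))


def replicas(ring, key, rf=3):
    """Walk clockwise from the key's token and take the first `rf` distinct
    nodes, in the spirit of Cassandra's SimpleStrategy."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(key))
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == rf:
                break
    return chosen


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = build_ring(nodes)
owners = replicas(ring, "image:12345")
down = owners[0]  # pretend the first replica's node has failed
survivors = [n for n in owners if n != down]
print(f"replicas={owners}, still readable from {survivors}")
```

Skipping nodes already chosen matters once hosts own many tokens; otherwise two "replicas" could land on the same machine.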

> but if you are using large blobs like image data, think about putting that
> blob data somewhere else

Any good ideas about this?
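As I understand the suggestion, it would mean keeping only a small reference row per key in Cassandra and storing the image bytes in a separate blob store, addressed by a content hash. A rough Python sketch of that pattern; both stores are in-memory dicts standing in for the real systems, and ImageRef and ImageStore are hypothetical names, not an existing API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRef:
    """Small metadata row that would live in Cassandra (hypothetical schema)."""
    key: str
    blob_id: str  # content hash pointing into the external blob store
    size: int


class ImageStore:
    def __init__(self):
        self.blobs = {}  # stand-in for an external object store or filesystem
        self.rows = {}   # stand-in for a Cassandra table keyed by image key

    def put(self, key: str, data: bytes) -> ImageRef:
        blob_id = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(blob_id, data)  # identical images are stored once
        ref = ImageRef(key, blob_id, len(data))
        self.rows[key] = ref
        return ref

    def get(self, key: str) -> bytes:
        return self.blobs[self.rows[key].blob_id]

    def delete(self, key: str) -> None:
        ref = self.rows.pop(key)
        # Only drop the blob once no remaining row points at it.
        if all(r.blob_id != ref.blob_id for r in self.rows.values()):
            del self.blobs[ref.blob_id]
```

Cassandra then only compacts and streams the small reference rows, while the large immutable blobs live in a system built for them.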

The doc you mentioned on the DataStax site is great. We're still gathering
information and evaluating Cassandra, so it would be great to hear any
other suggestions you have!


