Hundreds of thousands of documents doesn't sound too bad. Good old NFS would do, given a reasonable directory structure.
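
To illustrate what a reasonable directory structure could look like, here is a minimal sketch that spreads files over hashed subdirectories so that no single directory grows too large. The function name `sharded_path`, the mount point `/mnt/docs`, and the choice of two levels of two hex characters are all assumptions for illustration, not something taken from an actual setup.

```python
import hashlib
from pathlib import Path


def sharded_path(root, doc_id, levels=2, width=2):
    """Return a path for doc_id nested under hash-derived subdirectories.

    Hypothetical helper: with levels=2 and width=2 there are
    256 * 256 = 65,536 leaf directories to spread the files over.
    """
    # Hash the document id so files distribute evenly regardless of naming.
    digest = hashlib.sha1(doc_id.encode("utf-8")).hexdigest()
    # Take consecutive slices of the hex digest as directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, doc_id)


if __name__ == "__main__":
    # Example only: the root directory and file name are made up.
    print(sharded_path("/mnt/docs", "contract-0001.pdf"))
```

With hundreds of thousands of documents this layout leaves only a few files per leaf directory on average, and the path can always be recomputed from the document id alone.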

We are doing this. Our documents are pretty small though (a few KB each). We have around 40M of them right now, around 300GB in total.

Generally the problem is that this much data usually means that Cassandra becomes I/O bound during repairs and compactions, even if your hot dataset would fit in the page cache. There are efforts to overcome this, and 0.7 will help with repair problems, but for the time being you need quite some headroom in I/O performance to handle these situations.

Here is a related post:

On Feb 3, 2011, at 1:33 PM, Brendan Poole wrote:

Would anyone recommend using Cassandra for storing hundreds of thousands of documents in Word/PDF format? The manual says it can store documents under 64MB with no issue, but I was wondering if anyone is using it for this specific purpose. Would it be efficient/reliable, and is there anything I need to bear in mind?
Thanks in advance

Brendan Poole
Systems Developer
NewLaw Solicitors
Helmont House
Churchill Way
029 2078 4283

