cassandra-user mailing list archives

From "Hiller, Dean" <>
Subject Re: Using Cassandra to store binary files?
Date Tue, 16 Oct 2012 17:36:53 GMT
Yes, Astyanax stores the file across many rows, so it reads from many disks, giving you a
performance advantage over storing each file in one row. At least that is my understanding,
so read performance "should" be really good in that case.
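
To make the idea concrete, here is a minimal sketch of splitting a file into fixed-size chunks,
one chunk per row, and putting it back together. This is not Astyanax's actual API: the chunk
size, the "file_id:index" row key scheme, and the plain dict standing in for a column family
are all assumptions made for illustration.

```python
# Hypothetical chunk size; a real client would tune this.
CHUNK_SIZE = 256 * 1024


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Map hypothetical row keys ("file_id:index") to consecutive slices of data."""
    rows = {}
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        rows[f"{file_id}:{index}"] = data[offset:offset + chunk_size]
    return rows


def reassemble(file_id, rows, chunk_count):
    """Fetch each chunk row in order and concatenate the pieces."""
    return b"".join(rows[f"{file_id}:{i}"] for i in range(chunk_count))


# A dict stands in for the column family here (assumption, no cluster involved).
payload = bytes(range(256)) * 4096          # 1 MiB of sample data
stored = split_into_chunks("file-1", payload)
restored = reassemble("file-1", stored, len(stored))
```

Because each chunk has its own row key, the chunks can land on different nodes, which is where
the "reads from many disks" effect would come from.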


From: Michael Kjellman <<>>
Reply-To: "<>" <<>>
Date: Tuesday, October 16, 2012 10:07 AM
To: "<>" <<>>
Subject: Re: Using Cassandra to store binary files?

When we started with Cassandra in production almost 2 years ago, it was originally for the
sole purpose of storing blobs in a redundant way. I ignored the warnings, as my own tests showed
it would be okay (and two years later it is "ok"). If you plan on using Cassandra for more later
(as we do now that features such as secondary indexes and CQL have matured), keep in mind that
I'm now stuck with a large amount of data in Cassandra that maybe could be in a better place.
Does it work? Yes. Would I do it again? Not 100% sure. Compactions of these column families
take forever.

Also, by default there is a 16MB limit. Yes, this is adjustable, but currently Thrift does
not stream data. I didn't know that Netflix had worked around this (referring to Dean's reply).
I'll have to look through the source to see how they are overcoming the limitations of
the protocol. Last I read, there were no plans to make Thrift stream. Looks like there is a
bug at

You might want to take a look at the following page:

I wanted an easy key-value store when I originally picked Cassandra. Our project needs have
changed, and Cassandra has begun playing a more critical role as it has matured (since
the 0.7 days). In retrospect, HDFS might have been a better option long term, as I really will
never need indexing etc. on my binary blobs. Simply being able to grab/reassemble a file by
its key was convenient at the time, but maybe not the most forward-thinking choice.
Hope that helps a bit.

Also, your read performance won't be amazing by any means with blobs. Not sure if your priority
is reads or writes. In our case it was writes so it wasn't a large loss.


From: Vasileios Vlachos <<>>
Reply-To: "<>" <<>>
Date: Tuesday, October 16, 2012 8:49 AM
To: "<>" <<>>
Subject: Using Cassandra to store binary files?

Hello All,

We need to store about 40G of binary files in a redundant way, and since we are already using
Cassandra for other applications, we were thinking that we could just solve that problem using
the same Cassandra cluster. Each individual file will be approximately 1MB.
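
As a rough sizing check on those numbers, the sketch below estimates the file count and the raw
on-disk footprint once replication is applied. The replication factor of 3 is an assumption (it
is not stated in this message), and the figures ignore compaction overhead and per-row metadata.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def estimate_footprint(total_bytes, file_bytes, replication_factor):
    """Return (approximate file count, raw bytes stored across all replicas)."""
    file_count = total_bytes // file_bytes
    raw_bytes = total_bytes * replication_factor
    return file_count, raw_bytes


# 40G total and ~1MB per file come from the message; RF=3 is assumed.
files, raw = estimate_footprint(40 * GIB, 1 * MIB, replication_factor=3)
print(files, "files,", raw // GIB, "GiB across all replicas")
```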

We are thinking that the data structure should be very simple for this case, using one CF
with just one column which will contain the actual file. The row key should then uniquely
identify each file. Speed is not an issue when we are retrieving the files. Not impacting other
applications using Cassandra is more important for us. In order to prevent performance issues
with other applications using our Cassandra cluster at the moment, we think we should disable
key_cache and row_cache for this column family.

Has anyone tried this before, or does anyone think this is going to be a bad idea? Do you think
our current plan is sensible? Any input would be much appreciated. Thank you in advance.


