cassandra-user mailing list archives

From "Hiller, Dean" <>
Subject Re: Using Cassandra to store binary files?
Date Tue, 16 Oct 2012 18:34:45 GMT
I am not sure. If I were to implement it myself, though, I would have
postfixed the rows with 1,2,3,4,…<lastValue> and then stored the lastValue
in the first row so that my program knows all the rows.

I.e., I'm not sure an index is really needed in that case.
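The row-postfix scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Astyanax's actual chunked-storage code: an in-memory dict stands in for the column family, and the names `store_file` and `load_file`, the `data` and `lastValue` column names, and the 256 KB chunk size are all hypothetical.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a column family: row key -> {column name: value}.
ColumnFamily = Dict[str, Dict[str, bytes]]

CHUNK_SIZE = 256 * 1024  # assumed chunk size per row; not a Cassandra/Astyanax default


def store_file(cf: ColumnFamily, file_id: str, data: bytes,
               chunk_size: int = CHUNK_SIZE) -> int:
    """Write data as rows <file_id>-1 .. <file_id>-<lastValue>; return lastValue."""
    chunks: List[bytes] = [data[i:i + chunk_size]
                           for i in range(0, len(data), chunk_size)] or [b""]
    last_value = len(chunks)
    for n, chunk in enumerate(chunks, start=1):
        row = {"data": chunk}
        if n == 1:
            # The first row also records lastValue, so a reader can derive every row key.
            row["lastValue"] = str(last_value).encode()
        cf[f"{file_id}-{n}"] = row
    return last_value


def load_file(cf: ColumnFamily, file_id: str) -> bytes:
    """Read lastValue from the first row, then fetch and join rows 1..lastValue in order."""
    last_value = int(cf[f"{file_id}-1"]["lastValue"].decode())
    return b"".join(cf[f"{file_id}-{n}"]["data"]
                    for n in range(1, last_value + 1))
```

Because the chunk count lives in row 1, one read tells the client every row key it needs, which is the reason a separate index is arguably unnecessary in this design.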


On 10/16/12 11:45 AM, "Michael Kjellman" <> wrote:

>Ah, so they just wrote chunking into Astyanax? Do they create an index
>somewhere so they know how to reassemble the file on the way out?
>On 10/16/12 10:36 AM, "Hiller, Dean" <> wrote:
>>Yes, Astyanax stores the file in many rows, so it reads from many disks,
>>giving you a performance advantage vs. storing each file in one row... well,
>>at least from my understanding, so read performance "should" be really
>>good in that case.
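Why spreading a file over many rows can spread reads over many disks: with Cassandra's RandomPartitioner, a row's placement is derived from an MD5 hash of its row key, so rows such as `<file>-1`, `<file>-2`, ... generally land on different nodes. The sketch below is a simplified model under loud assumptions: a hypothetical 4-node ring with evenly spaced tokens, no replication, and a token calculation that approximates rather than reproduces Cassandra's exact arithmetic. The function names are illustrative only.

```python
import hashlib
from bisect import bisect_left
from collections import Counter
from typing import List

RING_SIZE = 2 ** 127  # RandomPartitioner tokens fall in the range 0..2**127


def token(row_key: str) -> int:
    """Simplified MD5-based token: hash the row key and reduce it into the ring."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def node_tokens(num_nodes: int) -> List[int]:
    """Evenly spaced tokens, one per hypothetical node, in ascending order."""
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]


def owner(tok: int, tokens: List[int]) -> int:
    """Index of the owning node: first node whose token is >= tok, wrapping to node 0."""
    return bisect_left(tokens, tok) % len(tokens)


def chunk_placement(file_id: str, num_chunks: int, num_nodes: int = 4) -> Counter:
    """Count how many of a file's chunk rows (<file_id>-1 .. -N) each node owns."""
    tokens = node_tokens(num_nodes)
    return Counter(owner(token(f"{file_id}-{n}"), tokens)
                   for n in range(1, num_chunks + 1))
```

Calling `chunk_placement("file42", 64)` shows the 64 chunk rows scattered across the nodes, whereas a one-row-per-file layout always puts the whole file on a single replica set.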
>>From: Michael Kjellman
>>Reply-To: "<>"
>>Date: Tuesday, October 16, 2012 10:07 AM
>>To: "<>"
>>Subject: Re: Using Cassandra to store binary files?
>>When we started with Cassandra in production almost 2 years ago, it was
>>originally for the sole purpose of storing blobs in a redundant way.
>>I ignored the warnings as my own tests showed it would be okay (and two
>>years later it is "ok"). If you plan on using Cassandra for more later (as
>>we now do, since features such as secondary indexes and CQL have matured),
>>note that I'm now stuck with a large amount of data in Cassandra that maybe
>>could be in a better place. Does it work? Yes. Would I do it again? Not
>>100% sure. Compactions of these column families take forever.
>>Also, by default there is a 16MB limit. Yes, this is adjustable, but
>>currently Thrift does not stream data. I didn't know that Netflix had
>>worked around this (referring to Dean's reply); I'll have to look
>>through the source to see how they are overcoming the limitations of the
>>protocol. Last I read, there were no plans to make Thrift stream. Looks
>>like there is a bug at
>>You might want to take a look at the following page:
>>I wanted an easy key-value store when I originally picked Cassandra. Our
>>project needs have changed, and Cassandra has begun playing a more
>>critical role as it has matured (since the 0.7 days). In retrospect, HDFS
>>might have been a better option long term, as I really will never need
>>indexing etc. on my binary blobs. Being able to simply grab/reassemble a
>>file by its key was convenient at the time, but maybe not the most
>>forward-thinking choice. Hope that helps a bit.
>>Also, your read performance won't be amazing by any means with blobs. I'm
>>not sure if your priority is reads or writes. In our case it was writes, so
>>it wasn't a large loss.
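Because Thrift does not stream, each value has to travel in a single message that fits under the configured limit (16MB by default, per the reply above). A minimal sketch of sizing chunks against that limit follows; the function name `plan_chunks` and the 50% headroom reserved for keys, column names, and framing are assumptions for illustration, not Cassandra settings.

```python
from typing import Tuple

MB = 1024 * 1024


def plan_chunks(file_size: int, max_message: int = 16 * MB,
                headroom: float = 0.5) -> Tuple[int, int]:
    """Return (chunk_count, chunk_size) so each chunk fits one non-streamed message.

    headroom is an assumed safety fraction of max_message usable for raw chunk
    bytes; the rest is left for row keys, column names and protocol framing.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    chunk_size = int(max_message * headroom)
    chunk_count = max(1, -(-file_size // chunk_size))  # ceiling division, at least 1
    return chunk_count, chunk_size
```

For the ~1MB files in the original question this yields a single chunk, consistent with the one-column layout proposed there; chunking only starts to matter as files approach the message limit (e.g. a 100MB file would need 13 chunks of 8MB under these assumptions).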
>>From: Vasileios Vlachos
>>Reply-To: "<>"
>>Date: Tuesday, October 16, 2012 8:49 AM
>>To: "<>"
>>Subject: Using Cassandra to store binary files?
>>Hello All,
>>We need to store about 40GB of binary files in a redundant way, and since
>>we are already using Cassandra for other applications, we were thinking
>>that we could just solve that problem using the same Cassandra cluster.
>>Each individual file will be approximately 1MB.
>>We are thinking that the data structure should be very simple for this
>>case, using one CF with just one column which will contain the actual
>>files. The row key should then uniquely identify each file. Speed is not
>>an issue when we are retrieving the files. Not impacting other
>>applications using Cassandra is more important to us. In order to prevent
>>performance issues with other applications using our Cassandra cluster at
>>the moment, we think we should disable key_cache and row_cache for this
>>column family.
>>Has anyone tried this before, or does anyone think this is going to be a
>>bad idea? Do you think our current plan is sensible? Any input would be
>>much appreciated. Thank you in advance.
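A back-of-the-envelope sizing of the plan in the question, using its own figures (about 40GB total, about 1MB per file, one row per file). The replication factor of 3 and the 6-node cluster are hypothetical, and the estimate deliberately ignores per-row overhead and the extra free space compaction needs, which matter given the compaction concerns raised in the replies.

```python
from typing import Tuple

GB = 1024 ** 3
MB = 1024 ** 2


def estimate(total_bytes: int, file_bytes: int,
             replication_factor: int, num_nodes: int) -> Tuple[int, int, float]:
    """Return (row count, raw bytes stored cluster-wide, average bytes per node)."""
    rows = total_bytes // file_bytes          # one row per file in the proposed CF
    raw = total_bytes * replication_factor    # each replica holds a full copy
    return rows, raw, raw / num_nodes         # assumes an even spread across nodes


rows, raw, per_node = estimate(40 * GB, 1 * MB, replication_factor=3, num_nodes=6)
```

Under those assumptions the CF holds about 40,960 rows and roughly 120GB of replicated data, about 20GB per node, before any compaction headroom.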
>>'Like' us on Facebook for exclusive content and other resources on all
>>Barracuda Networks solutions.
>>  ­­
>'Like' us on Facebook for exclusive content and other resources on all
>Barracuda Networks solutions.
