incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: Using Cassandra to store binary files?
Date Tue, 16 Oct 2012 18:34:45 GMT
I am not sure.  If I were to implement it myself, though, I would probably
have postfixed the row keys with 1,2,3,4,…<lastValue> and then stored the
lastValue in the first row, so that my program knows all the rows.

I.e., I'm not sure an index is really needed in that case.
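[Editor's note: the scheme described above can be sketched as follows. This is a minimal illustration only, using a plain dict in place of a real Cassandra column family; the key format, chunk size, and function names are made up for the example and are not Astyanax's actual layout.]

```python
# Sketch of the chunking scheme: split a blob into numbered rows
# ("<key>:1" .. "<key>:N") and store N in the first row, so a reader
# knows how many rows to fetch without a separate index.

CHUNK_SIZE = 4  # tiny for demonstration; real chunks would be far larger


def store_blob(cf, key, data):
    """Split `data` into numbered rows and record the chunk count."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    for n, chunk in enumerate(chunks, start=1):
        cf[f"{key}:{n}"] = chunk
    # The "first" row holds the last chunk number (the lastValue).
    cf[key] = len(chunks)


def read_blob(cf, key):
    """Reassemble the blob by reading rows 1..lastValue in order."""
    last = cf[key]
    return b"".join(cf[f"{key}:{n}"] for n in range(1, last + 1))


store = {}  # stands in for a column family
store_blob(store, "file1", b"hello cassandra")
assert read_blob(store, "file1") == b"hello cassandra"
```

Reading rows `<key>:1` through `<key>:<lastValue>` reassembles the file; because the rows hash to different nodes, reads are spread across disks.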

Dean

On 10/16/12 11:45 AM, "Michael Kjellman" <mkjellman@barracuda.com> wrote:

>Ah, so they just wrote chunking into Astyanax? Do they create an index
>somewhere so they know how to reassemble the file on the way out?
>
>On 10/16/12 10:36 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:
>
>>Yes, Astyanax stores the file across many rows, so it reads from many
>>disks, giving you a performance advantage vs. storing each file in one
>>row…well, at least from my understanding, so read performance "should" be
>>really good in that case.
>>
>>Dean
>>
>>From: Michael Kjellman
>><mkjellman@barracuda.com<mailto:mkjellman@barracuda.com>>
>>Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
>><user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
>>Date: Tuesday, October 16, 2012 10:07 AM
>>To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
>><user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
>>Subject: Re: Using Cassandra to store binary files?
>>
>>When we started with Cassandra in production almost two years ago, it was
>>originally for the sole purpose of storing blobs in a redundant way. I
>>ignored the warnings, as my own tests showed it would be okay (and two
>>years later it is "ok"). We now plan on using Cassandra more broadly, as
>>features such as secondary indexes and CQL have matured, so I'm stuck with
>>a large amount of data in Cassandra that maybe could be in a better place.
>>Does it work? Yes. Would I do it again? Not 100% sure. Compactions of
>>these column families take forever.
>>
>>Also, by default there is a 16MB limit. Yes, this is adjustable, but
>>currently Thrift does not stream data. I didn't know that Netflix had
>>worked around this (referring to Dean's reply), so I'll have to look
>>through the source to see how they are overcoming the limitations of the
>>protocol. Last I read, there were no plans to make Thrift stream. Looks
>>like there is a bug at
>>https://issues.apache.org/jira/browse/CASSANDRA-265
>>
>>You might want to take a look at the following page:
>>http://wiki.apache.org/cassandra/CassandraLimitations
>>
>>I wanted an easy key-value store when I originally picked Cassandra. As
>>our project's needs changed and Cassandra matured (since the 0.7 days),
>>it has begun playing a more critical role. In retrospect, HDFS might have
>>been a better option long term, as I will really never need indexing etc.
>>on my binary blobs. The convenience of simply being able to
>>grab/reassemble a file by its key was nice at the time, but maybe not the
>>most forward-thinking choice. Hope that helps a bit.
>>
>>Also, your read performance won't be amazing by any means with blobs. Not
>>sure if your priority is reads or writes. In our case it was writes so it
>>wasn't a large loss.
>>
>>Best,
>>michael
>>
>>
>>From: Vasileios Vlachos
>><vasileiosvlachos@gmail.com<mailto:vasileiosvlachos@gmail.com>>
>>Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
>><user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
>>Date: Tuesday, October 16, 2012 8:49 AM
>>To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
>><user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
>>Subject: Using Cassandra to store binary files?
>>
>>Hello All,
>>
>>We need to store about 40GB of binary files in a redundant way, and since
>>we are already using Cassandra for other applications, we were thinking
>>that we could solve this problem using the same Cassandra cluster.
>>Each individual file will be approximately 1MB.
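[Editor's note: a quick back-of-the-envelope sizing for the workload described above. The replication factor and node count are assumed values for illustration; they are not stated in the original message.]

```python
# Rough sizing: ~40GB of files at ~1MB each, one row per file.
total_bytes = 40 * 1024**3   # ~40GB of blob data
file_bytes = 1 * 1024**2     # ~1MB per file
rf = 3                       # assumed replication factor
nodes = 6                    # assumed cluster size

rows = total_bytes // file_bytes          # one row per file
raw_per_node = total_bytes * rf // nodes  # blob bytes landing on each node

print(rows)                      # 40960 rows
print(raw_per_node // 1024**3)   # 20 (GB per node, before compaction overhead)
```

Even at this scale the row count is small; the per-node data volume (and its effect on compaction) is the more relevant number for the "impact on other applications" concern.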
>>
>>We are thinking that the data structure should be very simple for this
>>case: one CF with just one column, which will contain the actual files.
>>The row key would then uniquely identify each file. Speed is not an issue
>>when we retrieve the files; not impacting other applications using
>>Cassandra is more important to us. To prevent performance issues for the
>>other applications using our Cassandra cluster at the moment, we think we
>>should disable the key_cache and row_cache for this column family.
>>
>>Has anyone tried this before, or does anyone think it is going to be a
>>bad idea? Do you think our current plan is sensible? Any input would be
>>much appreciated. Thank you in advance.
>>
>>Regards,
>>
>>Vasilis
>>
>>----------------------------------
>>'Like' us on Facebook for exclusive content and other resources on all
>>Barracuda Networks solutions.
>>Visit http://barracudanetworks.com/facebook
