incubator-cassandra-user mailing list archives

From Michael Kjellman <mkjell...@barracuda.com>
Subject Re: Using Cassandra to store binary files?
Date Tue, 16 Oct 2012 17:45:57 GMT
Ah, so they just wrote chunking into Astyanax? Do they create an index
somewhere so they know how to reassemble the file on the way out?
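Michael's question about an index can be illustrated with a minimal sketch. It is not Astyanax's actual implementation. A plain Python dict stands in for a column family, and the `<name>$<index>` row-key scheme, the 64 KB chunk size and the `$meta` row are all hypothetical. The idea is that one metadata row recording the size, chunk size, chunk count and a checksum is enough to find and verify every chunk on the way out.

```python
import hashlib

# Hypothetical chunk size; the right value is a tuning decision.
CHUNK_SIZE = 64 * 1024


def store_chunked(cf: dict, name: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> None:
    """Write `data` as one row per chunk plus a metadata row used for reassembly."""
    chunk_count = 0
    for offset in range(0, len(data), chunk_size):
        # Hypothetical row-key scheme: "<name>$<chunk index>".
        cf[f"{name}${chunk_count}"] = {"data": data[offset:offset + chunk_size]}
        chunk_count += 1
    # The metadata row is the "index": it says how many chunks to fetch and
    # what the reassembled object should look like.
    cf[f"{name}$meta"] = {
        "size": len(data),
        "chunk_size": chunk_size,
        "chunk_count": chunk_count,
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def read_chunked(cf: dict, name: str) -> bytes:
    """Fetch the metadata row, then every chunk in order, and verify the result."""
    meta = cf[f"{name}$meta"]
    data = b"".join(cf[f"{name}${i}"]["data"] for i in range(meta["chunk_count"]))
    if len(data) != meta["size"] or hashlib.sha1(data).hexdigest() != meta["sha1"]:
        raise ValueError(f"object {name!r} is incomplete or corrupt")
    return data


if __name__ == "__main__":
    cf = {}  # stand-in for a column family: row key -> {column: value}
    payload = bytes(range(256)) * 1000  # 256,000 bytes
    store_chunked(cf, "report.pdf", payload)
    assert read_chunked(cf, "report.pdf") == payload
    print(cf["report.pdf$meta"]["chunk_count"])  # 4
```

The checksum in the metadata row is an extra safeguard so that a partially written or partially read object is reported rather than silently returned.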

On 10/16/12 10:36 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:

>Yes, Astyanax stores the file across many rows, so reads are spread
>across many disks, giving you a performance advantage vs. storing each
>file in one row. Well, at least that's my understanding, so read
>performance "should" be really good in that case.
>
>Dean
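Why spreading a file over many rows can help reads: with a hash-based partitioner each row key is hashed to a token, so different chunk keys usually land on different nodes, while a single-row file always lives on one replica set. The sketch below is a simplified model, assuming an MD5-based token loosely modeled on RandomPartitioner, a hypothetical 6-node ring with evenly spaced tokens, and no replication. It is not Cassandra's exact token computation.

```python
import bisect
import hashlib

# Token range modeled loosely on RandomPartitioner (0 .. 2**127).
RING_SIZE = 2 ** 127


def token(row_key: str) -> int:
    """Simplified MD5-based token; not byte-for-byte Cassandra's computation."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner(row_key: str, node_tokens: list) -> int:
    """Index of the node whose range (previous token, own token] holds the key.

    Only the primary replica is modeled; replication is ignored.
    """
    i = bisect.bisect_left(node_tokens, token(row_key))
    return i % len(node_tokens)  # keys past the last token wrap to node 0


# Hypothetical 6-node cluster with evenly spaced, sorted tokens.
NODES = [i * RING_SIZE // 6 for i in range(6)]

if __name__ == "__main__":
    single_row = {owner("bigfile", NODES)}
    chunked = {owner(f"bigfile${i}", NODES) for i in range(16)}
    print("single-row file served by nodes:", sorted(single_row))
    print("16-chunk file served by nodes:", sorted(chunked))
```

A single-row file always maps to exactly one node here, whereas the 16 chunk keys will almost certainly be spread over several of them.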
>
>From: Michael Kjellman <mkjellman@barracuda.com>
>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Date: Tuesday, October 16, 2012 10:07 AM
>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Subject: Re: Using Cassandra to store binary files?
>
>When we started with Cassandra in production almost 2 years ago, it was
>originally for the sole purpose of storing blobs in a redundant way. I
>ignored the warnings, as my own tests showed it would be okay (and two
>years later it is "ok"). If you plan on using Cassandra more heavily
>later (as we now do, since features such as secondary indexes and CQL
>have matured), keep in mind that I'm now stuck with a large amount of
>data in Cassandra that maybe could be in a better place. Does it work?
>Yes. Would I do it again? Not 100% sure. Compactions of these column
>families take forever.
>
>Also, by default there is a 16MB limit on Thrift message size. Yes, this
>is adjustable, but currently Thrift does not stream data. I didn't know
>that Netflix had worked around this (referring to Dean's reply); I'll
>have to look through the source to see how they are overcoming the
>limitations of the protocol. Last I read, there were no plans to make
>Thrift stream. It looks like there is an issue open for this at
>https://issues.apache.org/jira/browse/CASSANDRA-265
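Without streaming, each request has to fit in a single Thrift message, so a large file must be split on the client side. A rough way to size that split, assuming the 16MB default mentioned above and a hypothetical 64 KB allowance for request overhead (the real overhead depends on the client and the mutation):

```python
MB = 1024 * 1024


def chunks_needed(file_size: int, frame_limit_mb: int = 16,
                  overhead_bytes: int = 64 * 1024) -> int:
    """Smallest number of chunks such that each chunk, plus an assumed
    per-request overhead, fits under the message size limit."""
    max_payload = frame_limit_mb * MB - overhead_bytes
    if max_payload <= 0:
        raise ValueError("overhead leaves no room for payload")
    # Integer ceiling division; an empty file still counts as one chunk.
    return max(1, -(-file_size // max_payload))


if __name__ == "__main__":
    print(chunks_needed(1 * MB))    # 1: a ~1MB file fits in a single request
    print(chunks_needed(100 * MB))  # 7
```

This only gives a lower bound on the chunk count. Chunks smaller than this minimum may well be preferable, since (per Dean's reply) spreading a file over more rows spreads reads over more disks.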
>
>You might want to take a look at the following page:
>http://wiki.apache.org/cassandra/CassandraLimitations
>
>I wanted an easy key-value store when I originally picked Cassandra. Our
>project's needs have changed, and Cassandra now plays a more critical
>role as it has matured (since the 0.7 days). In retrospect, HDFS might
>have been a better option long term, as I really will never need
>indexing etc. on my binary blobs. Being able to grab and reassemble a
>file simply by its key was convenient at the time, but maybe not the
>most forward thinking. Hope that helps a bit.
>
>Also, your read performance won't be amazing by any means with blobs.
>I'm not sure whether your priority is reads or writes; in our case it
>was writes, so it wasn't a large loss.
>
>Best,
>michael
>
>
>From: Vasileios Vlachos <vasileiosvlachos@gmail.com>
>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Date: Tuesday, October 16, 2012 8:49 AM
>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Subject: Using Cassandra to store binary files?
>
>Hello All,
>
>We need to store about 40GB of binary files in a redundant way, and since
>we are already using Cassandra for other applications, we were thinking
>that we could solve that problem using the same Cassandra cluster.
>Each individual file will be approximately 1MB.
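A back-of-envelope check of what this plan means in rows and disk space, assuming 1 GB = 1024 MB and a replication factor of 3 (an assumption, since the post does not state one), and ignoring compression, per-column overhead and compaction headroom:

```python
def estimate(total_gb: float, file_mb: float, replication_factor: int = 3):
    """Rough sizing: number of rows (one file per row) and raw data on disk
    across the whole cluster. Overheads are deliberately ignored."""
    rows = total_gb * 1024 / file_mb  # GB -> MB, then files of file_mb each
    replicated_gb = total_gb * replication_factor
    return rows, replicated_gb


if __name__ == "__main__":
    rows, disk = estimate(40, 1)
    print(f"~{rows:,.0f} rows, ~{disk:.0f} GB on disk at RF=3")
    # prints: ~40,960 rows, ~120 GB on disk at RF=3
```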
>
>We think the data structure should be very simple for this case: one CF
>with just one column, which will contain the actual file, and a row key
>that uniquely identifies each file. Speed is not an issue when we are
>retrieving the files; not impacting other applications using Cassandra
>is more important to us. To prevent performance issues for the other
>applications currently using our Cassandra cluster, we think we should
>disable key_cache and row_cache for this column family.
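For reference, the proposed model (one row per file, a single blob column, key and row caches off) might look like this in CQL. This is a sketch assuming CQL3 on Cassandra 2.1 or later, which postdates this thread; 2012-era releases configured caching differently. The keyspace, table and column names are hypothetical, and the statement is wrapped in a small Python helper only so it can be generated and checked.

```python
def files_table_ddl(keyspace: str, table: str = "files") -> str:
    """CQL for a one-row-per-file table with a single blob column and
    both key and row caching disabled (Cassandra 2.1+ caching syntax)."""
    return (
        f"CREATE TABLE {keyspace}.{table} (\n"
        "    file_id text PRIMARY KEY,\n"  # hypothetical unique file identifier
        "    data blob\n"                  # the file contents (~1MB each)
        ") WITH caching = {'keys': 'NONE', 'rows_per_partition': 'NONE'};"
    )


if __name__ == "__main__":
    print(files_table_ddl("media"))
```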
>
>Has anyone tried this before, or does anyone think this is a bad idea?
>Do you think our current plan is sensible? Any input would be much
>appreciated. Thank you in advance.
>
>Regards,
>
>Vasilis
>
>----------------------------------
>'Like' us on Facebook for exclusive content and other resources on all
>Barracuda Networks solutions.
>Visit http://barracudanetworks.com/facebook

