incubator-cassandra-user mailing list archives

From Vasileios Vlachos <vasileiosvlac...@gmail.com>
Subject Re: Using Cassandra to store binary files?
Date Fri, 19 Oct 2012 21:27:06 GMT
Hello,

Thank you all for your responses.

Performance is not an issue at all as I described, so it shouldn't be
problematic. At least this is our current understanding. We will try it and
post back if something interesting comes up. Many thanks.

Regards,

Vasilis



On Tue, Oct 16, 2012 at 7:34 PM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:

> I am not sure.  If I were to implement it myself though, I would have
> probably...
>
> postfixed the rows with 1,2,3,4,...<lastValue> and then stored the lastValue
> in the first row so then my program knows all the rows.
>
> I.e., I am not sure an index is really needed in that case.
>
> Dean
>
> On 10/16/12 11:45 AM, "Michael Kjellman" <mkjellman@barracuda.com> wrote:
>
> >Ah, so they just wrote chunking into Astyanax? Do they create an index
> >somewhere so they know how to reassemble the file on the way out?
> >
> >On 10/16/12 10:36 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:
> >
> >>Yes, Astyanax stores the file in many rows, so it reads from many disks,
> >>giving you a performance advantage vs. storing each file in one row. At
> >>least that is my understanding, so read performance "should" be really
> >>good in that case.
> >>
> >>Dean
> >>
> >>From: Michael Kjellman
> >><mkjellman@barracuda.com<mailto:mkjellman@barracuda.com>>
> >>Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
> >><user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
> >>Date: Tuesday, October 16, 2012 10:07 AM
> >>To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
> >><user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
> >>Subject: Re: Using Cassandra to store binary files?
> >>
> >>When we started with Cassandra in production almost 2 years ago, it was
> >>originally for the sole purpose of storing blobs in a redundant way.
> >>I ignored the warnings as my own tests showed it would be okay (and two
> >>years later it is "ok"). If you plan on using Cassandra later (as we now
> >>are, since features such as secondary indexes and CQL have matured), be
> >>aware that I'm now stuck with a large amount of data in Cassandra that
> >>maybe could be in a better place. Does it work? Yes. Would I do it again?
> >>Not 100% sure. Compactions of these column families take forever.
> >>
> >>Also, by default there is a 16MB limit. Yes, this is adjustable but
> >>currently Thrift does not stream data. I didn't know that Netflix had
> >>worked around this (referring to Dean's reply) -- I'll have to look
> >>through the source to see how they are overcoming the limitations of the
> >>protocol. Last I read there were no plans to make Thrift stream. Looks
> >>like there is a bug at
> >>https://issues.apache.org/jira/browse/CASSANDRA-265
> >>
> >>You might want to take a look at the following page:
> >>http://wiki.apache.org/cassandra/CassandraLimitations
> >>
> >>I wanted an easy key value store when I originally picked Cassandra. As
> >>our project needs changed and Cassandra has now begun playing a more
> >>critical role as it has matured (since the 0.7 days), in retrospect HDFS
> >>might have been a better option long term, as I really will never need
> >>indexing etc. on my binary blobs. Simply being able to grab/reassemble a
> >>file by grabbing its key was convenient at the time, but maybe not the
> >>most forward thinking. Hope that helps a bit.
> >>
> >>Also, your read performance won't be amazing by any means with blobs. Not
> >>sure if your priority is reads or writes. In our case it was writes so it
> >>wasn't a large loss.
> >>
> >>Best,
> >>michael
> >>
> >>
> >>From: Vasileios Vlachos
> >><vasileiosvlachos@gmail.com<mailto:vasileiosvlachos@gmail.com>>
> >>Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
> >><user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
> >>Date: Tuesday, October 16, 2012 8:49 AM
> >>To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
> >><user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
> >>Subject: Using Cassandra to store binary files?
> >>
> >>Hello All,
> >>
> >>We need to store about 40G of binary files in a redundant way and since
> >>we are already using Cassandra for other applications we were thinking
> >>that we could just solve that problem using the same Cassandra cluster.
> >>Each individual file will be approximately 1MB.
> >>
> >>We are thinking that the data structure should be very simple for this
> >>case, using one CF with just one column which will contain the actual
> >>files. The row key should then uniquely identify each file. Speed is not
> >>an issue when we are retrieving the files. Not impacting other
> >>applications using Cassandra is more important for us. In order to
> >>prevent performance issues with other applications using our Cassandra
> >>cluster at the moment, we think we should disable key_cache and
> >>row_cache for this column family.
> >>
> >>Has anyone tried this before, or does anyone think this is a bad idea?
> >>Do you think our current plan is sensible? Any input would be much
> >>appreciated. Thank you in advance.
> >>
> >>Regards,
> >>
> >>Vasilis
> >>
> >>----------------------------------
> >>'Like' us on Facebook for exclusive content and other resources on all
> >>Barracuda Networks solutions.
> >>Visit http://barracudanetworks.com/facebook
> >>  --
> >
> >
> >
> >
>
>
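
The row-numbering scheme Dean describes in the quoted message (chunk rows
postfixed 1,2,3,...<lastValue>, with lastValue stored in the first row) can
be sketched roughly as follows. This is only an illustration and not how
Astyanax actually does it: a plain Python dict stands in for the column
family, and the names store_file, load_file, CHUNK_SIZE, and the
"last_chunk"/"data" column names are all made up for the example. It also
assumes the "first row" is a separate header row under the bare file key,
which the message leaves open.

```python
# Illustrative sketch only: a dict stands in for a Cassandra column family.
# All names here are hypothetical; no real client library is used.

CHUNK_SIZE = 256 * 1024  # assumed chunk size, far below the 16MB default limit


def store_file(cf, key, data, chunk_size=CHUNK_SIZE):
    """Split data into numbered chunk rows and record the last chunk number."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # an empty file still gets one (empty) chunk row
    # Header row under the bare key: tells the reader how many rows to fetch.
    cf[key] = {"last_chunk": len(chunks)}
    # Chunk rows: the file key postfixed with 1, 2, 3, ...
    for n, chunk in enumerate(chunks, start=1):
        cf[f"{key}:{n}"] = {"data": chunk}


def load_file(cf, key):
    """Read the header row, then fetch and concatenate the chunk rows in order."""
    last = cf[key]["last_chunk"]
    return b"".join(cf[f"{key}:{n}"]["data"] for n in range(1, last + 1))


if __name__ == "__main__":
    cf = {}
    payload = bytes(range(256)) * 5000  # about 1.2MB, similar to the files discussed
    store_file(cf, "report.pdf", payload)
    assert load_file(cf, "report.pdf") == payload
```

Because the reader learns the row count from the header row, no separate
index is needed to reassemble the file, which is the point Dean makes.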
