cassandra-user mailing list archives

From Daniel Lundin <...@eintr.org>
Subject Re: Images store in Cassandra
Date Mon, 14 Dec 2009 00:33:07 GMT
I'm currently evaluating Cassandra for archiving immutable "files" as
well: medium-sized ones, 1-100 MB each, with a mean size of about
10 MB. The number of objects will probably be in the low millions. We
need neither high throughput nor concurrent access to these objects;
our goals are availability and persistence.

Steady and safe is key, so not being able to stream efficiently isn't
such a big deal for us compared to the immediate benefits Cassandra
offers in managing replication, scaling and availability.

I don't have a real problem working within Cassandra's limitations,
myself.

Splitting blobs into manageable chunks, one per column, of say 2-8 MB
each so they fit (almost) comfortably within Thrift RPC semantics,
works well enough for us. Also, splitting data over several column
families helps compaction, and I suppose splitting data across
multiple keys could be a further optimization.
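
As a rough illustration of the chunking idea, here is a minimal Python
sketch. It is not tied to any Cassandra client: a plain dict stands in
for one row, and the column naming scheme ("chunk-00000000", ...), the
4 MB default and both function names are assumptions made up for this
example.

```python
from typing import Dict

# Assumption: 4 MB per chunk, picked from the 2-8 MB range mentioned above.
CHUNK_SIZE = 4 * 1024 * 1024


def split_into_columns(blob: bytes, chunk_size: int = CHUNK_SIZE) -> Dict[str, bytes]:
    """Split a blob into consecutive chunks keyed by zero-padded column names.

    The returned dict is a stand-in for the columns of a single row; the
    zero padding keeps the names in chunk order when sorted as strings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    columns = {}
    for index, offset in enumerate(range(0, len(blob), chunk_size)):
        columns[f"chunk-{index:08d}"] = blob[offset:offset + chunk_size]
    return columns


def join_columns(columns: Dict[str, bytes]) -> bytes:
    """Reassemble the original blob from its chunk columns."""
    return b"".join(columns[name] for name in sorted(columns))
```

Keeping all chunks under one key matches the single-record-per-file
layout; spreading them over several keys would only change how the
column names map to rows.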

I kinda like the idea of having a single record per file though, since
it makes managing/deleting and referencing files easier.

To increase read performance and allow streaming/sendfile web serving
of "hot" files, a cache like Varnish in front of a thin web service
to the "file system" would work well. Think of Varnish as a buffer,
translating (slow) segmented reads into full-bore large-object
streaming, at the cost of initial latency and duplicated (frontend)
storage.
If you're building a web site, cache warmup to handle first-request
latency should probably be part of your plan as well... and so on,
turned over 'til done. :)
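
The "thin web service" side of this could look roughly like the
generator below, which turns segmented reads into one continuous body
that a cache can store and replay. It is only a sketch: fetch_chunk is
a hypothetical callable standing in for whatever the storage client
provides, assumed to return None once the chunk index runs past the
end of the file.

```python
from typing import Callable, Iterator, Optional


def stream_file(
    fetch_chunk: Callable[[str, int], Optional[bytes]],
    key: str,
) -> Iterator[bytes]:
    """Yield the chunks of one stored file in order, one read at a time.

    fetch_chunk(key, index) is a hypothetical lookup; it is assumed to
    return the chunk bytes, or None when no chunk exists at that index.
    """
    index = 0
    while True:
        chunk = fetch_chunk(key, index)
        if chunk is None:
            return
        yield chunk
        index += 1
```

Only one chunk is held in memory at a time, so the slow segmented
reads happen once per cache miss and the cache serves the assembled
object afterwards.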

While this sort of usage certainly needs special care, and definitely
requires application-specific design, I haven't run into any blockers
(yet).

On the contrary, being forced to focus on identifying actual data
access patterns throughout a system is both enlightening and
rewarding, IMHO. Not everything is a nail, and that's ok. :)

/d


On Sun, Dec 13, 2009 at 10:55 PM, Michael Koziarski
<michael@koziarski.com> wrote:
> On Sun, Dec 13, 2009 at 9:05 AM, Ran Tavory <rantav@gmail.com> wrote:
>> As we're designing our systems for a move from mysql to Cassandra we're
>> considering moving our file storage to Cassandra as well. Is this wise?
>> We're currently using mogilefs to store media items (images) of average size
>> of 30Mb (400k images, and growing). Cassandra looks like a performance
>> improvement over mogilefs (saves roundtrip, no sql in the middle) but I was
>> wondering whether the fact that cassandra stores byte arrays should
>> encourage us to store images in it. Is Cassandra a good fit?
>
> I think that mogile would probably be a much better fit here.  While
> you may save a tiny bit of round-tripping, those sql queries aren't
> likely going to be an appreciable percentage of the total time taken
> to stream the binary out to the user.
>
>> Has anyone had any similar experience or can send guidelines?
>> To phrase the question in more general terms: What's cassandra's sweet spot
>> in terms of Value size per column or total row size?
>> Thanks
