cassandra-user mailing list archives

From Paul Chandler <p...@redshots.com>
Subject Re: Using Cassandra as an object store
Date Fri, 19 Apr 2019 09:23:31 GMT
Gene,

In my experience, clusters used as object stores have caused me more problems than ordinary
clusters, so I recommend using a separate, dedicated object store if possible.

However, it certainly can be done; there are just a few things to consider:

1) Deletion policy: How are these objects going to be deleted? We have had problems in the
past where deleted objects were not removed from disk. By the time they were deleted, they
had been compacted into very large sstables that were rarely compacted again, so the
tombstones and the data they covered stayed on disk. So think about compaction strategy and
any tombstone issues you may come across.

2) Compression: Are the objects already compressed before they are stored, e.g. JPEGs? If so,
turn compression off on the table; this reduces the amount of data read into memory when
reading the data, which reduces pressure on the heap. We did some trials with one system and
found much better performance when compression was performed on the client side, so try some
tests with that.

3) How often is the data read? The hardware requirements will be completely different
depending on whether this is an image store for an e-commerce site or a PDF store holding
client invoices. With a small number of reads per object, you can specify machines with
smaller CPUs and less memory but a large amount of storage. If there is a large number of
reads, then you need to think much more carefully about memory and CPU, as per the Walmart
article you referenced.
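
As a rough illustration of point 2, here is a minimal Python sketch of client-side compression. It is not the code from our trials; the names pack and unpack and the one-byte marker are hypothetical choices made for this example. The idea is to compress in the application before writing, and to store the raw bytes when compression does not help (as with JPEGs).

```python
import os
import zlib


def pack(blob: bytes, level: int = 6) -> bytes:
    """Compress on the client; keep the raw bytes if compression does not help.

    The leading marker byte is an assumption of this sketch, not a Cassandra
    convention: b"Z" means zlib-compressed, b"R" means stored raw.
    """
    compressed = zlib.compress(blob, level)
    if len(compressed) < len(blob):
        return b"Z" + compressed
    return b"R" + blob


def unpack(stored: bytes) -> bytes:
    """Reverse pack(): inspect the marker byte and decompress if needed."""
    marker, body = stored[:1], stored[1:]
    if marker == b"Z":
        return zlib.decompress(body)
    if marker == b"R":
        return body
    raise ValueError("unknown marker byte")


# Repetitive data (text-like) shrinks; random data (a stand-in for
# already-compressed images) does not, so it is stored raw.
text_like = b"invoice line item\n" * 1000
random_like = os.urandom(4096)

packed_text = pack(text_like)
packed_random = pack(random_like)
```

The value returned by pack() is what would be written to the blob column, with table-level compression turned off so the server does not try to compress the same bytes a second time.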

Thanks 

Paul Chandler
www.redshots.com



> On 19 Apr 2019, at 09:04, DuyHai Doan <doanduyhai@gmail.com> wrote:
> 
> Idea: 
> 
> To guarantee data integrity, you can store an MD5 of all the chunks' data as a static
> column in the partition that contains the chunks
> 
> On Fri, Apr 19, 2019 at 9:18 AM cclive1601你 <cclive1601@gmail.com> wrote:
> We have used Cassandra as an object store for some years. You can just split the object
> into small pieces. The object gets a pk, and each of the small pieces gets its own pk.
> The object's pk and its pieces' pks can be stored in a meta table in Cassandra, and each
> piece's pk and the piece itself are stored in a data table. We store videos, pictures and
> other unstructured data.
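
The chunking scheme described here, combined with the MD5 check suggested in the reply above, can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the poster's actual code: two in-memory dicts stand in for the meta and data tables, the 1 MiB chunk size is arbitrary, and put_object and get_object are hypothetical names. For simplicity the MD5 is kept in the meta entry rather than in a static column of the chunk partition.

```python
import hashlib
import uuid

CHUNK_SIZE = 1024 * 1024  # assumed chunk size (1 MiB); tune for your workload

# In-memory stand-ins for the two Cassandra tables (assumption of this sketch).
meta_table = {}  # object pk -> {"chunk_pks": [...], "md5": hex digest}
data_table = {}  # chunk pk  -> chunk bytes


def put_object(blob: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Split blob into chunks, store each under its own pk, record the mapping."""
    object_pk = str(uuid.uuid4())
    chunk_pks = []
    for offset in range(0, len(blob), chunk_size):
        chunk_pk = str(uuid.uuid4())
        data_table[chunk_pk] = blob[offset:offset + chunk_size]
        chunk_pks.append(chunk_pk)
    meta_table[object_pk] = {
        "chunk_pks": chunk_pks,  # ordered, so the object can be reassembled
        "md5": hashlib.md5(blob).hexdigest(),  # digest over all chunk data
    }
    return object_pk


def get_object(object_pk: str) -> bytes:
    """Reassemble the chunks in order and verify the stored MD5."""
    meta = meta_table[object_pk]
    blob = b"".join(data_table[pk] for pk in meta["chunk_pks"])
    if hashlib.md5(blob).hexdigest() != meta["md5"]:
        raise ValueError("MD5 mismatch: object is corrupt or incomplete")
    return blob
```

In a real deployment the dict writes would be replaced by inserts through a Cassandra driver, and the application (or a thin middleware layer) would own the split and recombine logic.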
> 
> On Fri, Apr 19, 2019 at 1:25 PM Gene <gh5046@gmail.com> wrote:
> Howdy
> 
> I'm looking at the possibility of using Cassandra as an object store to offload image/blob
> data from an Oracle database.  I've seen mentions of it being used as an object store in a
> large scale fashion, like with Walmart:
> 
> https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593
> 
> However, I have found little on small scale setups and whether it's even worth using
> Cassandra in place of something else that's meant to be used for object storage, like Ceph.
> 
> Additionally, I've read that Cassandra struggles with storing objects 10 MB or larger
> and it's recommended to break objects up into smaller chunks, which either requires some
> kind of middleware between our application and Cassandra, or it would require our
> application to split objects into smaller chunks and recombine them as needed.
> 
> I've looked into pithos and astyanax, but those are both no longer developed and I'm
> not seeing anything that might replace them in the long term.
> 
> https://github.com/exoscale/pithos
> https://github.com/Netflix/astyanax
> 
> Any helpful information or advice would be greatly appreciated.
> 
> Thanks in advance.
> 
> -Gene
> 
> 
> -- 
> you are the apple of my eye !

