cassandra-user mailing list archives

From "Durity, Sean R" <SEAN_R_DUR...@homedepot.com>
Subject RE: [EXTERNAL] Re: Using Cassandra as an object store
Date Fri, 19 Apr 2019 13:15:16 GMT
Object stores are some of our largest and oldest use cases. Cassandra has been a good choice
for us. We do chunk the objects into 64k chunks (I think), so that partitions are not too
large and it scales predictably. For us, the choice was more about high availability and scalability,
which Cassandra provides well.
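
A minimal sketch of the chunking idea, for illustration only. It assumes a fixed 64 KiB chunk size (the figure recalled above with an "I think") and uses hypothetical helper names; it is not the code used in production.

```python
from __future__ import annotations

# 64 KiB per chunk: an assumption taken from the "64k chunks (I think)" remark.
CHUNK_SIZE = 64 * 1024


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Return consecutive slices of `data`, each at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def reassemble(chunks: list[bytes]) -> bytes:
    """Concatenate chunks in order to rebuild the original object."""
    return b"".join(chunks)


if __name__ == "__main__":
    payload = bytes(range(256)) * 1000  # 256,000 bytes of sample data
    pieces = split_into_chunks(payload)
    print(len(pieces), "chunks; round trip ok:", reassemble(pieces) == payload)
```

Each chunk would then be written as its own row, so that no single cell or partition grows with the size of the object.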

Sean Durity




From: Paul Chandler <paul@redshots.com>
Sent: Friday, April 19, 2019 5:24 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Using Cassandra as an object store

Gene,

I have found that clusters used as object stores have caused me more problems than normal
in the past, so I recommend using a separate object store if possible.

However, it certainly can be done; there are just a few things to consider:

1) Deletion policy: How are these objects going to be deleted? We have had problems in the
past where deleted objects didn’t get removed from disk. This was because, by the time they
were deleted, they had been compacted into very large sstables that were rarely compacted again.
So think about compaction strategy and any tombstone issues you may come across.

2) Compression: Are the objects already compressed before they are stored, e.g. JPEGs? If so,
turn compression off on the table. This reduces the amount of data read into memory when reading
the data, which reduces pressure on the heap. We did some trials with one system and found much
better performance when the compression was performed on the client side, so try some tests
with that.

3) How often is the data read? There will be completely different hardware requirements
depending on whether this is an image store for an e-commerce site or a PDF store
holding client invoices. With a small number of reads per object, you can specify machines
with less CPU and memory and a large amount of storage. If there are a large number of reads,
then you need to think much more carefully about memory and CPU, as per the Walmart article
you referenced.
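
To make point 2 concrete, here is a rough sketch of client-side compression. The function names and the "keep the compressed form only if it is smaller" rule are assumptions for illustration, not what was used in the trials mentioned above. The CQL in the comment is also an assumption to check against the documentation for your Cassandra version.

```python
from __future__ import annotations

import os
import zlib

# Assumed CQL for disabling table-level compression (hypothetical table name;
# verify the option syntax for your Cassandra version):
#   ALTER TABLE object_chunks WITH compression = {'enabled': 'false'};


def encode_for_storage(payload: bytes, level: int = 6) -> tuple[bool, bytes]:
    """Compress on the client, keeping the compressed form only if it is smaller.

    Returns (is_compressed, bytes_to_store). Already-compressed inputs such as
    JPEGs usually do not shrink, so they are stored unchanged.
    """
    compressed = zlib.compress(payload, level)
    if len(compressed) < len(payload):
        return True, compressed
    return False, payload


def decode_from_storage(is_compressed: bool, stored: bytes) -> bytes:
    """Reverse encode_for_storage using the flag saved alongside the data."""
    return zlib.decompress(stored) if is_compressed else stored


if __name__ == "__main__":
    samples = {
        "text": b"invoice line\n" * 1000,
        "random": os.urandom(4096),  # stands in for already-compressed data
    }
    for name, sample in samples.items():
        flag, stored = encode_for_storage(sample)
        print(name, "compressed:", flag, "stored bytes:", len(stored))
```

The flag would need to be stored next to the data (for example in its own column) so that readers know whether to decompress.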

Thanks

Paul Chandler
www.redshots.com




On 19 Apr 2019, at 09:04, DuyHai Doan <doanduyhai@gmail.com>
wrote:

Idea:

To guarantee data integrity, you can store an MD5 of all the chunks' data as a static column in the
partition that contains the chunks.
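
A possible sketch of that check, assuming the digest is computed over the chunks in order. The table layout in the comment and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
from typing import Iterable

# Hypothetical table layout with the digest as a static column, so it is
# stored once per partition rather than once per chunk row:
#   CREATE TABLE object_chunks (
#       object_id   uuid,
#       chunk_index int,
#       data        blob,
#       object_md5  text STATIC,
#       PRIMARY KEY (object_id, chunk_index)
#   );


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex MD5 over all chunk data, fed to the hash in chunk order."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def chunks_are_intact(chunks: Iterable[bytes], expected_md5: str) -> bool:
    """Compare the digest of the chunks read back with the stored digest."""
    return md5_of_chunks(chunks) == expected_md5


if __name__ == "__main__":
    written = [b"hello ", b"world"]
    stored_md5 = md5_of_chunks(written)  # value for the static column
    print("intact:", chunks_are_intact(written, stored_md5))
    print("corrupted:", chunks_are_intact([b"hello ", b"w0rld"], stored_md5))
```

MD5 here guards against accidental corruption or a missing chunk; it is not meant as protection against deliberate tampering.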

On Fri, Apr 19, 2019 at 9:18 AM cclive1601你 <cclive1601@gmail.com>
wrote:
We have used Cassandra as an object store for some years. You can just split the object into
small pieces. The object gets a pk, and each small piece gets its own pk. The object's pk and
its pieces' pks can be stored in a meta table in Cassandra, and each piece's pk and the piece
itself are stored in a data table. We store videos, pictures and other unstructured data.
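
One way to picture the meta table / data table split is the in-memory model below. It is only an illustration: two dictionaries stand in for the two Cassandra tables, and the class name, UUID keys and piece size are all assumptions.

```python
from __future__ import annotations

import uuid


class InMemoryObjectStore:
    """Toy model: `meta` maps an object pk to its ordered piece pks,
    and `data` maps each piece pk to that piece's bytes."""

    def __init__(self, piece_size: int = 64 * 1024) -> None:
        if piece_size <= 0:
            raise ValueError("piece_size must be positive")
        self.piece_size = piece_size
        self.meta: dict[uuid.UUID, list[uuid.UUID]] = {}  # stands in for the meta table
        self.data: dict[uuid.UUID, bytes] = {}            # stands in for the data table

    def put(self, payload: bytes) -> uuid.UUID:
        """Split the payload, store every piece, then record the piece order."""
        object_pk = uuid.uuid4()
        piece_pks = []
        for offset in range(0, len(payload), self.piece_size):
            piece_pk = uuid.uuid4()
            self.data[piece_pk] = payload[offset:offset + self.piece_size]
            piece_pks.append(piece_pk)
        self.meta[object_pk] = piece_pks
        return object_pk

    def get(self, object_pk: uuid.UUID) -> bytes:
        """Look up the piece pks in meta, then join the pieces in order."""
        return b"".join(self.data[pk] for pk in self.meta[object_pk])


if __name__ == "__main__":
    store = InMemoryObjectStore(piece_size=4)
    key = store.put(b"abcdefghij")
    print(len(store.meta[key]), "pieces; round trip ok:", store.get(key) == b"abcdefghij")
```

Writing the pieces before the meta entry means a reader never finds an object whose pieces are not yet stored, although a failed write can leave orphaned pieces to clean up.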

Gene <gh5046@gmail.com> wrote on Fri, Apr 19, 2019 at 1:25 PM:
Howdy

I'm looking at the possibility of using cassandra as an object store to offload image/blob
data from an Oracle database.  I've seen mentions of it being used as an object store in a
large scale fashion, like with Walmart:

https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593

However I have found little on small scale setups and if it's even worth using Cassandra in
place of something else that's meant to be used for object storage, like Ceph.

Additionally, I've read that cassandra struggles with storing objects 10MB or larger and it's
recommended to break objects up into smaller chunks, which either requires some kind of middleware
between our application and cassandra, or it would require our application to split objects
into smaller chunks and recombine them as needed.

I've looked into pithos and astyanax, but those are both no longer developed and I'm not seeing
anything that might replace them in the long term.

https://github.com/exoscale/pithos
https://github.com/Netflix/astyanax

Any helpful information or advice would be greatly appreciated.

Thanks in advance.

-Gene


--
you are the apple of my eye !

