From: Paul Chandler
To: user@cassandra.apache.org
Subject: Re: Using Cassandra as an object store
Date: Fri, 19 Apr 2019 10:23:31 +0100
Message-Id: <363F8983-711D-4690-B8EA-1B3DE54F6BA9@redshots.com>

Gene,

I have found that clusters used as object stores have caused me more problems than normal in the past, so I recommend using a separate object store if possible.

However, it certainly can be done; there are just a few things to consider:

1) Deletion policy: How are these objects going to be deleted? We have had problems in the past where deleted objects didn’t get removed from disk. This was because, by the time they were deleted, they had been compacted into very large SSTables that were rarely compacted again. So think about compaction strategy and any tombstone issues you may come across.

2) Compression: Are the objects already compressed before they are stored, e.g. JPEGs? If so, turn compression off on the table. This reduces the amount of data read into memory when reading the data, reducing pressure on the heap. We did some trials with one system and found much better performance when the compression was performed on the client side, so try some tests with that.

3) How often is the data read? There will be completely different hardware requirements depending on whether this is an image store for an e-commerce site or a PDF store holding client invoices. With a small number of reads per object, you can specify machines with smaller CPUs and less memory but a large amount of storage. If there are a large number of reads, then you need to think much more carefully about memory and CPU, as per the Walmart article you referenced.
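
On point 2, client-side compression could look something like the sketch below. It is only an illustration using Python's standard zlib module; the function names and the one-byte marker are assumptions made up for this example and are not taken from the trials mentioned above. The idea is to compress in the application and keep the compressed form only when it is actually smaller, so data that is already compressed (e.g. JPEGs) is stored untouched.

```python
import os
import zlib

# Hypothetical one-byte markers prepended to every stored payload.
RAW = b"\x00"         # payload stored as-is
COMPRESSED = b"\x01"  # payload is zlib-compressed


def encode_blob(data: bytes, level: int = 6) -> bytes:
    """Compress on the client, but keep the result only if it is smaller."""
    packed = zlib.compress(data, level)
    if len(packed) < len(data):
        return COMPRESSED + packed
    # Already-compressed input (JPEG, zip, ...) usually does not shrink.
    return RAW + data


def decode_blob(stored: bytes) -> bytes:
    """Reverse encode_blob by inspecting the marker byte."""
    marker, body = stored[:1], stored[1:]
    if marker == COMPRESSED:
        return zlib.decompress(body)
    if marker == RAW:
        return body
    raise ValueError("unknown marker byte")


if __name__ == "__main__":
    text = b"invoice line\n" * 1000  # highly compressible
    noise = os.urandom(4096)         # stands in for an already-compressed image
    for blob in (text, noise):
        stored = encode_blob(blob)
        assert decode_blob(stored) == blob
        print(len(blob), "->", len(stored))
```

The value returned by encode_blob is what would be written to the blob column, with compression switched off on the table itself.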

Thanks

Paul Chandler
www.redshots.com

> On 19 Apr 2019, at 09:04, DuyHai Doan wrote:
>
> Idea:
>
> To guarantee data integrity, you can store an MD5 of all the chunk data as a static column in the partition that contains the chunks.
>
> On Fri, Apr 19, 2019 at 9:18 AM cclive1601你 wrote:
> We have used Cassandra as an object store for some years. You can just split the object into small pieces. The object gets a pk, and each small piece gets its own pk. The object's pk and the pieces' pks can be stored in a meta table in Cassandra, and each piece's pk and its data are stored in a data table. We store videos, pictures and other unstructured data.
>
> Gene wrote on Fri, Apr 19, 2019 at 1:25 PM:
> Howdy
>
> I'm looking at the possibility of using Cassandra as an object store to offload image/blob data from an Oracle database. I've seen mentions of it being used as an object store in a large-scale fashion, like with Walmart:
>
> https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593
>
> However, I have found little on small-scale setups and whether it's even worth using Cassandra in place of something else that's meant to be used for object storage, like Ceph.
>
> Additionally, I've read that Cassandra struggles with storing objects 10MB or larger and it's recommended to break objects up into smaller chunks, which either requires some kind of middleware between our application and Cassandra, or it would require our application to split objects into smaller chunks and recombine them as needed.
>
> I've looked into pithos and astyanax, but those are both no longer developed and I'm not seeing anything that might replace them in the long term.
>
> https://github.com/exoscale/pithos
> https://github.com/Netflix/astyanax
>
> Any helpful information or advice would be greatly appreciated.
>
> Thanks in advance.
>
> -Gene
>
>
> --
> you are the apple of my eye !
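
The chunking and MD5 suggestions in the quoted replies could be sketched roughly as follows. This is a minimal illustration in plain Python with no Cassandra driver involved: the two dicts stand in for the meta and data tables, and the 1 MB chunk size, the function names and the row layout are all assumptions, not something stated in the thread. The MD5 is kept in the meta row here; if all chunks of an object lived in one partition, it would instead be the static column DuyHai describes.

```python
import hashlib
import uuid

CHUNK_SIZE = 1024 * 1024  # assumed chunk size, well under the 10MB mentioned

# In-memory stand-ins for the two tables (hypothetical layout):
#   meta_table: object pk -> {"chunk_ids": [...], "md5": digest of whole object}
#   data_table: chunk pk  -> chunk bytes
meta_table = {}
data_table = {}


def put_object(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Split data into chunks, store them, and record their order plus an MD5."""
    object_id = str(uuid.uuid4())
    chunk_ids = []
    for offset in range(0, len(data), chunk_size):
        chunk_id = str(uuid.uuid4())
        data_table[chunk_id] = data[offset:offset + chunk_size]
        chunk_ids.append(chunk_id)
    meta_table[object_id] = {
        "chunk_ids": chunk_ids,
        "md5": hashlib.md5(data).hexdigest(),
    }
    return object_id


def get_object(object_id: str) -> bytes:
    """Reassemble the chunks in order and verify them against the stored MD5."""
    meta = meta_table[object_id]
    data = b"".join(data_table[cid] for cid in meta["chunk_ids"])
    if hashlib.md5(data).hexdigest() != meta["md5"]:
        raise IOError("checksum mismatch: a chunk is missing or corrupt")
    return data


if __name__ == "__main__":
    blob = bytes(range(256)) * 10
    oid = put_object(blob, chunk_size=1000)
    assert get_object(oid) == blob
    print(len(meta_table[oid]["chunk_ids"]), "chunks stored")
```

In a real deployment the application (or a thin middleware layer) would issue the equivalent reads and writes against the two tables, which is the split-and-recombine work Gene mentions.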