Date: Tue, 9 Jul 2013 09:18:16 +0200
Subject: Re: does anyone store large values in cassandra e.g. 100kb?
From: Theo Hultberg
To: user@cassandra.apache.org

We store objects that are a couple of tens of kilobytes, sometimes 100 KB, and we store quite a few of these per row, sometimes hundreds of thousands.

One problem we encountered early on was that these rows would become so big that C* couldn't compact them in memory and had to fall back to slow two-pass compactions that spill partially compacted rows to disk. We tackled that in two ways. First we increased in_memory_compaction_limit_in_mb from 64 to 128; although that helped a little, we quickly realized it didn't have much effect, because most of the time was taken up by really huge rows many times larger than that.

We ended up implementing a simple sharding scheme where each logical row is actually 36 rows that each contain 1/36 of the range: on writes we take the first character of the column key and append it to the row key, and on reads we read all 36 rows. Why 36? There are 36 letters and digits in the ASCII alphanumeric alphabet, and our column keys happen to distribute over them quite nicely.

Cassandra works well with semi-large objects, and it works well with wide rows, but you have to be careful about the combination, where rows grow larger than 64 MB.

T#

On Mon, Jul 8, 2013 at 8:13 PM, S Ahmed wrote:

> Hi Peter,
>
> Can you describe your environment, # of documents and what kind of usage
> pattern you have?
>
> On Mon, Jul 8, 2013 at 2:06 PM, Peter Lin wrote:
>
>> I regularly store Word and PDF docs in Cassandra without any issues.
>>
>> On Mon, Jul 8, 2013 at 1:46 PM, S Ahmed wrote:
>>
>>> I'm guessing that most people use Cassandra to store relatively small
>>> payloads, like 1-5 KB in size.
>>>
>>> Is there anyone using it to store, say, 100 KB (1/10 of a megabyte) and
>>> if so, was there any tweaking or gotchas that you ran into?
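For reference, the compaction limit Theo raises is a cassandra.yaml setting in the Cassandra 1.x line; a sketch of the change he describes (the surrounding file layout is assumed):

```yaml
# cassandra.yaml (Cassandra 1.x): rows larger than this limit fall back
# to the slower two-pass compaction path that spills to disk.
in_memory_compaction_limit_in_mb: 128   # raised from 64, per the thread above
```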
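The 36-way sharding scheme Theo describes could be sketched roughly as follows. This is a hypothetical illustration, not his actual code: the key format (row key plus ":" plus shard character) and function names are assumptions; only the 36 alphanumeric shards and the write/read fan-out follow the thread.

```python
import string

# The 36 shard characters: ASCII digits plus lowercase letters.
SHARD_CHARS = string.digits + string.ascii_lowercase  # 36 characters total

def shard_row_key(row_key: str, column_key: str) -> str:
    """On writes: append the first character of the column key to the row
    key, splitting one logical row into up to 36 physical rows."""
    first = column_key[0].lower()
    if first not in SHARD_CHARS:
        raise ValueError("column keys are assumed to be alphanumeric")
    return f"{row_key}:{first}"

def all_shard_keys(row_key: str) -> list[str]:
    """On reads: fan out to all 36 physical rows of a logical row."""
    return [f"{row_key}:{c}" for c in SHARD_CHARS]
```

A write for column key "abc123" under logical row "user42" would land in physical row "user42:a", and a read of "user42" would query all 36 physical rows and merge the results client-side. This only spreads load evenly if, as Theo notes, the column keys happen to distribute well across the leading character.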