Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
References: <1371446905.3657.GenericBBA@web160906.mail.bf1.yahoo.com>
From: Franc Carter
Date: Tue, 18 Jun 2013 22:57:03 +1000
Subject: Re: Large number of files for Leveled Compaction
To: user@cassandra.apache.org, Wei Zhu
Content-Type: multipart/alternative; boundary=20cf30780e16205ba204df6d41d0

--20cf30780e16205ba204df6d41d0
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Jun 17, 2013 at 3:37 PM, Franc Carter <franc.carter@sirca.org.au> wrote:

> On Mon, Jun 17, 2013 at 3:28 PM, Wei Zhu <wz1975@yahoo.com> wrote:
>
>> The default value of 5MB is way too small in practice. Too many files in
>> one directory is not a good thing. It's not clear what a good number
>> would be. I have heard people are using 50MB, 75MB, even 100MB. Do your
>> own test to find a "right" number.
>>
>
> Interesting - 50MB is the low end of what people are using - 5MB is a lot
> lower. I'll try a 50MB set
>

Oops, forgot to ask - is there a way to get Cassandra to rebuild the
sstables as bigger once I have updated the column family definition ?

thanks

>
> cheers
>
>
>> -Wei
>>
>> ------------------------------
>> *From: *"Franc Carter" <franc.carter@sirca.org.au>
>> *To: *user@cassandra.apache.org
>> *Sent: *Sunday, June 16, 2013 10:15:22 PM
>> *Subject: *Re: Large number of files for Leveled Compaction
>>
>> On Mon, Jun 17, 2013 at 2:59 PM, Manoj Mainali <mainalimanoj@gmail.com> wrote:
>>
>>> Not in the case of LeveledCompaction. Only SizeTieredCompaction merges
>>> smaller sstables into larger ones. With LeveledCompaction, the sstables
>>> are always of a fixed size, but they are grouped into different levels.
>>>
>>> You can refer to this page
>>> http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>>> for details of how LeveledCompaction works.
>>>
>>
>> Yes, but it seems I've misinterpreted that page ;-(
>>
>> I took this paragraph
>>
>>> In figure 3, new sstables are added to the first level, L0, and
>>> immediately compacted with the sstables in L1 (blue). When L1 fills up,
>>> extra sstables are promoted to L2 (violet). Subsequent sstables generated
>>> in L1 will be compacted with the sstables in L2 with which they overlap.
>>> As more data is added, leveled compaction results in a situation like
>>> the one shown in figure 4.
>>
>> to mean that once a level fills up it gets compacted into a higher level
>>
>> cheers
>>
>>> Cheers
>>> Manoj
>>>
>>> On Mon, Jun 17, 2013 at 1:54 PM, Franc Carter <franc.carter@sirca.org.au> wrote:
>>>
>>>> On Mon, Jun 17, 2013 at 2:47 PM, Manoj Mainali <mainalimanoj@gmail.com> wrote:
>>>>
>>>>> With LeveledCompaction, each sstable has a fixed size, defined by
>>>>> sstable_size_in_mb in the compaction configuration of the CF
>>>>> definition; the default value is 5MB. In your case, you may not have
>>>>> defined your own value, which is why each of your sstables is 5MB.
>>>>> And if your dataset is huge, you will see a very large number of
>>>>> sstables.
>>>>>
>>>>
>>>> Ok, seems like I do have (at least) an incomplete understanding. I
>>>> realise that the minimum size is 5MB, but I thought compaction would
>>>> merge these into a smaller number of larger sstables ?
>>>>
>>>> thanks
>>>>
>>>>> Cheers
>>>>>
>>>>> Manoj
>>>>>
>>>>> On Fri, Jun 7, 2013 at 1:44 PM, Franc Carter <franc.carter@sirca.org.au> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We are trialling Cassandra-1.2(.4) with Leveled compaction as it
>>>>>> looks like it may be a win for us.
>>>>>>
>>>>>> The first step of testing was to push a fairly large slab of data
>>>>>> into the Column Family - we did this much faster (> x100) than we
>>>>>> would in a production environment. This has left the Column Family
>>>>>> with about 140,000 files in the Column Family directory, which seems
>>>>>> way too high. On two of the nodes the CompactionStats show 2
>>>>>> outstanding tasks and on a third node there are over 13,000
>>>>>> outstanding tasks. However, from looking at the log activity it
>>>>>> looks like compaction has finished on all nodes.
>>>>>>
>>>>>> Is this number of files expected/normal ?
>>>>>>
>>>>>> cheers
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *Franc Carter* | Systems architect | Sirca Ltd
>>>>>>
>>>>>> franc.carter@sirca.org.au | www.sirca.org.au
>>>>>>
>>>>>> Tel: +61 2 8355 2514
>>>>>>
>>>>>> Level 4, 55 Harrington St, The Rocks NSW 2000
>>>>>>
>>>>>> PO Box H58, Australia Square, Sydney NSW 1215
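As a rough sanity check on the numbers in this thread, here is a minimal Python sketch of how the file count might scale with sstable_size_in_mb. It rests on two assumptions that are not stated anywhere in the thread: that each sstable is stored as 7 component files on disk, and that every sstable is filled to the configured size. The function names are made up for illustration and are not part of Cassandra.

```python
import math


def estimate_sstables(data_mb, sstable_size_in_mb):
    """Number of fixed-size sstables needed to hold data_mb of data."""
    return math.ceil(data_mb / sstable_size_in_mb)


def estimate_files(data_mb, sstable_size_in_mb, components_per_sstable):
    """Files on disk, given how many component files make up one sstable."""
    return estimate_sstables(data_mb, sstable_size_in_mb) * components_per_sstable


# Figures taken from the thread.
FILES_OBSERVED = 140_000   # files seen in the Column Family directory
DEFAULT_SIZE_MB = 5        # default sstable_size_in_mb mentioned above

# ASSUMPTION (not from the thread): component files per sstable.
COMPONENTS_PER_SSTABLE = 7

# Work backwards from the observed file count to an approximate data size.
sstables_now = FILES_OBSERVED // COMPONENTS_PER_SSTABLE
data_mb = sstables_now * DEFAULT_SIZE_MB

# Compare the default with the sizes suggested in the thread.
for size_mb in (5, 50, 75, 100):
    sstables = estimate_sstables(data_mb, size_mb)
    files = estimate_files(data_mb, size_mb, COMPONENTS_PER_SSTABLE)
    print(f"sstable_size_in_mb={size_mb:>3}: ~{sstables} sstables, ~{files} files")
```

Under those assumptions, the 140,000 files correspond to roughly 20,000 sstables of 5MB, and a 50MB setting would bring that down to about 2,000 sstables (around 14,000 files) once the data has been rewritten at the new size. The real figures depend on the actual number of component files per sstable and on how full each sstable is.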


--20cf30780e16205ba204df6d41d0--