incubator-cassandra-user mailing list archives

From Wei Zhu <wz1...@yahoo.com>
Subject Re: Large number of files for Leveled Compaction
Date Mon, 17 Jun 2013 05:28:25 GMT
The default value of 5MB is way too small in practice. Too many files in one directory is not a
good thing. It's not clear what a good number is. I have heard of people using 50MB,
75MB, even 100MB. Run your own tests to find the "right" number.
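
To get a feel for how the sstable size drives the file count, here is a rough back-of-the-envelope sketch. The number of component files per sstable (7 here) and the 100 GB data size are assumptions for illustration only; the real count of files per sstable depends on the Cassandra version and table options.

```python
import math

# Assumption: each sstable is stored as several component files
# (data, index, filter, and so on). 7 is a placeholder, not a
# figure taken from this thread; check your own data directory.
FILES_PER_SSTABLE = 7


def estimate_file_count(data_size_mb, sstable_size_mb,
                        files_per_sstable=FILES_PER_SSTABLE):
    """Rough number of files in the column family directory on one node."""
    if sstable_size_mb <= 0:
        raise ValueError("sstable_size_mb must be positive")
    # Leveled compaction keeps sstables at a fixed size, so the sstable
    # count is roughly the data size divided by that size.
    sstables = math.ceil(data_size_mb / sstable_size_mb)
    return sstables * files_per_sstable


if __name__ == "__main__":
    data_size_mb = 100 * 1024  # hypothetical 100 GB of data on one node
    for size_mb in (5, 50, 75, 100):
        print(size_mb, "MB ->", estimate_file_count(data_size_mb, size_mb), "files")
```

The point is only that the file count scales inversely with the sstable size, so going from 5MB to 100MB cuts it by roughly a factor of 20.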

-Wei 

----- Original Message -----

From: "Franc Carter" <franc.carter@sirca.org.au> 
To: user@cassandra.apache.org 
Sent: Sunday, June 16, 2013 10:15:22 PM 
Subject: Re: Large number of files for Leveled Compaction 




On Mon, Jun 17, 2013 at 2:59 PM, Manoj Mainali < mainalimanoj@gmail.com > wrote: 



Not in the case of LeveledCompaction. Only SizeTieredCompaction merges smaller sstables into
larger ones. With LeveledCompaction, the sstables are always of a fixed size, but they are
grouped into different levels. 


You can refer to this page http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
for details of how LeveledCompaction works. 
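
As a minimal sketch of that grouping, the snippet below spreads a given number of fixed-size sstables across levels. It assumes L1 holds up to 10 sstables and each further level holds ten times as many as the one before it; it ignores L0 and any compaction still in progress, so it is an idealised steady state and not what a node will report exactly.

```python
def level_capacities(total_sstables, fanout=10):
    """Return how many sstables land in L1, L2, ... in an idealised layout.

    Assumption: L1 holds up to `fanout` sstables and every next level
    holds `fanout` times more than the previous one. L0 is ignored.
    """
    levels = []
    remaining = total_sstables
    capacity = fanout
    while remaining > 0:
        used = min(remaining, capacity)  # fill this level before moving up
        levels.append(used)
        remaining -= used
        capacity *= fanout
    return levels


if __name__ == "__main__":
    # Hypothetical node holding 20480 sstables of equal size.
    for number, count in enumerate(level_capacities(20480), start=1):
        print("L%d: %d sstables" % (number, count))
```

The total number of sstables is unchanged by the levels: higher levels simply hold more sstables of the same size, not bigger sstables.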





Yes, but it seems I've misinterpreted that page ;-( 

I took this paragraph 


<blockquote>
In figure 3, new sstables are added to the first level, L0, and immediately compacted with
the sstables in L1 (blue). When L1 fills up, extra sstables are promoted to L2 (violet). Subsequent
sstables generated in L1 will be compacted with the sstables in L2 with which they overlap.
As more data is added, leveled compaction results in a situation like the one shown in figure
4. 

</blockquote>

to mean that once a level fills up it gets compacted into a higher level 

cheers 

<blockquote>



Cheers 
Manoj 





On Mon, Jun 17, 2013 at 1:54 PM, Franc Carter < franc.carter@sirca.org.au > wrote: 

<blockquote>

On Mon, Jun 17, 2013 at 2:47 PM, Manoj Mainali < mainalimanoj@gmail.com > wrote: 



<blockquote>

With LeveledCompaction, each sstable has a fixed size, defined by sstable_size_in_mb in
the compaction configuration of the CF definition; the default value is 5MB. In your case, you may
not have defined your own value, which is why each of your sstables is 5MB. And if your dataset
is huge, you will see a large number of sstables. 

</blockquote>
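
For reference, a small sketch of how the setting could be changed. It only builds the CQL3 statement as a string; the keyspace and column family names (demo_ks, demo_cf) and the 100MB value are hypothetical, and you would run the resulting statement yourself in cqlsh or through your driver.

```python
def lcs_alter_statement(keyspace, table, sstable_size_mb):
    """Build a CQL3 ALTER TABLE statement that sets sstable_size_in_mb.

    The keyspace/table names passed in are the caller's; nothing here
    talks to a cluster, it only formats the statement text.
    """
    if sstable_size_mb <= 0:
        raise ValueError("sstable_size_mb must be positive")
    return (
        "ALTER TABLE {ks}.{cf} WITH compaction = "
        "{{'class': 'LeveledCompactionStrategy', "
        "'sstable_size_in_mb': {size}}};"
    ).format(ks=keyspace, cf=table, size=int(sstable_size_mb))


if __name__ == "__main__":
    # demo_ks and demo_cf are placeholder names, not from this thread.
    print(lcs_alter_statement("demo_ks", "demo_cf", 100))
```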



Ok, seems like I do have (at least) an incomplete understanding. I realise that the minimum
size is 5MB, but I thought compaction would merge these into a smaller number of larger sstables?

thanks 



<blockquote>




Cheers 


Manoj 




On Fri, Jun 7, 2013 at 1:44 PM, Franc Carter < franc.carter@sirca.org.au > wrote: 



<blockquote>

Hi, 

We are trialling Cassandra-1.2(.4) with Leveled compaction as it looks like it may be a win
for us. 

The first step of testing was to push a fairly large slab of data into the Column Family -
we did this much faster (> x100) than we would in a production environment. This has left
the Column Family with about 140,000 files in the Column Family directory which seems way
too high. On two of the nodes the CompactionStats show 2 outstanding tasks and on a third
node there are over 13,000 outstanding tasks. However from looking at the log activity it
looks like compaction has finished on all nodes. 

Is this number of files expected/normal ? 

cheers 

-- 

Franc Carter | Systems architect | Sirca Ltd 

franc.carter@sirca.org.au | www.sirca.org.au 
Tel: +61 2 8355 2514 

Level 4, 55 Harrington St, The Rocks NSW 2000 
PO Box H58, Australia Square, Sydney NSW 1215 


</blockquote>


</blockquote>







</blockquote>


</blockquote>





