incubator-cassandra-user mailing list archives

From: aaron morton <aa...@thelastpickle.com>
Subject: Re: should I file a bug report on this or is this normal?
Date: Thu, 07 Mar 2013 05:48:00 GMT
> but based on how the rows are spread through the sstable files?
It's per sstable: each sstable has its own bloom filter, sized from the rows in that sstable.
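Because every sstable builds its own -Filter.db from the keys it holds, a row that repair wrote into several sstables is counted once per sstable. The summed filter size can therefore grow while the table's row count stays the same. A minimal sketch, assuming the textbook Bloom filter sizing m = -n ln p / (ln 2)^2 (Cassandra's own calculation differs in detail) and a hypothetical 1000-row column family split into overlapping sstables:

```python
import math
from typing import List, Set


def bloom_bits(n_keys: int, fp_rate: float) -> int:
    """Textbook optimal Bloom filter size in bits for n_keys at fp_rate."""
    if n_keys == 0:
        return 0
    return math.ceil(-n_keys * math.log(fp_rate) / math.log(2) ** 2)


def total_filter_bits(sstables: List[Set[str]], fp_rate: float) -> int:
    """Every sstable sizes its own filter from the keys it contains; sum them."""
    return sum(bloom_bits(len(keys), fp_rate) for keys in sstables)


# Hypothetical column family with 1000 distinct row keys.
keys = [f"row{i:04d}" for i in range(1000)]

flushed = [set(keys)]  # one sstable from a single memtable flush
# After repair: three streamed sstables with overlapping key ranges (hypothetical split).
streamed = [set(keys[:700]), set(keys[300:]), set(keys[300:])]

print(len(set().union(*streamed)))        # 1000 distinct rows either way
print(total_filter_bits(flushed, 0.01))   # 9586
print(total_filter_bits(streamed, 0.01))  # 20130
```

The row count is identical in both layouts; only the number of sstables each row appears in changes, and that alone roughly doubles the total filter size here.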

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/03/2013, at 8:51 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:

> Thanks for the great info, I will give it a go.
> 
> One question though: my false positive rate and number of rows are not changing, so why is the bloom filter bigger? Or do you mean the bloom filter is not based on the number of rows in the table, but on how the rows are spread through the sstable files?
> 
> I.e. I have the same number of rows before and after in that specific column family.
> 
> 
> Thanks,
> Dean
> 
> From: aaron morton <aaron@thelastpickle.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, March 6, 2013 9:29 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: should I file a bug report on this or is this normal?
> 
> 15. Size of nreldata is now 220K ….it has exploded in size!!!!!!
> This may be explained by fragmentation in the sstables, which compaction would eventually resolve.
> 
> During repair the data came from multiple nodes and created multiple sstables for each CF. Streaming copies part of an SSTable on the source node and creates a new SSTable on the destination. This differs from regular writes, where all writes for a CF go to the same sstable when the memtable is flushed.
> 
> To compare apples to apples, run a major compaction after the initial data load and again after the repair.
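The duplicated rows that repair leaves in several sstables only go away when compaction merges them. A simplified sketch of that merge step, assuming each sstable is a key-sorted list of hypothetical (row_key, timestamp, value) tuples and reconciling whole rows by timestamp (Cassandra itself reconciles individual columns and also handles tombstones, which are left out here):

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical model: an sstable is a list of (row_key, timestamp, value) sorted by row_key.
Row = Tuple[str, int, str]


def major_compact(sstables: Iterable[List[Row]]) -> List[Row]:
    """Merge key-sorted sstables into one, keeping the newest version of each row."""
    merged: List[Row] = []
    # heapq.merge lazily interleaves the sorted inputs in row_key order.
    for key, ts, value in heapq.merge(*sstables, key=lambda row: row[0]):
        if merged and merged[-1][0] == key:
            # Row already seen in another sstable: the higher timestamp wins.
            if ts > merged[-1][1]:
                merged[-1] = (key, ts, value)
        else:
            merged.append((key, ts, value))
    return merged


# Three sstables streamed from different replicas, with overlapping rows.
node_a = [("k1", 10, "a"), ("k2", 10, "b")]
node_b = [("k1", 10, "a"), ("k3", 12, "c")]
node_c = [("k2", 15, "b2"), ("k3", 12, "c")]

print(major_compact([node_a, node_b, node_c]))
# [('k1', 10, 'a'), ('k2', 15, 'b2'), ('k3', 12, 'c')]
```

Six stored rows collapse into one sstable holding three, which is the sense in which compaction resolves the fragmentation and why sizes are only comparable after a major compaction on both sides.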
> 
> 1.  Why is the bloom filter for level 5 a total of 3856 bytes (large) for 29118 bytes (small) of data, while for the initial data it was 2192 bytes (small) for 43038 bytes (large) of data?
> The size of the BF depends on the number of rows and the false positive rate, not on the size of the -Data.db component on disk.
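The standard sizing formula for a Bloom filter holding n keys at false positive rate p makes the same point: the size of the data never appears in it. This is the textbook optimum, offered only as an approximation, since Cassandra computes its filters with its own bucket-per-element calculation; n = 1000 below is an illustrative value, not a figure from this thread.

```latex
m = \left\lceil \frac{-n \ln p}{(\ln 2)^2} \right\rceil \ \text{bits},
\qquad
n = 1000,\; p = 0.01 \;\Rightarrow\; m = \lceil 9585.06 \rceil = 9586\ \text{bits} \approx 1.2\ \text{KB}
```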
> 
> 2.  Why are there 3 levels? With such a small set of data, I would think it would flush one data file like the original data, but instead there are 3 files.
> See above.
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 6/03/2013, at 6:40 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:
> 
> I ran a pretty solid QA test (clean data from scratch) on version 1.2.2.
> 
> My test was as follows:
> 
> 1.  Start up a 4-node Cassandra cluster
> 2.  Populate with initial test data (no other data is added to the system after this point!!!)
> 3.  Run nodetool drain on every node (moves everything from the commit log to sstables)
> 4.  Stop and start the Cassandra cluster so it is running again
> 5.  Get size of nreldata CF folder: 128kB
> 6.  Go to node 3, run snapshot and mv the snapshots directory OUT of nreldata
> 7.  Get size of nreldata CF folder: 128kB
> 8.  On node 3, run nodetool drain
> 9.  Get size of nreldata CF folder: still 128kB
> 10. Stop the Cassandra node
> 11. rm <keyspace>/nreldata/*.db
> 12. Size of nreldata CF is 8kb (odd for an empty folder, but ok)
> 13. Start Cassandra
> 14. nodetool repair databus5 nreldata
> 15. Size of nreldata is now 220K ….it has exploded in size!!!!!!
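Steps 5, 7, 9, 12 and 15 compare folder sizes, presumably taken with `du`. `du` counts allocated blocks (commonly 4 KiB each) rather than bytes, and it includes the directory itself. Also, `rm *.db` in step 11 leaves the -TOC.txt files behind, which may be why the "empty" folder still reported 8kb. A minimal Python sketch that sums apparent file sizes instead, assuming snapshots/ and backups/ live inside the CF folder and should be excluded (the `before-repair` snapshot name and the demo layout are hypothetical):

```python
import os
import tempfile
from typing import Tuple


def cf_data_bytes(cf_dir: str, skip_dirs: Tuple[str, ...] = ("snapshots", "backups")) -> int:
    """Sum apparent file sizes under a CF folder, skipping the given subdirectories."""
    total = 0
    for root, dirs, files in os.walk(cf_dir):
        # Prune in place so os.walk never descends into snapshots/backups.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def demo() -> int:
    """Throwaway layout standing in for a real nreldata folder (hypothetical)."""
    with tempfile.TemporaryDirectory() as cf:
        with open(os.path.join(cf, "databus5-nreldata-ib-1-Data.db"), "wb") as f:
            f.write(b"x" * 100)
        snap = os.path.join(cf, "snapshots", "before-repair")
        os.makedirs(snap)
        with open(os.path.join(snap, "databus5-nreldata-ib-1-Data.db"), "wb") as f:
            f.write(b"y" * 500)
        return cf_data_bytes(cf)


print(demo())  # 100: the snapshot copy is not counted
```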
> 
> I ran this QA test because we see data size explosion in production as well (I can't be 100% sure it is the same thing, though, as the above is such a small data set). Would leveled compaction be a bit more stable in terms of size ratios and such?
> 
> QUESTIONS
> 
> 1.  Why is the bloom filter for level 5 a total of 3856 bytes (large) for 29118 bytes (small) of data, while for the initial data it was 2192 bytes (small) for 43038 bytes (large) of data?
> 2.  Why are there 3 levels? With such a small set of data, I would think it would flush one data file like the original data, but instead there are 3 files.
> 
> My files after repair have levels 5, 6, and 7. My files before deletion of the CF have just level 1. After repair, the files are:
> -rw-rw-r--.  1 cassandra cassandra    54 Mar  6 07:18 databus5-nreldata-ib-5-CompressionInfo.db
> -rw-rw-r--.  1 cassandra cassandra 29118 Mar  6 07:18 databus5-nreldata-ib-5-Data.db
> -rw-rw-r--.  1 cassandra cassandra  3856 Mar  6 07:18 databus5-nreldata-ib-5-Filter.db
> -rw-rw-r--.  1 cassandra cassandra 37000 Mar  6 07:18 databus5-nreldata-ib-5-Index.db
> -rw-rw-r--.  1 cassandra cassandra  4772 Mar  6 07:18 databus5-nreldata-ib-5-Statistics.db
> -rw-rw-r--.  1 cassandra cassandra   383 Mar  6 07:18 databus5-nreldata-ib-5-Summary.db
> -rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-5-TOC.txt
> -rw-rw-r--.  1 cassandra cassandra    46 Mar  6 07:18 databus5-nreldata-ib-6-CompressionInfo.db
> -rw-rw-r--.  1 cassandra cassandra 14271 Mar  6 07:18 databus5-nreldata-ib-6-Data.db
> -rw-rw-r--.  1 cassandra cassandra   816 Mar  6 07:18 databus5-nreldata-ib-6-Filter.db
> -rw-rw-r--.  1 cassandra cassandra 18248 Mar  6 07:18 databus5-nreldata-ib-6-Index.db
> -rw-rw-r--.  1 cassandra cassandra  4756 Mar  6 07:18 databus5-nreldata-ib-6-Statistics.db
> -rw-rw-r--.  1 cassandra cassandra   230 Mar  6 07:18 databus5-nreldata-ib-6-Summary.db
> -rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-6-TOC.txt
> -rw-rw-r--.  1 cassandra cassandra    46 Mar  6 07:18 databus5-nreldata-ib-7-CompressionInfo.db
> -rw-rw-r--.  1 cassandra cassandra 14271 Mar  6 07:18 databus5-nreldata-ib-7-Data.db
> -rw-rw-r--.  1 cassandra cassandra   816 Mar  6 07:18 databus5-nreldata-ib-7-Filter.db
> -rw-rw-r--.  1 cassandra cassandra 18248 Mar  6 07:18 databus5-nreldata-ib-7-Index.db
> -rw-rw-r--.  1 cassandra cassandra  4756 Mar  6 07:18 databus5-nreldata-ib-7-Statistics.db
> -rw-rw-r--.  1 cassandra cassandra   230 Mar  6 07:18 databus5-nreldata-ib-7-Summary.db
> -rw-rw-r--.  1 cassandra cassandra    79 Mar  6 07:18 databus5-nreldata-ib-7-TOC.txt
> 
> Before-repair files (from my snapshot, which I moved out of the directory so Cassandra no longer had it):
> -rw-rw-r--. 1 cassandra cassandra    62 Mar  6 07:11 databus5-nreldata-ib-1-CompressionInfo.db
> -rw-rw-r--. 1 cassandra cassandra 43038 Mar  6 07:11 databus5-nreldata-ib-1-Data.db
> -rw-rw-r--. 1 cassandra cassandra  2192 Mar  6 07:11 databus5-nreldata-ib-1-Filter.db
> -rw-rw-r--. 1 cassandra cassandra 55248 Mar  6 07:11 databus5-nreldata-ib-1-Index.db
> -rw-rw-r--. 1 cassandra cassandra  4756 Mar  6 07:11 databus5-nreldata-ib-1-Statistics.db
> -rw-rw-r--. 1 cassandra cassandra   499 Mar  6 07:11 databus5-nreldata-ib-1-Summary.db
> -rw-rw-r--. 1 cassandra cassandra    79 Mar  6 07:11 databus5-nreldata-ib-1-TOC.txt
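The two listings can be tallied directly. In these filenames the number after `ib` (the on-disk format version) is the sstable generation, so 5, 6 and 7 are three separate sstables rather than three compaction levels. A small Python sketch, assuming Cassandra 1.2's `<keyspace>-<cf>-<format>-<generation>-<component>` naming and, for the `du`-style estimate, 4 KiB filesystem blocks with one block for the folder itself:

```python
import math
import re
from collections import defaultdict
from typing import Dict, Optional

# Assumed Cassandra 1.2 naming: <keyspace>-<cf>-<format>-<generation>-<component>.<ext>
NAME_RE = re.compile(r"^(\w+)-(\w+)-([a-z]+)-(\d+)-(\w+)\.(?:db|txt)$")
BLOCK = 4096  # assumed filesystem block size in bytes

AFTER_REPAIR = """
-rw-rw-r--. 1 cassandra cassandra    54 Mar 6 07:18 databus5-nreldata-ib-5-CompressionInfo.db
-rw-rw-r--. 1 cassandra cassandra 29118 Mar 6 07:18 databus5-nreldata-ib-5-Data.db
-rw-rw-r--. 1 cassandra cassandra  3856 Mar 6 07:18 databus5-nreldata-ib-5-Filter.db
-rw-rw-r--. 1 cassandra cassandra 37000 Mar 6 07:18 databus5-nreldata-ib-5-Index.db
-rw-rw-r--. 1 cassandra cassandra  4772 Mar 6 07:18 databus5-nreldata-ib-5-Statistics.db
-rw-rw-r--. 1 cassandra cassandra   383 Mar 6 07:18 databus5-nreldata-ib-5-Summary.db
-rw-rw-r--. 1 cassandra cassandra    79 Mar 6 07:18 databus5-nreldata-ib-5-TOC.txt
-rw-rw-r--. 1 cassandra cassandra    46 Mar 6 07:18 databus5-nreldata-ib-6-CompressionInfo.db
-rw-rw-r--. 1 cassandra cassandra 14271 Mar 6 07:18 databus5-nreldata-ib-6-Data.db
-rw-rw-r--. 1 cassandra cassandra   816 Mar 6 07:18 databus5-nreldata-ib-6-Filter.db
-rw-rw-r--. 1 cassandra cassandra 18248 Mar 6 07:18 databus5-nreldata-ib-6-Index.db
-rw-rw-r--. 1 cassandra cassandra  4756 Mar 6 07:18 databus5-nreldata-ib-6-Statistics.db
-rw-rw-r--. 1 cassandra cassandra   230 Mar 6 07:18 databus5-nreldata-ib-6-Summary.db
-rw-rw-r--. 1 cassandra cassandra    79 Mar 6 07:18 databus5-nreldata-ib-6-TOC.txt
-rw-rw-r--. 1 cassandra cassandra    46 Mar 6 07:18 databus5-nreldata-ib-7-CompressionInfo.db
-rw-rw-r--. 1 cassandra cassandra 14271 Mar 6 07:18 databus5-nreldata-ib-7-Data.db
-rw-rw-r--. 1 cassandra cassandra   816 Mar 6 07:18 databus5-nreldata-ib-7-Filter.db
-rw-rw-r--. 1 cassandra cassandra 18248 Mar 6 07:18 databus5-nreldata-ib-7-Index.db
-rw-rw-r--. 1 cassandra cassandra  4756 Mar 6 07:18 databus5-nreldata-ib-7-Statistics.db
-rw-rw-r--. 1 cassandra cassandra   230 Mar 6 07:18 databus5-nreldata-ib-7-Summary.db
-rw-rw-r--. 1 cassandra cassandra    79 Mar 6 07:18 databus5-nreldata-ib-7-TOC.txt
"""

BEFORE_REPAIR = """
-rw-rw-r--. 1 cassandra cassandra    62 Mar 6 07:11 databus5-nreldata-ib-1-CompressionInfo.db
-rw-rw-r--. 1 cassandra cassandra 43038 Mar 6 07:11 databus5-nreldata-ib-1-Data.db
-rw-rw-r--. 1 cassandra cassandra  2192 Mar 6 07:11 databus5-nreldata-ib-1-Filter.db
-rw-rw-r--. 1 cassandra cassandra 55248 Mar 6 07:11 databus5-nreldata-ib-1-Index.db
-rw-rw-r--. 1 cassandra cassandra  4756 Mar 6 07:11 databus5-nreldata-ib-1-Statistics.db
-rw-rw-r--. 1 cassandra cassandra   499 Mar 6 07:11 databus5-nreldata-ib-1-Summary.db
-rw-rw-r--. 1 cassandra cassandra    79 Mar 6 07:11 databus5-nreldata-ib-1-TOC.txt
"""


def summarize(listing: str) -> Dict[str, Dict[str, int]]:
    """Map sstable generation -> {component: size in bytes} from `ls -l` output."""
    out: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in listing.strip().splitlines():
        fields = line.split()
        match = NAME_RE.match(fields[-1])
        if match:
            generation, component = match.group(4), match.group(5)
            out[generation][component] = int(fields[4])  # 5th ls -l column is the size
    return dict(out)


def apparent_bytes(summary: Dict[str, Dict[str, int]], component: Optional[str] = None) -> int:
    """Total bytes, optionally restricted to one component (e.g. "Filter")."""
    return sum(size for comps in summary.values() for comp, size in comps.items()
               if component is None or comp == component)


def du_kib(summary: Dict[str, Dict[str, int]]) -> int:
    """Rough du estimate: every file rounded up to whole blocks, plus one block for the folder."""
    blocks = sum(math.ceil(size / BLOCK) for comps in summary.values() for size in comps.values())
    return (blocks + 1) * BLOCK // 1024


before, after = summarize(BEFORE_REPAIR), summarize(AFTER_REPAIR)
print(sorted(after, key=int))  # ['5', '6', '7']
for comp in ("Data", "Filter", "Index"):
    print(comp, apparent_bytes(before, comp), "->", apparent_bytes(after, comp))
print("bytes", apparent_bytes(before), "->", apparent_bytes(after))  # 105874 -> 152154
print("du KiB", du_kib(before), "->", du_kib(after))                 # 128 -> 220
```

Under those assumptions the estimate reproduces the 128kB and 220K folder sizes from steps 5 and 15, while the apparent bytes grow from 105874 to 152154. That suggests a good part of the reported growth comes from 21 small files each being rounded up to whole blocks, with the remainder presumably being rows held in more than one of the three streamed sstables until compaction merges them.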
> 
> Thanks,
> Dean
> 
> 

