From: aaron morton <aaron@thelastpickle.com>
Subject: Re: Endless minor compactions after heavy inserts
Date: Mon, 4 Apr 2011 00:39:53 +1000
To: user@cassandra.apache.org

With only one data file your reads would use the least amount of IO to find the data.

Most people have multiple nodes and probably fewer disks, so each node may have a TB or two of data. How much capacity do your 10 disks give? Will you be running multiple nodes in production?

Aaron


 
On 2 Apr 2011, at 12:45, Sheng Chen wrote:

Thank you very much.

The major compaction will merge everything into one big file, which would be very large.
Is there any way to control the number or size of files created by major compaction?
Or, is there a recommended number or size of files for cassandra to handle?

Thanks. I see the trigger of my minor compactions is OperationsInMillions. It is the total number of operations, which I had thought was per second.

Cheers,
Sheng


2011/4/1 aaron morton <aaron@thelastpickle.com>
If you are doing some sort of bulk load you can disable minor compactions by setting the min_compaction_threshold and max_compaction_threshold to 0. Then, once your insert is complete, run a major compaction via nodetool before turning minor compactions back on.
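
A minimal sketch of that sequence, assuming a single node on localhost and the stress tool's default Keyspace1/Standard1 column family (the per-CF argument form of setcompactionthreshold is from memory of 0.7-era nodetool; check `nodetool help` on your build):

    # disable minor compactions for the column family being loaded
    nodetool -h localhost setcompactionthreshold Keyspace1 Standard1 0 0

    # ... run the bulk insert ...

    # flush the remaining memtables, then trigger a major compaction
    nodetool -h localhost flush Keyspace1
    nodetool -h localhost compact Keyspace1

    # restore the default thresholds (4 min, 32 max) to re-enable minor compactions
    nodetool -h localhost setcompactionthreshold Keyspace1 Standard1 4 32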

You can also reduce the compaction thread priority; see compaction_thread_priority in the yaml file.

The memtable will be flushed when either the MB or the ops threshold is triggered. If you are seeing a lot of memtables smaller than the MB threshold, then the ops threshold is probably being triggered. Look for a log message at INFO level starting with "Enqueuing flush of Memtable"; it will tell you how many bytes and ops the memtable had when it was flushed. Try increasing the ops threshold and see what happens.
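
A quick way to check, with the numbers from this thread plugged in (log path and message format are approximate for 0.7, and the cassandra-cli attribute name is from memory -- verify with `help update column family;`):

    grep "Enqueuing flush of Memtable" /var/log/cassandra/system.log
    # sample output (format approximate):
    #   INFO ... Enqueuing flush of Memtable-Standard1@123456789(459276384 bytes, 7000000 operations)
    #
    # at ~45k inserts/sec, a 7.0-million-op threshold flushes a memtable about
    # every 7,000,000 / 45,000 ~= 155 seconds -- the ops threshold fires long
    # before 1499 MB accumulates, consistent with data files far below that size.
    #
    # raising the ops threshold persistently, via cassandra-cli:
    #   update column family Standard1 with memtable_operations=14;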

Your change to the compaction threshold may not have had an effect because the compaction process was already running.

AFAIK the best way to get the most out of your 10 disks will be to use a dedicated mirror for the commit log and a stripe set for the data.
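
In cassandra.yaml terms that layout would look something like this (the directive names are the standard 0.7 ones; the mount points and the 2/8 disk split are hypothetical):

    # cassandra.yaml
    commitlog_directory: /mnt/commitlog      # dedicated RAID-1 mirror (2 disks)
    data_file_directories:
        - /mnt/data                          # RAID-0 stripe across the remaining 8 disks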

Hope that helps.
Aaron

On 1 Apr 2011, at 14:52, Sheng Chen wrote:

> I've got a single node of cassandra 0.7.4, and I used the java stress tool to insert about 100 million records.
> The inserts took about 6 hours (45k inserts/sec), but the minor compactions that followed have lasted 2 days and the pending compaction jobs are still increasing.
>
> From jconsole I can read the MemtableThroughputInMB=1499, MemtableOperationsInMillions=7.0
> But in my data directory, I got hundreds of 438MB data files, which should be the cause of the minor compactions.
>
> I tried to set the compaction threshold by nodetool, but it didn't seem to take effect (no change in pending compaction tasks).
> After restarting the node, my setting is lost.
>
> I want to distribute the read load across my disks (10 disks in xfs, LVM), so I don't want to do a major compaction.
> So, what can I do to keep the sstable files at a reasonable size, or to make the minor compactions faster?
>
> Thank you in advance.
> Sheng
>
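
On the setting being lost after restart: in 0.7 the compaction thresholds live in the per-column-family schema, so a change made through nodetool/JMX is runtime-only. A sketch of making it persistent through cassandra-cli, assuming the stress tool's Standard1 column family (attribute names from memory; verify with `help update column family;`):

    [default@Keyspace1] update column family Standard1
        with min_compaction_threshold=4 and max_compaction_threshold=32;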


