From: Sylvain Lebresne
To: "user@cassandra.apache.org"
Date: Thu, 21 Mar 2013 10:10:19 +0100
Subject: Re: Cassandra freezes
References: <514AC13C.4020102@opera.com>

Prior to 1.2, index summaries were not saved on disk and were thus
computed on startup while the sstable was loaded. In 1.2 they are now
saved on disk to make startup faster
(https://issues.apache.org/jira/browse/CASSANDRA-2392). That being said,
if the index_interval value used by a saved summary doesn't match the
current one when the sstable is loaded, the summary is recomputed anyway,
so restarting a node should always take a new index_interval setting into
account.

--
Sylvain

On Thu, Mar 21, 2013 at 9:43 AM, Andras Szerdahelyi
<andras.szerdahelyi@ignitionone.com> wrote:

> I cannot find the reference that notes having to upgradesstables when you
> change this. I really hope such complex assumptions are not forming in
> my head just on their own and that there actually exists some kind of
> reliable reference that clears this up :-) but,
>
> # index_interval controls the sampling of entries from the primary
> # row index in terms of space versus time. The larger the interval,
> # the smaller and less effective the sampling will be. In technical
> # terms, the interval corresponds to the number of index entries that
> # are skipped between taking each sample. All the sampled entries
> # must fit in memory. Generally, a value between 128 and 512 here
> # coupled with a large key cache size on CFs results in the best trade
> # offs. This value is not often changed, however if you have many
> # very small rows (many to an OS page), then increasing this will
> # often lower memory usage without an impact on performance.
>
> It is (very) safe to assume the row index is rebuilt/updated when new
> sstables are built.
> Obviously the sample of this index will have to follow this process very
> closely.
>
> It is possible, however, that the sample itself is not persisted and is
> built at startup, as opposed to *only* when the index changes (which is
> what I thought was happening).
> It shouldn't be too difficult to verify this, but I'd appreciate it if
> someone who has looked at this before could confirm whether this is the
> case.
>
> Thanks,
> Andras
>
> On 21/03/13 09:13, "Michal Michalski" <michalm@opera.com> wrote:
>
> >About index_interval:
> >
> >> 1) you have to rebuild sstables (not an issue if you are evaluating,
> >> doing test writes, etc.; not so much in production)
> >
> >Are you sure of this? As I understand indexes, it's not required because
> >this parameter defines the interval of the in-memory index sample, which
> >is created during C* startup based on the primary on-disk index file. The
> >fact that heap usage is reduced immediately after a C* restart seems to
> >confirm this, but maybe I am missing something?
> >
> >M.
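For readers who want the behaviour discussed in this thread in concrete terms, here is a minimal sketch in Python. It is a simplified model and not Cassandra's actual code: the names IndexSummary, build_summary, load_summary and scan_start are hypothetical, and the on-disk primary index is represented as a plain sorted list of keys. It illustrates two points made above: a larger index_interval yields a smaller in-memory sample, and a saved summary is reused only when it was built with the currently configured interval.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexSummary:
    """In-memory sample of a sorted primary index (hypothetical model)."""
    index_interval: int
    sampled_keys: List[str]
    sampled_positions: List[int]


def build_summary(index_keys: List[str], index_interval: int) -> IndexSummary:
    """Keep every `index_interval`-th entry of the sorted index."""
    positions = list(range(0, len(index_keys), index_interval))
    keys = [index_keys[p] for p in positions]
    return IndexSummary(index_interval, keys, positions)


def load_summary(index_keys: List[str],
                 saved: Optional[IndexSummary],
                 configured_interval: int) -> IndexSummary:
    """Reuse the saved summary only if its interval matches the config."""
    if saved is not None and saved.index_interval == configured_interval:
        return saved
    # No saved summary, or the setting changed: recompute from the index.
    return build_summary(index_keys, configured_interval)


def scan_start(summary: IndexSummary, key: str) -> int:
    """Index position from which a sequential scan for `key` would begin."""
    i = bisect_right(summary.sampled_keys, key) - 1
    return summary.sampled_positions[max(i, 0)]


if __name__ == "__main__":
    keys = [f"k{i:04d}" for i in range(1000)]
    saved = build_summary(keys, 128)
    reloaded = load_summary(keys, saved, 512)  # setting changed, so recomputed
    print(len(saved.sampled_keys), len(reloaded.sampled_keys))
```

In this model the cost of a larger interval is visible in scan_start: fewer samples mean a longer stretch of the on-disk index may have to be scanned after the nearest sampled entry.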