Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of sylvain@datastax.com
 designates 209.85.192.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <FE84AE7AAE9A2B4EA73512EED142BEF830CB5604@ATL02MB02.corp.local>
References: <514AC13C.4020102@opera.com>
	<FE84AE7AAE9A2B4EA73512EED142BEF830CB5604@ATL02MB02.corp.local>
Date: Thu, 21 Mar 2013 10:10:19 +0100
Message-ID: 
 <CAKkz8Q3XAgAZcaM8KXr94RT5RBoN22raCSMJ8SDaoi4ktyjwFA@mail.gmail.com>
Subject: Re: Cassandra freezes
From: Sylvain Lebresne <sylvain@datastax.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Content-Type: multipart/alternative; boundary=047d7b6dcedc28846904d86bb5aa

--047d7b6dcedc28846904d86bb5aa
Content-Type: text/plain; charset=ISO-8859-1

Prior to 1.2, index summaries were not saved on disk, so they were
computed at startup while each sstable was loaded. As of 1.2 they are
saved on disk to make startup faster
(https://issues.apache.org/jira/browse/CASSANDRA-2392). That said, if the
index_interval value a saved summary was built with doesn't match the
current one when the sstable is loaded, the summary is recomputed anyway,
so restarting a node should always take a new index_interval setting into
account.
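
To make the load-time check concrete, here is a minimal sketch in Python
(not Cassandra's actual Java code). The names IndexSummary, build_summary
and load_summary are hypothetical and exist only for illustration; the
sketch assumes a summary is simply every index_interval-th key of the
primary index plus the interval it was built with.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexSummary:
    # Hypothetical stand-in for a persisted index summary.
    index_interval: int   # interval the samples were taken with
    samples: List[str]    # every index_interval-th key of the primary index


def build_summary(index_keys: List[str], index_interval: int) -> IndexSummary:
    """Sample every index_interval-th entry of the primary index."""
    if index_interval < 1:
        raise ValueError("index_interval must be >= 1")
    return IndexSummary(index_interval, index_keys[::index_interval])


def load_summary(saved: Optional[IndexSummary],
                 index_keys: List[str],
                 current_interval: int) -> IndexSummary:
    """Reuse a saved summary only if it matches the configured interval."""
    if saved is not None and saved.index_interval == current_interval:
        return saved  # fast path: nothing to recompute
    # No saved summary (the pre-1.2 case) or the interval has changed:
    # recompute from the primary index.
    return build_summary(index_keys, current_interval)


if __name__ == "__main__":
    keys = [f"key{i:04d}" for i in range(1024)]
    saved = build_summary(keys, 128)
    reloaded = load_summary(saved, keys, 512)
    print(len(saved.samples), len(reloaded.samples))
```

In this toy model, raising the interval from 128 to 512 over 1024 keys
cuts the in-memory sample from 8 entries to 2, and no sstable rewrite is
involved: only the summary is rebuilt when the sstable is loaded.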

--
Sylvain


On Thu, Mar 21, 2013 at 9:43 AM, Andras Szerdahelyi <
andras.szerdahelyi@ignitionone.com> wrote:

> I cannot find the reference that says you have to upgradesstables when you
> change this. I really hope such complex assumptions are not forming in
> my head all on their own, and that some kind of reliable reference
> actually exists that clears this up :-) but,
>
> # index_interval controls the sampling of entries from the primary
> # row index in terms of space versus time. The larger the interval,
> # the smaller and less effective the sampling will be. In technical
> # terms, the interval corresponds to the number of index entries that
> # are skipped between taking each sample. All the sampled entries
> # must fit in memory. Generally, a value between 128 and 512 here
> # coupled with a large key cache size on CFs results in the best trade
> # offs. This value is not often changed, however if you have many
> # very small rows (many to an OS page), then increasing this will
> # often lower memory usage without an impact on performance.
>
> It is (very) safe to assume the row index is rebuilt/updated when new
> sstables are built.
> Obviously the sample of this index has to follow this process very
> closely.
>
> It is possible, however, that the sample itself is not persisted and is
> built at startup, as opposed to *only* when the index changes (which is
> what I thought was happening).
> It shouldn't be too difficult to verify this, but I'd appreciate it if
> someone who has looked at this before could confirm whether this is the case.
>
> Thanks,
> Andras
>
> On 21/03/13 09:13, "Michal Michalski" <michalm@opera.com> wrote:
>
> >About index_interval:
> >
> >> 1) you have to rebuild sstables ( not an issue if you are evaluating,
> >> doing test writes.. Etc, not so much in production )
> >
> >Are you sure about this? As I understand indexes, it's not required, because
> >this parameter defines the interval of the in-memory index sample, which is
> >created during C* startup based on the primary on-disk index file. The
> >fact that heap usage is reduced immediately after a C* restart seems to
> >confirm this, but maybe I'm missing something?
> >
> >M.
>
>

--047d7b6dcedc28846904d86bb5aa--