Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: local policy)
From: aaron morton <aaron@thelastpickle.com>
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_BB572091-1574-4C98-A60E-6B40EF238851"
Message-Id: <2AE0C378-25F3-43AF-9BD4-D004178D931A@thelastpickle.com>
Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\))
Subject: Re: CL1 and CLQ with 5 nodes cluster and 3 alives node
Date: Tue, 23 Jul 2013 21:19:18 +1200
References: <1482135054.16794141374495311479.JavaMail.defaultUser@defaultHost>
 <CABsaHTO2ZFb=tT2KrOgW2UvX9zPO60pmP3JwO+q2OGzJpre41g@mail.gmail.com>
To: user@cassandra.apache.org
In-Reply-To: 
 <CABsaHTO2ZFb=tT2KrOgW2UvX9zPO60pmP3JwO+q2OGzJpre41g@mail.gmail.com>


--Apple-Mail=_BB572091-1574-4C98-A60E-6B40EF238851
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=iso-8859-1

>> I really don't think I have more than 500 million rows ... any smart way to
>> count the number of rows inside the ks?
Use the output from nodetool cfstats; it has a row count and bloom filter size for each CF.
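
A minimal sketch of pulling those two numbers out of cfstats output. The field labels ("Number of Keys (estimate)", "Bloom Filter Space Used") and the sample text are assumptions about what a 1.0-era node prints, and the keyspace and CF names are made up; check them against your own output before relying on this.

```python
# Sample text shaped like `nodetool cfstats` output.
# Labels, names and numbers are assumptions for illustration only.
SAMPLE = """\
Keyspace: ks1
    Read Count: 10
        Column Family: users
        SSTable count: 4
        Number of Keys (estimate): 1200000
        Bloom Filter Space Used: 2250000

        Column Family: counters
        SSTable count: 2
        Number of Keys (estimate): 300000
        Bloom Filter Space Used: 562500
"""

# cfstats label -> short key used in the result dict
FIELDS = {
    "Number of Keys (estimate)": "keys",
    "Bloom Filter Space Used": "bloom_bytes",
}


def parse_cfstats(text):
    """Return {(keyspace, cf): {"keys": int, "bloom_bytes": int}}."""
    stats = {}
    keyspace = None
    current = None
    for raw in text.splitlines():
        label, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank line or a line without "label: value"
        value = value.strip()
        if label == "Keyspace":
            keyspace, current = value, None
        elif label == "Column Family":
            current = stats.setdefault((keyspace, value), {})
        elif label in FIELDS and current is not None:
            current[FIELDS[label]] = int(value)
    return stats


if __name__ == "__main__":
    for (ks, cf), s in sorted(parse_cfstats(SAMPLE).items()):
        print(ks, cf, s.get("keys"), s.get("bloom_bytes"))
```

In practice you would feed it the text captured from running nodetool cfstats against each node instead of SAMPLE. The key count is an estimate and is per node, so it includes replicas; adding it up across nodes will overcount by roughly the replication factor.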

You may also want to upgrade to 1.1 to get global cache management, which can make things easier to manage.

Cheers

-----------------
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/07/2013, at 6:26 AM, Nate McCall <zznate.m@gmail.com> wrote:

> Do you have a copy of the specific stack trace? Given the version and
> CL behavior, one thing you may be experiencing is:
> https://issues.apache.org/jira/browse/CASSANDRA-4578
>
> On Mon, Jul 22, 2013 at 7:15 AM, cbertu81@libero.it <cbertu81@libero.it> wrote:
>> Hi Aaron, thanks for your help.
>>
>>> If you have more than 500 million rows you may want to check the
>>> bloom_filter_fp_chance, the old default was 0.000744 and the new (post 1.)
>>> number is > 0.01 for size tiered.
>>
>> I really don't think I have more than 500 million rows ... any smart way to
>> count the number of rows inside the ks?
>>
>>>> Now a question -- why with 2 nodes offline all my application stop providing
>>>> the service, even when a Consistency Level One read is invoked?
>>
>>> What error did the client get and what client are you using?
>>> It also depends on if/how the node fails. The later versions try to shut down
>>> when there is an OOM, not sure what 1.0 does.
>>
>> The exception was a TTransportException -- I am using Pelops client.
>>
>>> If the node went into a zombie state the clients may have been timing out.
>>> They should then move on to another node.
>>> If it had started shutting down the client should have gotten some immediate
>>> errors.
>>
>> It didn't shut down, it was more like in a zombie state.
>> One more question: I'm experiencing some wrong counters (which are very
>> important in my platform since they are used to keep user-points and generate
>> the TopX users) -- could it be related with this problem? The problem is that in
>> some users (not all) the counter column increased its value.
>>
>> After such a crash in 1.0 is there any best-practice to follow? (nodetool or
>> something?)
>>
>> Cheers,
>> Carlo
>>
>>>
>>> Cheers
>>>
>>>
>>> -----------------
>>> Aaron Morton
>>> Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 19/07/2013, at 5:02 PM, cbertu81@libero.it wrote:
>>>
>>>> Hi all,
>>>> I'm experiencing some problems after 3 years of cassandra in production (from
>>>> 0.6 to 1.0.6) -- twice in 3 weeks 2 nodes crashed with an OutOfMemory
>>>> Exception.
>>>> In the log I can read the warning about low available heap ... now I'm
>>>> increasing a little bit my RAM, my Java Heap (1/4 of the RAM) and reducing the
>>>> size of rows and memtables thresholds. Other tips?
>>>>
>>>> Now a question -- why with 2 nodes offline all my application stop providing
>>>> the service, even when a Consistency Level One read is invoked?
>>>> I'd expected this behaviour:
>>>>
>>>> CL1 operations keep working
>>>> more than 80% of CLQ operations working (nodes offline were 2 and 5; in a
>>>> clockwise key distribution only writes to fifth node should impact to node 2)
>>>> most of all CLALL operations (that I don't use) failing
>>>>
>>>> The situation instead was that I had ALL services stop responding throwing a
>>>> TTransportException ...
>>>>
>>>> Thanks in advance
>>>>
>>>> Carlo
>>>
>>>
>>
>>


--Apple-Mail=_BB572091-1574-4C98-A60E-6B40EF238851--