From: Jonathan Ellis <jbellis@gmail.com>
Date: Fri, 18 Jun 2010 20:16:11 -0700
Message-ID: <AANLkTimZ4Ve7ZVousZRLwSxG7jWYnTty8G-21OewGHcS@mail.gmail.com>
Subject: Re: Occasional 10s Timeouts on Read
To: user@cassandra.apache.org

Set the log level to TRACE and see if the OutboundTcpConnection is going
bad. That would explain the message never arriving.
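
Once TRACE is on, one way to compare the two nodes is to pull every log entry from the ten-second window around a timeout on each side. A minimal sketch, assuming the console log format seen in the excerpts quoted below (level, HH:MM:SS,mmm, message); parse_entries and entries_in_window are hypothetical helper names, not part of Cassandra:

```python
import re
from datetime import time

# Assumed line format, taken from the log excerpts in this thread:
#   "DEBUG 03:54:17,070 get_slice"
ENTRY = re.compile(
    r"^\s*(TRACE|DEBUG|INFO|WARN|ERROR)\s+(\d{2}):(\d{2}):(\d{2}),(\d{3})\s+(.*)$"
)


def parse_entries(lines):
    """Turn raw log lines into (timestamp, level, message) tuples.

    Lines without a level and timestamp are treated as continuations of
    the previous entry, since long messages wrap.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = ENTRY.match(line)
        if m:
            level, hh, mm, ss, ms, msg = m.groups()
            stamp = time(int(hh), int(mm), int(ss), int(ms) * 1000)
            entries.append([stamp, level, msg])
        elif entries and line.strip():
            entries[-1][2] += " " + line.strip()
    return [tuple(e) for e in entries]


def entries_in_window(lines, start, end):
    """Return the entries whose timestamp falls within [start, end]."""
    return [e for e in parse_entries(lines) if start <= e[0] <= end]
```

Running it over both nodes' logs with the same window, for example time(3, 54, 17) to time(3, 54, 28), puts the sender's and the receiver's view of the request side by side.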

On Fri, Jun 18, 2010 at 10:39 AM, AJ Slater <aj@zuno.com> wrote:
> To summarize:
>
> If a request for a column comes in *after a period of several hours
> with no requests*, then the node servicing the request hangs while
> looking for its peer rather than servicing the request like it should.
> It then throws either a TimedOutException or a (wrong)
> NotFoundException.
>
> And it doesn't appear to actually send the message it says it does to
> its peer. Or at least its peer doesn't report the request being
> received.
>
> And then the situation magically clears up after approximately 2 minutes.
>
> However, if the idle period never occurs, then the problem does not
> manifest. If I run a cron job with wget against my server every
> minute, I do not see the problem.
>
> I'll be looking at some tcpdump logs to see if I can suss out what's
> really happening, and perhaps file this as a bug. The several hours
> between reproducible events make this whole thing aggravating to
> detect, debug and, I'll assume, fix, if it is indeed a
> Cassandra problem.
>
> It was suggested on IRC that it may be my network. But gossip is
> continually sending heartbeats and nodetool and the logs show the
> nodes as up and available. If my network was flaking out I'd think it
> would be dropping heartbeats and I'd see that.
>
> AJ
>
> On Thu, Jun 17, 2010 at 2:26 PM, AJ Slater <aj@zuno.com> wrote:
>> These are physical machines.
>>
>> storage-conf.xml.fs03 is here:
>>
>> http://pastebin.com/weL41NB1
>>
>> Diffs from that for the other two storage-confs are inline here:
>>
>> aj@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs01
>> 185c185
>> <   <InitialToken>0</InitialToken>
>> ---
>> >   <InitialToken>71603818521973537678586548668074777838</InitialToken>
>> 229c229
>> <   <ListenAddress>10.33.2.70</ListenAddress>
>> ---
>> >   <ListenAddress>10.33.3.10</ListenAddress>
>> 241c241
>> <   <ThriftAddress>10.33.2.70</ThriftAddress>
>> ---
>> >   <ThriftAddress>10.33.3.10</ThriftAddress>
>> 341c341
>> <   <ConcurrentReads>16</ConcurrentReads>
>> ---
>> >   <ConcurrentReads>4</ConcurrentReads>
>>
>>
>> aj@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs02
>> 185c185
>> <   <InitialToken>0</InitialToken>
>> ---
>> >   <InitialToken>120215585224964746744782921158327379306</InitialToken>
>> 206d205
>> <       <Seed>10.33.3.20</Seed>
>> 229c228
>> <   <ListenAddress>10.33.2.70</ListenAddress>
>> ---
>> >   <ListenAddress>10.33.3.20</ListenAddress>
>> 241c240
>> <   <ThriftAddress>10.33.2.70</ThriftAddress>
>> ---
>> >   <ThriftAddress>10.33.3.20</ThriftAddress>
>> 341c340
>> <   <ConcurrentReads>16</ConcurrentReads>
>> ---
>> >   <ConcurrentReads>4</ConcurrentReads>
>>
>>
>> Thank you for your attention,
>>
>> AJ
>>
>>
>> On Thu, Jun 17, 2010 at 2:09 PM, Benjamin Black <b@b3k.us> wrote:
>>> Are these physical machines or virtuals? Did you post your
>>> cassandra.in.sh and storage-conf.xml someplace?
>>>
>>> On Thu, Jun 17, 2010 at 10:31 AM, AJ Slater <aj@zuno.com> wrote:
>>>> Total data size in the entire cluster is about twenty 12k images. With
>>>> no other load on the system. I just ask for one column and I get these
>>>> timeouts. Performing multiple gets on the columns leads to multiple
>>>> timeouts for a period of a few seconds or minutes and then the
>>>> situation magically resolves itself and response times are down to
>>>> single digit milliseconds for a column get.
>>>>
>>>> On Thu, Jun 17, 2010 at 10:24 AM, AJ Slater <aj@zuno.com> wrote:
>>>>> Cassandra 0.6.2 from the apache debian source.
>>>>> Ubuntu Jaunty. Sun Java 6 JVM.
>>>>>
>>>>> All nodes in separate racks at 365 main.
>>>>>
>>>>> On Thu, Jun 17, 2010 at 10:12 AM, AJ Slater <aj@zuno.com> wrote:
>>>>>> I'm seeing 10s timeouts on reads a few times a day. It's hard to reproduce
>>>>>> consistently but seems to happen most often after it's been a long time
>>>>>> between reads. After presenting itself for a couple of minutes the
>>>>>> problem then goes away.
>>>>>>
>>>>>> I've got a three node cluster with replication factor 2, reading at
>>>>>> consistency level ONE. The columns being read are around 12k each. The
>>>>>> nodes are 8GB multicore boxes with the JVM limits between 4GB and 6GB.
>>>>>>
>>>>>> Here's an application log from early this morning when a developer in
>>>>>> Belgrade accessed the system:
>>>>>>
>>>>>> Jun 17 03:54:17 lpc03 pinhole[5736]: MainThread:pinhole.py:61 | Requested image_id: 5827067133c3d670071c17d9144f0b49
>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:76 | TimedOutException for Image 5827067133c3d670071c17d9144f0b49
>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Get took 10005.388975 ms
>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:61 | Requested image_id: af8caf3b76ce97d13812ddf795104a5c
>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Get took 3.658056 ms
>>>>>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Transform took 0.978947 ms
>>>>>>
>>>>>> That's a Timeout and then a successful get of another column.
>>>>>>
>>>>>> Here's the cassandra log for 10.33.2.70:
>>>>>>
>>>>>> DEBUG 03:54:17,070 get_slice
>>>>>> DEBUG 03:54:17,071 weakreadremote reading
>>>>>> SliceFromReadCommand(table='jolitics.com',
>>>>>> key='5827067133c3d670071c17d9144f0b49',
>>>>>> column_parent='QueryPath(columnFamilyName='Images',
>>>>>> superColumnName='null', columnName='null')', start='', finish='',
>>>>>> reversed=false, count=100)
>>>>>> DEBUG 03:54:17,071 weakreadremote reading
>>>>>> SliceFromReadCommand(table='jolitics.com',
>>>>>> key='5827067133c3d670071c17d9144f0b49',
>>>>>> column_parent='QueryPath(columnFamilyName='Images',
>>>>>> superColumnName='null', columnName='null')', start='', finish='',
>>>>>> reversed=false, count=100) from 45138@/10.33.3.10
>>>>>> DEBUG 03:54:27,077 get_slice
>>>>>> DEBUG 03:54:27,078 weakreadlocal reading
>>>>>> SliceFromReadCommand(table='jolitics.com',
>>>>>> key='af8caf3b76ce97d13812ddf795104a5c',
>>>>>> column_parent='QueryPath(columnFamilyName='Images',
>>>>>> superColumnName='null', columnName='null')', start='', finish='',
>>>>>> reversed=false, count=100)
>>>>>> DEBUG 03:54:27,079 collecting body:false:1610@1275951327610885
>>>>>> DEBUG 03:54:27,080 collecting body:false:1610@1275951327610885
>>>>>> DEBUG 03:54:27,080 Reading consistency digest for
>>>>>> af8caf3b76ce97d13812ddf795104a5c from 45168@[/10.33.2.70, /10.33.3.10]
>>>>>> DEBUG 03:54:50,779 Disseminating load info ...
>>>>>>
>>>>>> It looks like it asks for key='5827067133c3d670071c17d9144f0b49' from
>>>>>> the local host and also queries 10.33.3.10 for the first one and then
>>>>>> for 'af8caf3b76ce97d13812ddf795104a5c' it only queries the local host
>>>>>> and then returns appropriately.
>>>>>>
>>>>>> Here's the log for 10.33.3.10 around that time:
>>>>>>
>>>>>> DEBUG 03:54:19,645 Disseminating load info ...
>>>>>> DEBUG 03:55:19,645 Disseminating load info ...
>>>>>> DEBUG 03:56:19,646 Disseminating load info ...
>>>>>> DEBUG 03:57:19,645 Disseminating load info ...
>>>>>> DEBUG 03:58:19,645 Disseminating load info ...
>>>>>> DEBUG 03:59:19,646 Disseminating load info ...
>>>>>> DEBUG 04:00:18,635 GC for ParNew: 4 ms, 21443128 reclaimed leaving
>>>>>> 55875144 used; max is 6580535296
>>>>>>
>>>>>> No record of communication from 10.33.2.70.
>>>>>>
>>>>>> Does this ring any bells for anyone? I can of course attach
>>>>>> storage-conf's for all nodes if that sounds useful and I'll be on
>>>>>> #cassandra as ajslater.
>>>>>>
>>>>>> Much thanks for taking a look and any suggestions. We fear we'll have
>>>>>> to abandon Cassandra if this bug cannot be resolved.
>>>>>>
>>>>>> AJ
>>>>>>
>>>>>
>>>>
>>>
>>
>
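
The idle-period theory in the quoted summary can be checked against the pinhole application log: for each slow get, look at how long the service had been idle before the request that triggered it. A rough sketch, assuming unwrapped syslog-style lines like those quoted above; the function name, the 1000 ms threshold and the year default are illustrative assumptions, not anything from pinhole or Cassandra:

```python
import re
from datetime import datetime

# Assumed syslog-style prefix, e.g. "Jun 17 03:54:27 lpc03 pinhole[5736]: ..."
STAMP = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s")
TOOK = re.compile(r"Image Get took ([\d.]+) ms")


def slow_gets_with_idle(lines, threshold_ms=1000.0, year=2010):
    """Return (timestamp, elapsed_ms, idle_seconds) for each slow get.

    idle_seconds is the gap between the request that produced the slow get
    and the request before it, or None if there was no earlier request.
    The syslog prefix carries no year, so one is supplied by the caller.
    """
    prev_request = None
    last_request = None
    results = []
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        if "Requested image_id:" in line:
            # Remember the two most recent request times.
            prev_request, last_request = last_request, stamp
            continue
        took = TOOK.search(line)
        if took and float(took.group(1)) >= threshold_ms:
            idle = None
            if prev_request is not None and last_request is not None:
                idle = (last_request - prev_request).total_seconds()
            results.append((stamp, float(took.group(1)), idle))
    return results
```

If every slow get shows an idle gap of hours while fast gets follow recent traffic, that is consistent with a connection going stale during idle periods rather than with general network flakiness.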


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com