From cassandra-user-return-1003-apmail-incubator-cassandra-user-archive=incubator.apache.org@incubator.apache.org Thu Oct 22 17:06:15 2009 Return-Path: Delivered-To: apmail-incubator-cassandra-user-archive@minotaur.apache.org Received: (qmail 92127 invoked from network); 22 Oct 2009 17:06:15 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 22 Oct 2009 17:06:15 -0000 Received: (qmail 41450 invoked by uid 500); 22 Oct 2009 17:06:15 -0000 Delivered-To: apmail-incubator-cassandra-user-archive@incubator.apache.org Received: (qmail 41413 invoked by uid 500); 22 Oct 2009 17:06:15 -0000 Mailing-List: contact cassandra-user-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: cassandra-user@incubator.apache.org Delivered-To: mailing list cassandra-user@incubator.apache.org Received: (qmail 41404 invoked by uid 99); 22 Oct 2009 17:06:15 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 22 Oct 2009 17:06:15 +0000 X-ASF-Spam-Status: No, hits=-2.5 required=5.0 tests=AWL,BAYES_00 X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of jbellis@gmail.com designates 209.85.219.205 as permitted sender) Received: from [209.85.219.205] (HELO mail-ew0-f205.google.com) (209.85.219.205) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 22 Oct 2009 17:06:12 +0000 Received: by ewy1 with SMTP id 1so6900127ewy.27 for ; Thu, 22 Oct 2009 10:05:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :from:date:message-id:subject:to:content-type :content-transfer-encoding; bh=1dmmLUNucoZR3PwiOoqrqiGcrmlTuF3Ic/JZjerRtVo=; b=tWI501vxMdsZJP3G3ZcNUXSTVR+B/uA0qxmaEQ0u2ZtFK4I4F+k8EvUDC58xGovKxv xzwM0HAAgBOsnW7JRtuaEywNDRwZmv4CKY9wL6AmW6yNYE/oWM7LBeULpLUptC6bk8Yk OIDhi7FiEuhJg/mkkXRLqfbreKSZvG61ZjTNc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type:content-transfer-encoding; b=dbeT1n75r1ftgC1DecE8nfWZvtmTFNC9Bn95VkKlRN3DsZIbJGtYzMNM8+t7+6/unK S33FEIZd0RMoGyTi03uczc/FBl+edjK8fYwzfXUqM+ceYifv46HCSMl3yx6dPUsWglQa 51N7U2ujqM23r8ZVmM5DZ7r0hgOSs6rYlS25s= MIME-Version: 1.0 Received: by 10.216.89.12 with SMTP id b12mr3433069wef.93.1256231151217; Thu, 22 Oct 2009 10:05:51 -0700 (PDT) In-Reply-To: References: From: Jonathan Ellis Date: Thu, 22 Oct 2009 12:05:31 -0500 Message-ID: Subject: Re: repeated timeouts on quorum reads To: cassandra-user@incubator.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable What is the CF definition for movie? the part from your config file like On Thu, Oct 22, 2009 at 12:04 PM, Jonathan Ellis wrote: > On Thu, Oct 22, 2009 at 10:59 AM, Edmond Lau wrote: >> Try: >> >> keyspace: "Analytics" >> key: "o:movie" >> column family: "movie" >> super column: "all" >> >> I was able to get timeouts with a few concurrent get_slice() calls >> (3-4 with debug logging, 6 without) on the super column, with start =3D >> "", finish =3D "", and count =3D 1. >> >> Edmond >> >> On Thu, Oct 22, 2009 at 7:42 AM, Jonathan Ellis wrot= e: >>> What is your columnfamily definition? =A0What query should I test with? >>> >>> On Wed, Oct 21, 2009 at 7:33 PM, Edmond Lau wrote: >>>> On Tue, Oct 20, 2009 at 2:03 PM, Jonathan Ellis wr= ote: >>>>> Okay, so we have 2 problems: >>>>> >>>>> =A0- the read is simply too slow >>>>> =A0- we're doing unnecessary(?) read repairs >>>>> >>>>> For the first I think it's going to be easiest if you can export the >>>>> row in question so we can do some profiling. =A0(Eric will have a too= l >>>>> to do this soon.) >>>> >>>> Ok - I've created a cassandra setup with just the row in question and >>>> tarred up the data + storage-conf.xml for the three nodes: >>>> >>>> http://demo.ooyala.com/cassandra/cassandra-1.tgz >>>> http://demo.ooyala.com/cassandra/cassandra-2.tgz >>>> http://demo.ooyala.com/cassandra/cassandra-3.tgz >>>> >>>>> >>>>> For the second, how sure are you that the repair is unnecessary? =A0D= oes >>>>> it repeat the repair when you do the query again after letting the >>>>> callbacks complete? >>>> >>>> I didn't do any writes as I was debugging the issue whereas I was >>>> reading the same data a whole bunch of times, so the repair should've >>>> been unnecessary. >>>> >>>> Thanks for your help, >>>> Edmond >>>> >>>>> >>>>> -Jonathan >>>>> >>>>> On Mon, Oct 19, 2009 at 11:38 PM, Edmond Lau wrot= e: >>>>>> A single local read with debug logging takes 3-4 secs on the node wi= th >>>>>> 3 data files. =A0It actually takes roughly as long on the other node= s as >>>>>> well (I was incorrect in saying that they were fast). >>>>>> >>>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,953 CassandraServer.jav= a >>>>>> (line 219) get_slice >>>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:06,955 StorageProxy.java >>>>>> (line 468) weakreadlocal reading >>>>>> SliceFromReadCommand(table=3D'Analytics', key=3D'o:movie', >>>>>> column_parent=3D'QueryPath(columnFamilyName=3D'movie', >>>>>> superColumnName=3D'[B@26e00675', columnName=3D'null')', start=3D'', >>>>>> finish=3D'', reversed=3Dfalse, count=3D1) >>>>>> DEBUG [pool-1-thread-22] 2009-10-20 04:26:10,382 SliceQueryFilter.ja= va >>>>>> (line 115) collecting 1000036:false:515@1255978226411394 >>>>>> DEBUG [CONSISTENCY-MANAGER:2] 2009-10-20 04:26:10,383 >>>>>> ConsistencyManager.java (line 168) Reading consistency digest for >>>>>> o:movie from 2926@[172.16.129.71:7000, 172.16.129.72:7000] >>>>>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:26:13,332 >>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback >>>>>> from 2926@172.16.129.71:7000 >>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,675 >>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback >>>>>> from 2926@172.16.129.72:7000 >>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:26:13,677 >>>>>> ConsistencyManager.java (line 96) Performing read repair for o:movie >>>>>> to 2937@[172.16.129.71:7000, 172.16.129.72:7000, 172.16.129.75:7000] >>>>>> >>>>>> Also, it's unclear why there's a read repair going on above, since I >>>>>> haven't made any updates. >>>>>> >>>>>> Doing a quorum read on the node with 3 data files takes 7 seconds w/ >>>>>> debug logging and leads to a DigestMismatchException, again even >>>>>> though I haven't made any updates. >>>>>> >>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,892 CassandraServer.jav= a >>>>>> (line 219) get_slice >>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java >>>>>> (line 374) strongread reading data for >>>>>> SliceFromReadCommand(table=3D'Analytics', key=3D'o:movie', >>>>>> column_parent=3D'QueryPath(columnFamilyName=3D'movie', >>>>>> superColumnName=3D'[B@2cf50c5f', columnName=3D'null')', start=3D'', >>>>>> finish=3D'', reversed=3Dfalse, count=3D1) from 3675@172.16.129.75:70= 00 >>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,894 StorageProxy.java >>>>>> (line 381) strongread reading digest for >>>>>> SliceFromReadCommand(table=3D'Analytics', key=3D'o:movie', >>>>>> column_parent=3D'QueryPath(columnFamilyName=3D'movie', >>>>>> superColumnName=3D'[B@2cf50c5f', columnName=3D'null')', start=3D'', >>>>>> finish=3D'', reversed=3Dfalse, count=3D1) from 3676@172.16.129.71:70= 00 >>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:00,895 StorageProxy.java >>>>>> (line 381) strongread reading digest for >>>>>> SliceFromReadCommand(table=3D'Analytics', key=3D'o:movie', >>>>>> column_parent=3D'QueryPath(columnFamilyName=3D'movie', >>>>>> superColumnName=3D'[B@2cf50c5f', columnName=3D'null')', start=3D'', >>>>>> finish=3D'', reversed=3Dfalse, count=3D1) from 3676@172.16.129.72:70= 00 >>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:03,994 >>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback >>>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.71:7000 >>>>>> DEBUG [RESPONSE-STAGE:2] 2009-10-20 04:30:04,107 >>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback >>>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.72:7000 >>>>>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,787 SliceQueryFilter.ja= va >>>>>> (line 115) collecting 1000036:false:515@1255978226411394 >>>>>> DEBUG [ROW-READ-STAGE:9] 2009-10-20 04:30:04,788 ReadVerbHandler.jav= a >>>>>> (line 100) Read key o:movie; sending response to >>>>>> 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000 >>>>>> DEBUG [RESPONSE-STAGE:4] 2009-10-20 04:30:04,788 >>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback >>>>>> from 15445226-F6EF-C97B-F239-FC0316748C5E@172.16.129.75:7000 >>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,789 >>>>>> ReadResponseResolver.java (line 84) Response deserialization time : = 0 >>>>>> ms. >>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,790 >>>>>> ReadResponseResolver.java (line 84) Response deserialization time : = 0 >>>>>> ms. >>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:04,791 >>>>>> ReadResponseResolver.java (line 84) Response deserialization time : = 1 >>>>>> ms. >>>>>> =A0INFO [pool-1-thread-25] 2009-10-20 04:30:04,792 StorageProxy.java >>>>>> (line 410) DigestMismatchException: o:movie >>>>>> DEBUG [RESPONSE-STAGE:1] 2009-10-20 04:30:07,953 >>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback >>>>>> from 3690@172.16.129.71:7000 >>>>>> DEBUG [RESPONSE-STAGE:3] 2009-10-20 04:30:08,130 >>>>>> ResponseVerbHandler.java (line 34) Processing response on a callback >>>>>> from 3690@172.16.129.72:7000 >>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132 >>>>>> ReadResponseResolver.java (line 84) Response deserialization time : = 1 >>>>>> ms. >>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132 >>>>>> ReadResponseResolver.java (line 84) Response deserialization time : = 0 >>>>>> ms. >>>>>> DEBUG [pool-1-thread-25] 2009-10-20 04:30:08,132 >>>>>> ReadResponseResolver.java (line 149) resolve: 1 ms. >>>>>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235 >>>>>> SliceQueryFilter.java (line 115) collecting >>>>>> 1000036:false:515@1255978226411394 >>>>>> DEBUG [ROW-READ-STAGE:10] 2009-10-20 04:30:08,235 ReadVerbHandler.ja= va >>>>>> (line 100) Read key o:movie; sending response to >>>>>> 3690@172.16.129.75:7000 >>>>>> >>>>>> Edmond >>>>>> >>>>>> On Mon, Oct 19, 2009 at 8:04 PM, Jonathan Ellis = wrote: >>>>>>> On Mon, Oct 19, 2009 at 9:48 PM, Edmond Lau wro= te: >>>>>>>> 8 data files total. =A03 nodes have 1, 1 has 2, the 3rd has 3. >>>>>>>> >>>>>>>>> Does it still take ~8s if you direct a CL.ONE query at one of the >>>>>>>>> nodes you know has the data (i.e., a local read)? >>>>>>>> >>>>>>>> Local reads return quickly, but if you look at the logs below, the= re's >>>>>>>> 6 secs from when cassandra prints that it's reading a consistency >>>>>>>> digest to when it processes a callback. >>>>>>> >>>>>>> That's from when it sends the digest command, to when it gets a cal= lback. >>>>>>> >>>>>>> Something doesn't add up. =A0If computing &sending a digest respons= e >>>>>>> takes 6s, either reading the row or computing the digest must be >>>>>>> taking substantial time. =A0But if local reads are fast, it's not >>>>>>> reading the row that is slow, and if resolve is fast (from your log= s, >>>>>>> it is 2ms) then it's not computing the digest that is slow. >>>>>>> >>>>>>> Try doing a local read from the node with 3 data files, see if that= is slower. >>>>>>> >>>>>>> If you still see a fast read there, add some more debug logging in >>>>>>> ReadVerbHandler.java, where it says something like (I'm going from >>>>>>> trunk here but 0.4.1 is similar) >>>>>>> >>>>>>> =A0 =A0 =A0 =A0 =A0 =A0Row row =3D readCommand.getRow(table); >>>>>>> =A0 =A0 =A0 =A0 =A0 =A0ReadResponse readResponse; >>>>>>> =A0 =A0 =A0 =A0 =A0 =A0if (readCommand.isDigestQuery()) >>>>>>> =A0 =A0 =A0 =A0 =A0 =A0{ >>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0readResponse =3D new ReadResponse(ro= w.digest()); >>>>>>> >>>>>>> time the getRow and the row.digest() calls and log how long those t= ake >>>>>>> in the quorum scenario. >>>>>>> >>>>>>> -Jonathan >>>>>>> >>>>>> >>>>> >>>> >>> >> >