Date: Mon, 14 Sep 2009 18:37:22 -0400
Subject: Re: get_key_range (CASSANDRA-169)
From: Simon Smith
To: cassandra-user@incubator.apache.org

Jonathan:

I tried out the patch you attached to CASSANDRA-440. I applied it to 0.4,
and it works for me. Now, as soon as I take the node down, there may be
one or two seconds of the thrift-internal error (timeout), but as soon as
the host doing the querying can see the node is down, the error stops and
the get_key_range query gives valid output again. And there isn't any
disruption when the node comes back up.

Thanks! (I put this same note in the bug report.)

Simon Smith

On Fri, Sep 11, 2009 at 9:38 AM, Simon Smith wrote:
> https://issues.apache.org/jira/browse/CASSANDRA-440
>
> Thanks again, of course I'm happy to give any additional information
> and will gladly do any testing of the fix.
>
> Simon
>
>
> On Thu, Sep 10, 2009 at 7:32 PM, Jonathan Ellis wrote:
>> That confirms what I suspected, thanks.
>>
>> Can you file a ticket on Jira and I'll work on a fix for you to test?
>>
>> thanks,
>>
>> -Jonathan
>>
>> On Thu, Sep 10, 2009 at 4:42 PM, Simon Smith wrote:
>>> I sent get_key_range to node #1 (174.143.182.178), and here are the
>>> resulting log lines from 174.143.182.178's log (Do you want the other
>>> nodes' log lines? Let me know if so.)
>>>
>>> DEBUG - get_key_range
>>> DEBUG - reading RangeCommand(table='users', columnFamily=pwhash,
>>> startWith='', stopAt='', maxResults=100) from 648@174.143.182.178:7000
>>> DEBUG - collecting :false:32@1252535119
>>>  [ ... chop the repeated & identical collecting messages ... ]
>>> DEBUG - collecting :false:32@1252535119
>>> DEBUG - Sending RangeReply(keys=[java, java1, java2, java3, java4,
>>> java5, match, match1, match2, match3, match4, match5, newegg, newegg1,
>>> newegg2, newegg3, newegg4, newegg5, now, now1, now2, now3, now4, now5,
>>> sgs, sgs1, sgs2, sgs3, sgs4, sgs5, test, test1, test2, test3, test4,
>>> test5, xmind, xmind1, xmind2, xmind3, xmind4, xmind5],
>>> completed=false) to 648@174.143.182.178:7000
>>> DEBUG - Processing response on an async result from 648@174.143.182.178:7000
>>> DEBUG - reading RangeCommand(table='users', columnFamily=pwhash,
>>> startWith='', stopAt='', maxResults=58) from 649@174.143.182.182:7000
>>> DEBUG - Processing response on an async result from 649@174.143.182.182:7000
>>> DEBUG - reading RangeCommand(table='users', columnFamily=pwhash,
>>> startWith='', stopAt='', maxResults=58) from 650@174.143.182.179:7000
>>> DEBUG - Processing response on an async result from 650@174.143.182.179:7000
>>> DEBUG - reading RangeCommand(table='users', columnFamily=pwhash,
>>> startWith='', stopAt='', maxResults=22) from 651@174.143.182.185:7000
>>> DEBUG - Processing response on an async result from 651@174.143.182.185:7000
>>> DEBUG - Disseminating load info ...
>>>
>>>
>>> Thanks,
>>>
>>> Simon
>>>
>>> On Thu, Sep 10, 2009 at 5:25 PM, Jonathan Ellis wrote:
>>>> I think I see the problem.
>>>>
>>>> Can you check if your range query is spanning multiple nodes in the
>>>> cluster? You can tell by setting the log level to DEBUG: after it
>>>> logs get_key_range, it will say "reading RangeCommand(...) from ...
>>>> @machine" more than once.
>>>>
>>>> The bug is that when picking the node to start the range query it
>>>> consults the failure detector to avoid dead nodes, but if the query
>>>> spans nodes it does not do that on subsequent nodes.
>>>>
>>>> But if you are only generating one RangeCommand per get_key_range then
>>>> we have two bugs. :)
>>>>
>>>> -Jonathan
>>>>
>>>
>>
>
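
The behaviour discussed in this thread (consulting the failure detector for
every node a range query touches, not only the first) can be illustrated with
a minimal Python sketch. This is not Cassandra's actual code or the
CASSANDRA-440 patch. The names get_key_range, pick_live_endpoint, is_alive
(standing in for the failure detector) and fetch_keys are hypothetical, and
the assumption that each range has an ordered list of replica endpoints is
made only for illustration.

```python
from typing import Callable, List


def pick_live_endpoint(replicas: List[str],
                       is_alive: Callable[[str], bool]) -> str:
    """Return the first replica that the (hypothetical) failure detector
    reports as alive; raise if every replica for the range is down."""
    for endpoint in replicas:
        if is_alive(endpoint):
            return endpoint
    raise RuntimeError("no live replica for this range")


def get_key_range(ranges: List[List[str]],
                  is_alive: Callable[[str], bool],
                  fetch_keys: Callable[[str, int], List[str]],
                  max_results: int) -> List[str]:
    """Collect up to max_results keys by visiting each range in order.

    Assumption: `ranges` holds, per range, the endpoints that can serve it.
    The liveness check runs on every hop, which is the behaviour the
    thread says was missing for nodes after the first one.
    """
    keys: List[str] = []
    for replicas in ranges:
        remaining = max_results - len(keys)
        if remaining <= 0:
            break
        # Checked for every range, not just the one that starts the query.
        endpoint = pick_live_endpoint(replicas, is_alive)
        # Ask only for what is still needed, as the shrinking maxResults
        # values in the log above suggest; truncate defensively.
        keys.extend(fetch_keys(endpoint, remaining)[:remaining])
    return keys
```

Calling get_key_range with an is_alive callback that reports one endpoint as
down should route that range's request to the next replica in its list
instead of timing out against the dead node.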