cassandra-user mailing list archives

From: Simon Smith <simongsm...@gmail.com>
Subject: Re: get_key_range (CASSANDRA-169)
Date: Mon, 14 Sep 2009 22:37:22 GMT
Jonathan:

I tried out the patch you attached to CASSANDRA-440, applied it to 0.4,
and it works for me.  Now, when I take the node down, there may be one
or two seconds of the Thrift-internal error (timeout), but as soon as
the host doing the querying sees that the node is down, the error stops
and the get_key_range query returns valid output again.  And there is
no disruption when the node comes back up.
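
(For anyone else testing this, below is a rough sketch of a client-side
retry loop that rides out that one-or-two-second window. It is only a
sketch: it assumes the default Thrift port 9160 and the 0.4-era generated
bindings, i.e. a Cassandra.Client class, guessed here to live in
org.apache.cassandra.service, with a get_key_range(keyspace, columnFamily,
startWith, stopAt, maxResults) signature inferred from the RangeCommand
log lines quoted further down. Adjust the names to whatever your generated
client actually exposes.)

  import java.util.List;

  import org.apache.thrift.TException;
  import org.apache.thrift.protocol.TBinaryProtocol;
  import org.apache.thrift.transport.TSocket;

  // Assumed package for the 0.4-era generated bindings; later trees
  // generate into org.apache.cassandra.thrift instead.
  import org.apache.cassandra.service.Cassandra;

  public class KeyRangeRetryExample {
      public static void main(String[] args) throws Exception {
          TSocket socket = new TSocket("174.143.182.178", 9160);
          Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
          socket.open();
          try {
              List<String> keys = null;
              // Right after a node is killed there can be a second or two of
              // Thrift-level timeouts until the failure detector marks it
              // down, so retry briefly instead of failing on the first error.
              for (int attempt = 0; attempt < 5 && keys == null; attempt++) {
                  try {
                      keys = client.get_key_range("users", "pwhash", "", "", 100);
                  } catch (TException e) {
                      Thread.sleep(1000);
                  }
              }
              System.out.println(keys);
          } finally {
              socket.close();
          }
      }
  }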

Thanks!  (I put this same note in the bug report).

Simon Smith




On Fri, Sep 11, 2009 at 9:38 AM, Simon Smith <simongsmith@gmail.com> wrote:
> https://issues.apache.org/jira/browse/CASSANDRA-440
>
> Thanks again, of course I'm happy to give any additional information
> and will gladly do any testing of the fix.
>
> Simon
>
>
> On Thu, Sep 10, 2009 at 7:32 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> That confirms what I suspected, thanks.
>>
>> Can you file a ticket on Jira and I'll work on a fix for you to test?
>>
>> thanks,
>>
>> -Jonathan
>>
>> On Thu, Sep 10, 2009 at 4:42 PM, Simon Smith <simongsmith@gmail.com> wrote:
>>> I sent get_key_range to node #1 (174.143.182.178), and here are the
>>> resulting log lines from 174.143.182.178's log (Do you want the other
>>> nodes' log lines? Let me know if so.)
>>>
>>> DEBUG - get_key_range
>>> DEBUG - reading RangeCommand(table='users', columnFamily=pwhash,
>>> startWith='', stopAt='', maxResults=100) from 648@174.143.182.178:7000
>>> DEBUG - collecting :false:32@1252535119
>>>  [ ... chop the repeated & identical collecting messages ... ]
>>> DEBUG - collecting :false:32@1252535119
>>> DEBUG - Sending RangeReply(keys=[java, java1, java2, java3, java4,
>>> java5, match, match1, match2, match3, match4, match5, newegg, newegg1,
>>> newegg2, newegg3, newegg4, newegg5, now, now1, now2, now3, now4, now5,
>>> sgs, sgs1, sgs2, sgs3, sgs4, sgs5, test, test1, test2, test3, test4,
>>> test5, xmind, xmind1, xmind2, xmind3, xmind4, xmind5],
>>> completed=false) to 648@174.143.182.178:7000
>>> DEBUG - Processing response on an async result from 648@174.143.182.178:7000
>>> DEBUG - reading RangeCommand(table='users', columnFamily=pwhash,
>>> startWith='', stopAt='', maxResults=58) from 649@174.143.182.182:7000
>>> DEBUG - Processing response on an async result from 649@174.143.182.182:7000
>>> DEBUG - reading RangeCommand(table='users', columnFamily=pwhash,
>>> startWith='', stopAt='', maxResults=58) from 650@174.143.182.179:7000
>>> DEBUG - Processing response on an async result from 650@174.143.182.179:7000
>>> DEBUG - reading RangeCommand(table='users', columnFamily=pwhash,
>>> startWith='', stopAt='', maxResults=22) from 651@174.143.182.185:7000
>>> DEBUG - Processing response on an async result from 651@174.143.182.185:7000
>>> DEBUG - Disseminating load info ...
>>>
>>>
>>> Thanks,
>>>
>>> Simon
>>>
>>> On Thu, Sep 10, 2009 at 5:25 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>> I think I see the problem.
>>>>
>>>> Can you check whether your range query is spanning multiple nodes in the
>>>> cluster?  You can tell by setting the log level to DEBUG and checking
>>>> whether, after it logs get_key_range, it says "reading
>>>> RangeCommand(...) from ...@machine" more than once.
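
(For reference, in 0.4 that should just mean raising the root logger level
in the log4j configuration; a sketch, assuming a stock conf/log4j.properties
with the usual stdout and rolling-file appenders:)

  # conf/log4j.properties (file location and appender names assumed)
  log4j.rootLogger=DEBUG,stdout,R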
>>>>
>>>> The bug is that when picking the node to start the range query it
>>>> consults the failure detector to avoid dead nodes, but if the query
>>>> spans nodes it does not do that on subsequent nodes.
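
(To make that concrete, the shape of the fix would be to run every hop of
the scan through the failure detector, not only the node the scan starts
on. The following is a self-contained illustration with made-up stand-ins
(LiveChecker, endpointsToQuery), not the actual Cassandra code:)

  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.List;

  public class RangeScanSketch {

      // Stand-in for the failure detector; illustration only.
      interface LiveChecker {
          boolean isAlive(String endpoint);
      }

      // Filter the ring-ordered candidates through the failure detector so
      // every hop of a multi-node range scan skips dead nodes, not just the
      // first one.
      static List<String> endpointsToQuery(List<String> ringOrder, LiveChecker fd) {
          List<String> live = new ArrayList<String>();
          for (String endpoint : ringOrder) {
              if (fd.isAlive(endpoint)) {
                  live.add(endpoint);
              }
          }
          return live;
      }

      public static void main(String[] args) {
          List<String> ring = Arrays.asList("174.143.182.178", "174.143.182.182",
                                            "174.143.182.179", "174.143.182.185");
          LiveChecker fd = new LiveChecker() {
              public boolean isAlive(String endpoint) {
                  // Pretend one node has just been taken down.
                  return !endpoint.equals("174.143.182.182");
              }
          };
          // Prints the three live endpoints; the dead one is skipped on every hop.
          System.out.println(endpointsToQuery(ring, fd));
      }
  }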
>>>>
>>>> But if you are only generating one RangeCommand per get_key_range then
>>>> we have two bugs. :)
>>>>
>>>> -Jonathan
>>>>
>>>
>>
>
