cassandra-user mailing list archives

From Carlos Rolo <r...@pythian.com>
Subject Re: nodetool status shows large numbers of up nodes are down
Date Tue, 10 Feb 2015 22:53:12 GMT
Can you run nodetool tpstats and check whether there are pending requests
on GossipStage?
The timeout should not affect gossip (AFAIK).
As for the problems this state can cause: if your nodes are marked down for
long and you are using hinted handoff, your hints may not be delivered and
your data can get out of sync (this can be fixed by increasing the timeout
limit or by running repairs).
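
A minimal sketch of the tpstats check in Python, in case you want to run it
across many nodes. It assumes each row of nodetool tpstats output starts with
the pool name followed by the Active and Pending columns. The sample text is
illustrative only, not output from this cluster, and gossip_pending and
read_tpstats are hypothetical helper names.

```python
import subprocess
from typing import Optional


def gossip_pending(tpstats_output: str) -> Optional[int]:
    """Return the Pending count on the GossipStage row, or None if absent.

    Assumes the row layout: pool name, Active, Pending, Completed, ...
    """
    for line in tpstats_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "GossipStage":
            return int(fields[2])
    return None


def read_tpstats() -> str:
    """Run `nodetool tpstats` on the local node and return its stdout."""
    result = subprocess.run(
        ["nodetool", "tpstats"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Illustrative sample (assumed layout, made-up numbers).
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0         103921         0                 0
GossipStage                       1        57         884321         0                 0
"""

pending = gossip_pending(SAMPLE)
if pending:
    print("GossipStage has %d pending tasks" % pending)
```

On a live node, gossip_pending(read_tpstats()) would do the same check. A
Pending value that stays above zero across several runs suggests the gossip
stage is backed up.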

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Tue, Feb 10, 2015 at 8:51 PM, Chris Lohfink <clohfink85@gmail.com> wrote:

> Are you hitting long GC pauses on your nodes? You can check the GC log or
> look for GCInspector entries in the Cassandra log.
>
> Chris
>
> On Tue, Feb 10, 2015 at 1:28 PM, Cheng Ren <cheng.ren@bloomreach.com>
> wrote:
>
>> Hi Carlos,
>> Thanks for your suggestion. We did check the NTP setting and clock, and
>> they are all working normally. Schema versions are also consistent with
>> peers'.
>> BTW, the only change we made was yesterday, when we lowered the request
>> timeouts (read_request_timeout_in_ms, write_request_timeout_in_ms,
>> range_request_timeout_in_ms and request_timeout_in_ms) from 30000 to
>> 10000 on 6 nodes. Will this affect internode gossip?
>>
>> Thanks,
>> Cheng
>>
>> On Mon, Feb 9, 2015 at 11:07 PM, Carlos Rolo <rolo@pythian.com> wrote:
>>
>>> Hi Cheng,
>>>
>>> Are all machines configured with NTP, and are all clocks in sync? If
>>> not, set that up first.
>>>
>>> Clocks that are out of sync cause weird issues like the ones you are
>>> seeing, as well as schema disagreements and, in some cases, corrupted data.
>>>
>>> Regards,
>>>
>>> Carlos Juzarte Rolo
>>>
>>> On Tue, Feb 10, 2015 at 3:40 AM, Cheng Ren <cheng.ren@bloomreach.com>
>>> wrote:
>>>
>>>> Hi,
>>>> We have a two-DC cluster with 21 nodes in one DC and 27 in the other. Over
>>>> the past few months, we have seen nodetool status mark 4-8 nodes as down
>>>> while they are actually functioning. Today in particular, we noticed that
>>>> running nodetool status on some nodes shows more nodes down than before,
>>>> even though they are up and serving requests.
>>>> For example, one node reports 42 nodes as down.
>>>>
>>>> phi_convict_threshold is set to 12 on all nodes, and we are running
>>>> Cassandra 2.0.4 on AWS EC2 machines.
>>>>
>>>> Does anyone have recommendations for identifying the root cause of this?
>>>> Can this state cause any problems?
>>>>
>>>> Thanks,
>>>> Cheng
>>>>