cassandra-user mailing list archives

From Eduardo Alonso <eduardoalo...@stratio.com>
Subject Re: Very slow cluster
Date Fri, 05 May 2017 08:17:34 GMT
Thank you, Anthony.

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
<https://twitter.com/StratioBD>*

2017-05-01 2:27 GMT+02:00 Anthony Grasso <anthony.grasso@gmail.com>:

> Hi Eduardo,
>
> Please see my comment inline below regarding your third question.
>
> Regards,
> Anthony
>
> On 28 April 2017 at 21:26, Eduardo Alonso <eduardoalonso@stratio.com>
> wrote:
>
>> Hi to all:
>>
>> I am having some problems with two of a client's cassandra:3.0.8 clusters
>> that I want to share with you. These clusters are for QA and DEV.
>>
>> Cluster 1 (1 DC) is composed of 3 VMs (heap=4G, RAM=8G) that share the
>> same physical machine and a single SSD. I know this is not the best
>> environment, but it is only for testing purposes.
>>
>> The entire cluster runs very slowly. Some inserts occasionally fail, which
>> causes hints to be saved and replayed and leads to some data inconsistency
>> in secondary index (2i) queries.
>>
>> I know it is not the best environment (virtual machines sharing one
>> physical machine and one physical disk), but it seems very strange to me
>> that exactly the same test case works like a charm on 3 Docker containers
>> on my laptop (i7, 16G, SSD) but causes a lot of problems on their cluster.
>>
>> *listen_address* and *rpc_address* are set to the external domain name
>> (i.e. NODE_NAME.clientdomain.com). I have enabled TRACE logging and I am
>> getting some strange messages.
>>
>> So, my questions:
>>
>> *1.- Is it possible for one node (with ) to send a message to itself,
>> triggering READ_REPAIR?*
>>
>> TRACE [SharedPool-Worker-1] 2017-04-24 08:58:28,558
>> MessagingService.java:750 - Message-to-self TYPE:MUTATION VERB:
>> READ_REPAIR going over MessagingService
>>
>> TRACE [SharedPool-Worker-1] 2017-04-16 04:38:47,513
>> MessagingService.java:747 -01a.clientdomain.com/10.63.24.238 sending
>> READ_REPAIR to 3426@/10.63.24.238
>>
>> *Does this log line show one node asking itself for a portion of data
>> that it does not have?*
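As background for question 1, here is a simplified, hypothetical model of read repair (not Cassandra's code): the coordinator compares the replicas' responses, treats the cell with the newest write timestamp as correct, and sends a repair mutation to every replica that returned older data. Because the coordinator can itself be one of the replicas, the model can produce a repair mutation addressed to the local node. The single-cell `Cell` record, the last-write-wins comparison and the peer addresses other than 10.63.24.238 are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of read repair; not Cassandra's implementation.
public class ReadRepairSketch {

    // One cell as returned by a replica: its value and write timestamp.
    record Cell(String value, long timestamp) {}

    // Returns the replicas whose cell is older than the newest one seen.
    static List<String> staleReplicas(Map<String, Cell> responses) {
        long newest = Long.MIN_VALUE;
        for (Cell c : responses.values()) {
            newest = Math.max(newest, c.timestamp());
        }
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Cell> e : responses.entrySet()) {
            if (e.getValue().timestamp() < newest) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        String self = "10.63.24.238"; // the coordinator is also a replica
        Map<String, Cell> responses = new TreeMap<>();
        responses.put(self, new Cell("old", 1L));
        responses.put("10.0.0.2", new Cell("new", 2L)); // hypothetical peer
        responses.put("10.0.0.3", new Cell("new", 2L)); // hypothetical peer
        for (String replica : staleReplicas(responses)) {
            String target = replica.equals(self) ? "self" : replica;
            System.out.println("send READ_REPAIR mutation to " + target); // prints "... to self"
        }
    }
}
```

Under this model, a READ_REPAIR mutation sent to self would push newer data to the local replica rather than request data from it; whether Cassandra 3.0.8 behaves exactly this way would need to be confirmed against its source.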
>>
>> *2.-* I have another suspicious log line about slow VMs:
>>
>> WARN  [GossipTasks:1] 2017-04-14 00:32:44,371 FailureDetector.java:287
>> - Not marking nodes down due to local pause of 11195193520 > 5000000000
>>
>> *Does this line say that there was an 11-second JVM pause*? There are
>> no garbage collector log lines. *Is it possible that this 11-second pause
>> is caused by a DNS lookup of the domain?*
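The numbers in the WARN line read as nanoseconds: 11195193520 ns is about 11.2 s, and the threshold 5000000000 ns is 5 s. The sketch below is a hypothetical model of a check that would produce such a message, assuming the detector records the time of each gossip round and compares the gap since the previous round against the threshold. The class and method names are made up; only the two numbers come from the log.

```java
import java.util.Locale;

// Hypothetical model of a "local pause" check; names are illustrative only.
public class LocalPauseSketch {
    // Threshold printed in the WARN line: 5,000,000,000 ns = 5 s.
    static final long MAX_LOCAL_PAUSE_NANOS = 5_000_000_000L;

    private long lastRunNanos;

    LocalPauseSketch(long startNanos) {
        this.lastRunNanos = startNanos;
    }

    // Called on every gossip round; true if the gap since the previous round
    // exceeded the threshold (in which case nodes would not be marked down).
    boolean pausedTooLong(long nowNanos) {
        long gap = nowNanos - lastRunNanos;
        lastRunNanos = nowNanos;
        return gap > MAX_LOCAL_PAUSE_NANOS;
    }

    static double nanosToSeconds(long nanos) {
        return nanos / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        long reported = 11_195_193_520L; // value from the WARN line
        System.out.println(String.format(Locale.ROOT, "%.2f s", nanosToSeconds(reported))); // 11.20 s

        LocalPauseSketch detector = new LocalPauseSketch(0L);
        System.out.println(detector.pausedTooLong(1_000_000_000L));            // 1 s gap: false
        System.out.println(detector.pausedTooLong(1_000_000_000L + reported)); // ~11.2 s gap: true
    }
}
```

If the real check works like this, the reported value is the gap between two consecutive checks on the local node, so any stall long enough to delay that thread (not only a garbage collection pause) could trigger the warning.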
>>
>>
>> *3.-* I know that listen_address must be the external IP (internode
>> communication will be faster, with no need for DNS lookups).
>>
>> *If I set listen_address to the external IP, does that IP need to be
>> pingable from all the nodes in the other datacenters?*
>> *Does inter-datacenter communication use 'rpc_address' or
>> 'listen_address'*?
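Regarding the DNS point in question 3: `InetAddress.getByName` only checks the format when given a literal IP address, whereas a host name may require a resolver lookup. The small diagnostic below, a sketch with a hypothetical class name, times both cases so lookup latency on a node can be measured directly, e.g. `java DnsTiming NODE_NAME.clientdomain.com`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Diagnostic sketch: measures how long the JVM takes to turn a name into an address.
public class DnsTiming {

    // Resolves host and returns the elapsed time in milliseconds.
    static long resolveMillis(String host) throws UnknownHostException {
        long start = System.nanoTime();
        InetAddress addr = InetAddress.getByName(host); // literal IPs are only format-checked
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(host + " -> " + addr.getHostAddress() + " in " + elapsedMs + " ms");
        return elapsedMs;
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "localhost";
        resolveMillis(host);
        resolveMillis(host); // may be served from the JVM's lookup cache
    }
}
```

The JVM may cache successful lookups, so the first call is the one that reflects resolver latency.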
>>
>>
> All nodes in the cluster should be configured so that they can contact
> each other. As for being able to ping each other, enabling ICMP can be
> useful for debugging internode communication problems.
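ICMP only shows that a host answers pings; internode traffic itself goes over TCP to the storage_port (7000 by default). A sketch of a TCP-level check, assuming the default port, with a hypothetical class name and the peer list passed on the command line:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: checks whether a TCP connection to host:port opens within a timeout.
public class PeerCheck {

    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        int storagePort = 7000; // default storage_port; change it if cassandra.yaml differs
        for (String peer : args) {
            boolean ok = canConnect(peer, storagePort, 2000);
            System.out.println(peer + ":" + storagePort + " reachable = " + ok);
        }
    }
}
```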
>
> Regarding internode communication: the *listen_address* is used for
> internode communication in the cluster. Note that if you don't want to
> manually specify an IP for *listen_address* on each node in your cluster,
> you can leave it blank and Cassandra will use *InetAddress.getLocalHost()*
> to pick an address.
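To see which address a blank *listen_address* would fall back to on a given machine (per the reply above), one can call the same JDK method directly. A minimal sketch; the class name is hypothetical, and the loopback warning reflects a common misconfiguration where the host name maps to 127.0.0.1 or 127.0.1.1 in /etc/hosts.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: prints what InetAddress.getLocalHost() returns on this machine.
public class LocalHostCheck {

    static InetAddress localHost() throws UnknownHostException {
        return InetAddress.getLocalHost();
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress addr = localHost();
        System.out.println(addr.getHostName() + " -> " + addr.getHostAddress());
        if (addr.isLoopbackAddress()) {
            // Other nodes cannot reach a node that listens only on loopback.
            System.out.println("WARNING: host name resolves to a loopback address");
        }
    }
}
```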
>
>
>> Thank you in advance
>>
>
>
