Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Delivered-To: mailing list user@cassandra.apache.org
MIME-Version: 1.0
From: Anthony Grasso
Date: Mon, 1 May 2017 10:27:28 +1000
Subject: Re: Very slow cluster
To: user@cassandra.apache.org
Content-Type:
 multipart/alternative; boundary=94eb2c14bd966178e3054e6b7fd1
archived-at: Mon, 01 May 2017 00:28:22 -0000

--94eb2c14bd966178e3054e6b7fd1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hi Eduardo,

Please see my comment inline below regarding your third question.

Regards,
Anthony

On 28 April 2017 at 21:26, Eduardo Alonso wrote:

> Hi to all:
>
> I am having some problems with two of a client's cassandra:3.0.8 clusters
> that I want to share with you. These clusters are for QA and DEV.
>
> Cluster 1 (1 DC) is composed of 3 VMs (heap=4G, RAM=8G) sharing the same
> physical machine and one SSD. I know this is not the best environment,
> but it is only for testing purposes.
>
> The entire cluster runs very slowly and sometimes has failing inserts,
> which cause hints to be saved and replayed, and some data inconsistency
> with 2i queries.
>
> I know it is not the best environment (virtual machines sharing a physical
> machine and one physical disk), but it is very weird to me that the same
> test case works like a charm in 3 Docker containers on my laptop
> (i7, 16G, SSD) but causes a lot of problems in their cluster.
>
> *listen_address* and *rpc_address* are set to the external domain name
> (i.e. NODE_NAME.clientdomain.com). I have activated TRACE logs and get
> some strange messages.
>
> So, my questions:
>
> *1.- Is it possible that one node(with ) sends a message to itself,
> triggering READ_REPAIR?*
>
> TRACE [SharedPool-Worker-1] 2017-04-24 08:58:28,558 MessagingService.java:750 - Message-to-self TYPE:MUTATION VERB:READ_REPAIR going over MessagingService
>
> TRACE [SharedPool-Worker-1] 2017-04-16 04:38:47,513 MessagingService.java:747 -01a.clientdomain.com/10.63.24.238 sending READ_REPAIR to 3426@/10.63.24.238"
>
> *Does this log line show one node asking itself for a portion of data
> that it does not have?*
>
> *2.-* I have another suspicious log line about slow VMs:
>
> -WARN [GossipTasks:1] 2017-04-14 00:32:44,371 FailureDetector.java:287 - Not marking nodes down due to local pause of 11195193520 > 5000000000
>
> *Does this line say that there is a pause in the JVM of 11 secs?* There
> are no garbage collector log lines. *Is it possible that this 11 sec
> pause is caused by a DNS lookup of the domain?*
>
> *3.-* I know that listen_address must be the external IP (internode
> communication will be faster, with no need for a DNS lookup).
>
> *If I set listen_address to the external IP, is it necessary that the IP
> be pingable from all the other datacenter nodes?*
> *Does inter-data-center communication use 'rpc_address' or
> 'listen_address'?*
>

All nodes in the cluster should be configured so that they can contact
each other. As far as being able to ping each other, enabling ICMP can be
useful for debugging internode communication problems.

Regarding internode communication, the *listen_address* is used for
internode communication in the cluster. Note that if you don't want to
manually specify an IP for *listen_address* on each node in your cluster,
leave it blank and Cassandra will use *InetAddress.getLocalHost()* to pick
an address.

> Thank you in advance
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd*
>

--94eb2c14bd966178e3054e6b7fd1
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--94eb2c14bd966178e3054e6b7fd1--
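
A small sketch that may help when reading two details from this thread. It assumes the two numbers in the FailureDetector warning are nanoseconds, and it prints what InetAddress.getLocalHost() resolves to on the machine where it runs, since that is the call the reply says Cassandra uses when listen_address is left blank. The class and method names (NodeChecks, nanosToMillis) are hypothetical and are not part of Cassandra.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not Cassandra code.
final class NodeChecks {

    // Converts a nanosecond value (assumed unit of the log figures) to whole milliseconds.
    static long nanosToMillis(long nanos) {
        return TimeUnit.NANOSECONDS.toMillis(nanos);
    }

    public static void main(String[] args) {
        long pauseNanos = 11195193520L; // "local pause of" value from the quoted warning
        long limitNanos = 5000000000L;  // threshold value from the same warning
        System.out.println("pause ms: " + nanosToMillis(pauseNanos));
        System.out.println("limit ms: " + nanosToMillis(limitNanos));

        try {
            // The lookup the reply mentions for a blank listen_address.
            InetAddress local = InetAddress.getLocalHost();
            System.out.println("getLocalHost(): " + local.getHostName()
                    + " -> " + local.getHostAddress());
        } catch (UnknownHostException e) {
            // The local host name does not resolve on this machine.
            System.out.println("local host name does not resolve: " + e.getMessage());
        }
    }
}
```

Read this way, the warning reports a local pause of roughly 11.2 seconds against a 5 second limit. If the address printed by the lookup is a loopback address, other nodes would not be able to reach it, so it is worth checking on each VM before leaving listen_address blank.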