From: kurt greaves
Date: Fri, 11 Aug 2017 15:56:48 +0000
Subject: Re: rebuild constantly fails, 3.11
To: dev@cassandra.apache.org, Termite Viewer, User

cc'ing user back in...

On 12 Aug. 2017 01:55, "kurt greaves" wrote:

> How much memory do these machines have? Typically we've found that G1
> isn't worth it until you get to around 24G heaps, and even at that it's not
> really better than CMS. You could try CMS with an 8G heap and 2G new size.
>
> However as the oom is only happening on one node have you ensured there
> are no extra processes running on that node that could be consuming extra
> memory? Note that the oom killer will kill the process with the highest oom
> score, which generally corresponds to the process using the most memory,
> but not necessarily the problem.
>
> Also could you run nodetool info on the problem node and 1 other and dump
> the output in a gist? It would be interesting to see if there is a
> significant difference in off-heap.
>
> On 11 Aug. 2017 17:30, "Micha" wrote:
>
>> It's an oom issue, the kernel kills the cassandra job.
>> The config was to use offheap buffers and 20G java heap, I changed this
>> to use heap buffers and 16G java heap. I added a new node yesterday
>> which got streams from 4 other nodes. They all succeeded except on the
>> one node which failed before. This time again the db was killed by the
>> kernel. At the moment I don't know what is the reason here, since the
>> nodes are equal.
>>
>> For me it seems the g1gc is not able to free the memory fast enough.
>> The settings were MaxGCPauseMillis=600, ParallelGCThreads=10 and
>> ConcGCThreads=10, which may be too high since the node has only 8
>> cores.
>> I changed this to ParallelGCThreads=8 and ConcGCThreads=2, as is mentioned
>> in the comments of jvm.options.
>>
>> Since the bootstrap of the fifth node did not complete I will start it
>> again and check if the memory is still decreasing over time.
>>
>> Michael
>>
>> On 11.08.2017 01:25, Jeff Jirsa wrote:
>> >
>> > On 2017-08-08 01:00 (-0700), Micha wrote:
>> >> Hi,
>> >>
>> >> it seems I'm not able to add a 3 node dc to a 3 node dc. After
>> >> starting the rebuild on a new node, nodetool netstats shows it will
>> >> receive 1200 files from node-1 and 5000 from node-2. The stream from
>> >> node-1 completes but the stream from node-2 always fails, after
>> >> sending ca 4000 files.
>> >>
>> >> After restarting the rebuild it again starts to send the 5000 files.
>> >> The whole cluster is connected via one switch only, no firewall
>> >> between, the network shows no errors.
>> >> The machines have 8 cores, 32GB RAM and two 1TB discs as raid0.
>> >> The logs show no errors. The size of the data is ca 1TB.
>> >
>> > Is there anything in `dmesg`? System logs? Nothing? Is node2 running?
>> > Is node3 running?
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> > For additional commands, e-mail: dev-help@cassandra.apache.org
>> >
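
The off-heap comparison requested in this thread (nodetool info from the problem node and one other) could be scripted roughly as below. This is only a sketch: it assumes the output contains "Heap Memory (MB)" and "Off Heap Memory (MB)" lines as printed by 3.x releases, the function names are hypothetical, and the sample text and its numbers are invented for illustration rather than taken from the cluster discussed here.

```python
import re

# Invented sample output for two nodes; real `nodetool info` prints more fields.
PROBLEM_NODE = """\
Heap Memory (MB)       : 9000.25 / 16384.00
Off Heap Memory (MB)   : 2048.50
"""
HEALTHY_NODE = """\
Heap Memory (MB)       : 8000.00 / 16384.00
Off Heap Memory (MB)   : 512.00
"""


def parse_memory_mb(info_text):
    """Return heap used, heap max and off-heap (all in MB) from `nodetool info` text."""
    heap = re.search(
        r"^Heap Memory \(MB\)\s*:\s*([\d.]+)\s*/\s*([\d.]+)", info_text, re.MULTILINE
    )
    off_heap = re.search(
        r"^Off Heap Memory \(MB\)\s*:\s*([\d.]+)", info_text, re.MULTILINE
    )
    if heap is None or off_heap is None:
        raise ValueError("expected memory lines not found; output format may differ")
    return {
        "heap_used": float(heap.group(1)),
        "heap_max": float(heap.group(2)),
        "off_heap": float(off_heap.group(1)),
    }


def off_heap_difference_mb(problem_text, other_text):
    """Off-heap MB on the problem node minus off-heap MB on the other node."""
    return parse_memory_mb(problem_text)["off_heap"] - parse_memory_mb(other_text)["off_heap"]


if __name__ == "__main__":
    diff = off_heap_difference_mb(PROBLEM_NODE, HEALTHY_NODE)
    print(f"off-heap difference: {diff:.2f} MB")
```

In practice the two texts would come from running nodetool info on each node and saving the output. The figure only covers what Cassandra itself reports, so other processes on the host still need to be checked separately, as suggested above.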