cassandra-dev mailing list archives

From kurt greaves <k...@instaclustr.com>
Subject Re: rebuild constantly fails, 3.11
Date Fri, 11 Aug 2017 15:56:48 GMT
cc'ing user back in...

On 12 Aug. 2017 01:55, "kurt greaves" <kurt@instaclustr.com> wrote:

> How much memory do these machines have? Typically we've found that G1
> isn't worth it until you get to around 24G heaps, and even then it's not
> really better than CMS. You could try CMS with an 8G heap and a 2G new size.
>
> However, as the OOM is only happening on one node, have you ensured there
> are no extra processes running on that node that could be consuming extra
> memory? Note that the OOM killer will kill the process with the highest OOM
> score, which generally corresponds to the process using the most memory,
> but is not necessarily the process causing the problem.
>
> Also, could you run nodetool info on the problem node and one other node,
> and dump the output in a gist? It would be interesting to see if there is a
> significant difference in off-heap memory.
>
> On 11 Aug. 2017 17:30, "Micha" <micha-1@fantasymail.de> wrote:
>
>> It's an OOM issue: the kernel kills the Cassandra process.
>> The config was to use offheap buffers and a 20G Java heap; I changed this
>> to use heap buffers and a 16G Java heap. I added a new node yesterday,
>> which got streams from 4 other nodes. They all succeeded except on the
>> one node which failed before. This time the db was again killed by the
>> kernel. At the moment I don't know the reason, since the
>> nodes are identical.
>>
>> To me it seems G1GC is not able to free the memory fast enough.
>> The settings were MaxGCPauseMillis=600, ParallelGCThreads=10 and
>> ConcGCThreads=10, which may be too high since the node has only 8
>> cores.
>> I changed this to ParallelGCThreads=8 and ConcGCThreads=2, as mentioned
>> in the comments of jvm.options.
>>
>> Since the bootstrap of the fifth node did not complete I will start it
>> again and check if the memory is still decreasing over time.
>>
>>
>>
>>  Michael
>>
>>
>>
>> On 11.08.2017 01:25, Jeff Jirsa wrote:
>> >
>> >
>> > On 2017-08-08 01:00 (-0700), Micha <micha-1@fantasymail.de> wrote:
>> >> Hi,
>> >>
>> >> it seems I'm not able to add a 3 node dc to a 3 node dc. After
>> >> starting the rebuild on a new node, nodetool netstats shows it will
>> >> receive 1200 files from node-1 and 5000 from node-2. The stream from
>> >> node-1 completes but the stream from node-2 always fails, after
>> >> sending ca. 4000 files.
>> >>
>> >> After restarting the rebuild it again starts to send the 5000 files.
>> >> The whole cluster is connected via a single switch, with no firewall
>> >> in between; the network shows no errors.
>> >> The machines have 8 cores, 32GB RAM and two 1TB discs as raid0.
>> >> The logs show no errors. The size of the data is ca. 1TB.
>> >
>> > Is there anything in `dmesg`? System logs? Nothing? Is node2 running?
>> > Is node3 running?
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> > For additional commands, e-mail: dev-help@cassandra.apache.org
>> >
>>
>>
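
To make the point about OOM scores concrete: on Linux, each process exposes its current score in /proc/<pid>/oom_score, so it is possible to see which processes the OOM killer would favour on the problem node. The following is only a minimal sketch, not something from the thread itself; the function name read_oom_scores and its proc_root parameter are hypothetical, and it assumes a Linux host with a readable /proc.

```python
import os


def read_oom_scores(proc_root="/proc"):
    """Return (score, pid, name) tuples for readable processes, highest score first.

    proc_root is a hypothetical parameter so the function can be pointed at a
    test directory; on a real Linux host the default /proc is what you want.
    """
    results = []
    for entry in os.listdir(proc_root):
        # Only numeric directory names under /proc are process IDs.
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            # The process may have exited, or the files may be unreadable.
            continue
        results.append((score, int(entry), name))
    results.sort(reverse=True)
    return results


if __name__ == "__main__":
    # Print the ten processes with the highest OOM score.
    for score, pid, name in read_oom_scores()[:10]:
        print(f"{score:>6}  {pid:>7}  {name}")
```

Running this on the failing node and on a healthy one would show whether something other than the Cassandra JVM ranks unexpectedly high on the node that keeps getting killed.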
