Subject: 7.3 appears to leak
From: Markus Jelsma
To: Solr-user
Date: Thu, 26 Apr 2018 16:43:50 +0000

Hello,

We just finished upgrading our three separate clusters from 7.2.1 to 7.3. The upgrade went fine, except for our main text search collection: it appears to leak memory on commit!
After the initial upgrade we saw the cluster slowly run out of memory within about an hour and a half. We increased the heap in case 7.3 simply requires more of it, but the heap consumption graph still grows on each commit. Heap space cannot be reclaimed by forcing the garbage collector to run; everything just piles up in the old generation. Running with this slightly larger heap, the first nodes run out of memory about two and a half hours after a cluster restart.

The heap-eating cluster is a 2-shard/3-replica system on separate nodes. Each replica is about 50 GB in size and holds about 8.5 million documents. On 7.2.1 it ran fine with just a 2 GB heap. With 7.3 and a 2.5 GB heap, it just takes a little longer to run out of memory.

I inspected the reports shown by the VisualVM sampler and spotted one peculiarity: the number of SortedIntDocSet instances kept growing on each commit, by about the same amount as the number of cached filter queries. This doesn't happen on the logs cluster, where SortedIntDocSet instances are neatly collected. The number of instances also matches the number of commits since start-up times the cache sizes.

Our other two clusters don't have this problem. One of them receives very few commits per day, but the other receives data all the time: it logs user interactions, so a large amount of data is coming in continuously.

I cannot reproduce it locally by just indexing data and committing all the time; the peak usage in the old generation stays about the same. But I can reproduce it locally when I introduce queries and filter queries while indexing pieces of data and committing (a rough sketch of such a local test is appended below).

So, what is the problem? I dug through the CHANGES.txt of both Lucene and Solr, but nothing really caught my attention. Does anyone here have an idea where to look?

Many thanks,
Markus
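For anyone who wants to try this at home, here is a minimal SolrJ sketch along the lines of the local reproduction described above. The collection name, field names, batch size and commit cadence are made up for illustration; the point is only the combination of indexing, committing, and issuing ever-changing filter queries so the filterCache keeps admitting new entries between commits.

    import java.util.UUID;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class CommitFilterQueryRepro {
        public static void main(String[] args) throws Exception {
            // Collection name, field names and cadence are illustrative.
            try (HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/leaktest").build()) {
                for (int round = 0; round < 10_000; round++) {
                    // Index a small batch of documents.
                    for (int i = 0; i < 100; i++) {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", UUID.randomUUID().toString());
                        doc.addField("text_txt", "some text " + round + " " + i);
                        doc.addField("round_i", round);
                        client.add(doc);
                    }
                    client.commit(); // opens a new searcher every round

                    // Issue a query with a filter query that changes every round,
                    // so each commit cycle adds a fresh filterCache entry.
                    SolrQuery q = new SolrQuery("text_txt:text");
                    q.addFilterQuery("round_i:[" + Math.max(0, round - 50) + " TO " + round + "]");
                    client.query(q);
                }
            }
        }
    }

With something like this running against a local node, old-generation usage can be compared between a plain index-and-commit loop and the same loop with the queries enabled.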