Subject: RE: Heap Memory Problem after Upgrading to 7.4.0
From: Markus Jelsma
To: solr-user@lucene.apache.org
Date: Mon, 3 Sep 2018 19:54:12 +0000

Hello,

Getting an OOM, plus the fact that you have a lot of IndexSearcher instances, rings a familiar bell. One of our collections had the same issue [1] when we attempted an upgrade from 7.2.1 to 7.3.0. I managed to rule out all our custom Solr code, but had to keep our Lucene filters in the schema; the problem persisted.

The odd thing, however, is that you appear to have the same problem, but not with 7.3.0? Since you upgraded to 7.4.0 shortly after 7.3.0, can you confirm the problem is not also in 7.3.0?

You should see the instance count for IndexSearcher increase by one for each replica on each commit.

Regards,
Markus

[1] http://lucene.472066.n3.nabble.com/RE-7-3-appears-to-leak-td4396232.html

-----Original message-----
> From: Erick Erickson
> Sent: Monday 3rd September 2018 20:49
> To: solr-user
> Subject: Re: Heap Memory Problem after Upgrading to 7.4.0
>
> I would expect at least 1 IndexSearcher per replica. How many total
> replicas are hosted in your JVM?
>
> Plus, if you're actively indexing, there may temporarily be 2
> IndexSearchers open while the new searcher warms.
>
> And there may be quite a few caches, at least queryResultCache and
> filterCache and documentCache, one of each per replica and maybe two
> (for queryResultCache and filterCache) if you have a background
> searcher autowarming.
>
> At a glance, your autowarm counts are very high, so it may take some
> time to autowarm, leading to multiple IndexSearchers and caches open
> per replica when you happen to hit a commit point. I usually start
> with 16-20 as an autowarm count; the benefit decreases rapidly as you
> increase the count.
>
> I'm not quite sure why it would be different in 7x vs. 6x. How much
> heap do you allocate to the JVM? And do you see similar heap dumps in
> 6.6?
>
> Best,
> Erick
> On Mon, Sep 3, 2018 at 10:33 AM Björn Häuser wrote:
> >
> > Hello,
> >
> > we recently upgraded our solrcloud (5 nodes, 25 collections, 1 shard each, 4 replicas each) from 6.6.0 to 7.3.0 and shortly after to 7.4.0. We are running Zookeeper 4.1.13.
> >
> > Since the upgrade to 7.3.0 and also 7.4.0 we are encountering heap space exhaustion. After obtaining a heap dump, it looks like we have a lot of IndexSearchers open for our largest collection.
> >
> > The dump contains around 60 IndexSearchers, each holding around 40 MB of heap. Another 500 MB of heap is the fieldcache, which is expected in my opinion.
> >
> > The current config can be found here: https://gist.github.com/bjoernhaeuser/327a65291ac9793e744b87f0a561e844
> >
> > Analyzing the heap dump, Eclipse MAT says this:
> >
> > Problem Suspect 1
> >
> > 91 instances of "org.apache.solr.search.SolrIndexSearcher", loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 1.981.148.336 (38,26%) bytes.
> >
> > Biggest instances:
> >
> > • org.apache.solr.search.SolrIndexSearcher @ 0x6ffd47ea8 - 70.087.272 (1,35%) bytes.
> > • org.apache.solr.search.SolrIndexSearcher @ 0x79ea9c040 - 65.678.264 (1,27%) bytes.
> > • org.apache.solr.search.SolrIndexSearcher @ 0x6855ad680 - 63.050.600 (1,22%) bytes.
> >
> >
> > Problem Suspect 2
> >
> > 223 instances of "org.apache.solr.util.ConcurrentLRUCache", loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 1.373.110.208 (26,52%) bytes.
> >
> >
> > Any help is appreciated. Thank you very much!
> > Björn
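
As a rough sanity check on the numbers quoted in this thread, the sketch below compares the searcher count one might expect per JVM with the count reported by Eclipse MAT. It is only back-of-the-envelope arithmetic, not a diagnosis. It assumes that replicas are spread evenly across the 5 nodes (the thread does not say so), that the heap dump comes from a single node, and that MAT's "1.981.148.336" is 1,981,148,336 bytes with "38,26%" meaning 38.26%. The function names are made up for illustration.

```python
# Cluster layout as described in the thread.
NODES = 5
COLLECTIONS = 25
SHARDS_PER_COLLECTION = 1
REPLICAS_PER_SHARD = 4

# Figures from the Eclipse MAT report quoted in the thread.
SEARCHER_INSTANCES = 91
SEARCHER_BYTES = 1_981_148_336
SEARCHER_HEAP_SHARE = 0.3826  # "38,26%" in the report


def replicas_per_node(nodes, collections, shards, replicas):
    """Average replicas per node, ASSUMING an even spread (not stated in the thread)."""
    return collections * shards * replicas / nodes


def expected_searchers(replica_count):
    """Return (steady state, during warm-up) searcher counts.

    One searcher per replica normally, two while a new searcher
    warms after a commit, as described in the thread.
    """
    return replica_count, 2 * replica_count


per_node = replicas_per_node(
    NODES, COLLECTIONS, SHARDS_PER_COLLECTION, REPLICAS_PER_SHARD
)
low, high = expected_searchers(per_node)

# Average retained size per searcher and the heap size the percentage implies
# (decimal megabytes and gigabytes).
avg_mb = SEARCHER_BYTES / SEARCHER_INSTANCES / 1e6
implied_heap_gb = SEARCHER_BYTES / SEARCHER_HEAP_SHARE / 1e9

print(f"replicas per node (assumed even spread): {per_node:.0f}")
print(f"expected searchers per JVM: {low:.0f} to {high:.0f}")
print(f"searchers in heap dump: {SEARCHER_INSTANCES}")
print(f"average size per searcher: {avg_mb:.1f} MB")
print(f"heap size implied by MAT percentage: {implied_heap_gb:.2f} GB")
```

Under those assumptions, 91 instances is more than twice the warm-up upper bound, which would fit the picture of searchers accumulating on each commit rather than being released. Counting SolrIndexSearcher instances before and after a few commits would be a more direct check.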