From: "Kelly, Frank"
To: "solr-user@lucene.apache.org"
Subject: Re: Lucene Merge Thread: skip too large
Date: Wed, 18 Jan 2017 15:56:19 +0000

Thanks Shawn - super helpful as always.

-Frank

Frank Kelly
Principal Software Engineer

HERE
5 Wayside Rd, Burlington, MA 01803, USA
42° 29' 7" N 71° 11' 32" W

On 1/18/17, 10:23 AM, "Shawn Heisey" wrote:

>On 1/18/2017 6:51 AM, Kelly, Frank wrote:
>> We're investigating a strange spike in Heap memory usage in our
>> Production Solr.
>> Heap is stable for days ~ 1.6GB and then suddenly spikes to 3.9 GB and
>> we get an OOM.
>>
>> Our app server behavior using Solr appears to be unchanged (no new schema
>> updates, no additional indexing or searching we could see)
>> We're speculating that perhaps segment merges may be contributing to
>> the heap size increase?
>>
>> *Details*
>> Solr 5.3.1
>> Solr Cloud deployment with 110M+ documents in 2 Collections (72M and
>> 28M) each across 3 shards (each with 3 replicas)
>> Heavy indexing vs Query load (API calls are 90% Indexing, 10% querying)
>>
>> Heap Settings
>> -Xmx4096m
>>
>> Some solrconfig.xml settings
>>
>>
>> 256
>>
>> 10000
>>
>>
>> 10
>>
>> 20
>>
>> We turned on InfoStream logging and saw the following
>>
>> 2017-01-18 13:31:55.368 INFO (Lucene Merge Thread #24)
>> [c:prod_us-east-1_here_account s:shard1 r:core_node30
>> x:prod_us-east-1_here_account_shard1_replica4]
>> o.a.s.u.LoggingInfoStream [TMP][Lucene Merge Thread #24]:
>> seg=_9eac9(5.3.1):C23776249/1714903:delGen=13735 size=4338.599 MB
>> [skip: too large]
>
>This "skip: too large" message likely means that the size of this
>segment, if merged with other segments, would be larger than the max
>segment size. The max size defaults to 5GB, this segment is 4.3GB in
>size already.
>
>I think you've got an incorrect idea of how Java memory works. You
>indicated that the heap stays stable at about 1.6GB ... but this is NOT
>how Java works. When a piece of memory is allocated by a Java program,
>that memory is not reclaimed when the program no longer needs the
>object. It is garbage collection, a background process, that frees the
>memory. A graph of memory usage from a healthy Java program looks like
>a sawblade -- allocations use up all the memory in one of the heap
>regions, then garbage collection kicks in and frees up what it can.
>Java's normal operation involves constant "spikes" in heap usage.
>
>The heap usage of Solr will constantly increase as it runs, then garbage
>collection will kick in when one of the heap regions reaches capacity,
>reclaiming objects that the program no longer needs and freeing up memory.
>
>OOM happens when garbage collection is unable to free any memory because
>all of it is still in use. There are exactly two ways to deal with
>OOM: 1) Increase the size of your heap.
>2) Make the program use less
>memory.
>
>I have two theories about why your Solr install is using up all your
>heap and still requesting more: 1) Your Solr caches, particularly your
>filterCache, may be very large. 2) You may be doing a large number of
>queries that use a lot of memory -- a lot of facets, and/or using a lot
>of different fields for sorting.
>
>Assuming the entire index is on one server, for your 72 million document
>index, each filterCache entry is 9 million bytes in size. For your 28
>million document index, each filterCache entry is 3.5 million bytes.
>The default size for the filterCache in Solr example configs is 512. If
>you actually fill that cache up on a 72 million document index, just the
>one cache would require more than the 4GB of memory that you have
>allocated to Java. You probably need to decrease the size of the
>filterCache.
>
>If you're doing a lot of facets or sorting, you may need to increase the
>heap size.
>
>Segment merges do use additional memory, but I wouldn't expect that to
>be anything more than a minor contributor to heap usage.
>
>Here's some additional reading on the subject of Solr performance. Most
>of this page talks about memory, because that's the limiting factor for
>performance in most cases. The page includes some information about
>things that can require a lot of heap memory, and steps you may be able
>to take to reduce the memory required:
>
>https://wiki.apache.org/solr/SolrPerformanceProblems
>
>Thanks,
>Shawn
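
For reference, the filterCache figures quoted above can be reproduced with a rough sketch like the one below. It assumes each cache entry is a bitset holding one bit per document (so about maxDoc / 8 bytes), which matches the 9 million and 3.5 million byte figures in the thread. The function names are illustrative only and are not part of any Solr API, and the result is an upper-bound estimate for a completely full cache on a single server.

```python
# Back-of-the-envelope filterCache sizing.
# Assumption: each cached filter is a bitset with one bit per document,
# i.e. roughly maxDoc / 8 bytes per entry. Names here are hypothetical.

def filter_entry_bytes(max_doc: int) -> int:
    """Approximate bytes for one filterCache entry (one bit per document)."""
    return max_doc // 8


def full_cache_bytes(max_doc: int, cache_size: int = 512) -> int:
    """Approximate bytes if all cache_size entries are populated."""
    return filter_entry_bytes(max_doc) * cache_size


HEAP_BYTES = 4096 * 1024 * 1024  # heap from the thread: -Xmx4096m

if __name__ == "__main__":
    for docs in (72_000_000, 28_000_000):
        total = full_cache_bytes(docs)
        print(f"{docs:,} docs: {filter_entry_bytes(docs):,} bytes/entry, "
              f"{total / 1e9:.2f} GB for 512 entries, "
              f"exceeds 4 GB heap: {total > HEAP_BYTES}")
```

With the index spread over several shards, each core holds only part of the documents, so the per-core entry size would be correspondingly smaller than this single-server estimate.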