From: jpereira
To: solr-user@lucene.apache.org
Date: Mon, 10 Apr 2017 08:22:16 -0700 (MST)
Subject: Dynamic schema memory consumption

Hello guys,

I manage a Solr cluster and I am experiencing some problems with dynamic schemas.

The cluster has 16 nodes and 1500 collections, with 12 shards per collection and 2 replicas per shard.
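
For a sense of scale, the cluster figures above can be multiplied out. This is only a rough sketch: it assumes "2 replicas per shard" means two copies of every shard in total, and the per-node number assumes an even spread across all 16 nodes, which is almost certainly not the real placement.

```python
# Figures taken from the cluster description above.
nodes = 16
collections = 1500
shards_per_collection = 12
replicas_per_shard = 2  # assumption: two copies of each shard in total

# In SolrCloud every replica of every shard is its own core (one Lucene index).
total_cores = collections * shards_per_collection * replicas_per_shard

# Assumption: cores are spread evenly over all nodes (not the real layout).
cores_per_node = total_cores / nodes

print(total_cores)     # 36000
print(cores_per_node)  # 2250.0
```

Each of those cores carries its own schema and searcher state, so per-core overhead adds up quickly at this count.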

The nodes can be divided into 2 major tiers:
- tier1 is composed of 12 machines with 4 physical cores (8 virtual), 32GB RAM and 4TB SSD; these are used mostly for direct queries and data exports;
- tier2 is composed of 4 machines with 20 physical cores (40 virtual), 128GB RAM and 4TB SSD; these are used mostly for aggregation queries (facets).

The problem I am experiencing is that when using dynamic schemas, the Solr heap size rises dramatically.

I have two tier2 machines (let's call them A and B) running one Solr instance each with a 96GB heap, with 36 collections totaling 3TB of mainly fixed-schema data (55GB of it schemaless) indexed on each machine, and the heap consumption is on average 60GB (it peaks at around 80GB and drops to around 40GB after a GC run).

On the other tier2 machines (C and D) I was running one Solr instance on each machine with a 32GB heap and 4 fixed-schema collections with about 725GB of data indexed on each machine, which took up about 12GB of heap. Recently I added 46 collections to these machines with about 220GB of data. In order to do this I was forced to raise the heap size to 64GB, and after indexing everything the machines now have an average consumption of 48GB (!!!) (max ~55GB, ~37GB after GC runs).

I also noticed that when indexing fixed-schema data the CPU utilization is dramatically lower. I have around 100 workers indexing fixed-schema data with a CPU utilization of about 10%, while I have only one worker for schemaless data with a CPU utilization of about 20%.

So, I have two big questions here:
1. Is this dramatic rise in resource consumption when using dynamic fields "normal"?
2. Is there a way to lower the memory requirements? If so, how?

Thanks for your time!
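
To put the numbers for machines C and D side by side, here is a rough heap-per-indexed-GB comparison. It is a sketch under several assumptions: the 46 added collections are the schemaless ones, the 220GB figure is per machine like the 725GB figure, the earlier 12GB is comparable to the later 48GB average, and all of the heap growth is attributed to the added collections.

```python
# Figures for machines C and D as quoted above (in GB, assumed per machine).
fixed_index_gb = 725      # data in the 4 fixed-schema collections
fixed_heap_gb = 12        # heap used before the new collections were added
added_index_gb = 220      # data in the 46 added collections
avg_heap_after_gb = 48    # average heap after indexing everything

# Assumption: the whole increase in heap is caused by the added collections.
added_heap_gb = avg_heap_after_gb - fixed_heap_gb

# Heap consumed per GB of indexed data, before and for the added data.
fixed_ratio = fixed_heap_gb / fixed_index_gb
added_ratio = added_heap_gb / added_index_gb

print(f"{fixed_ratio:.3f}")                # 0.017
print(f"{added_ratio:.3f}")                # 0.164
print(f"{added_ratio / fixed_ratio:.1f}x")  # 9.9x
```

Under those assumptions the added data costs roughly ten times more heap per indexed GB than the fixed-schema data. Note that the comparison also mixes in collection count (46 versus 4), so it cannot separate the effect of the schema type from the effect of simply having more cores.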