Mailing-List: contact solr-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: solr-user@lucene.apache.org
Date: Mon, 10 Apr 2017 08:22:16 -0700 (MST)
From: jpereira <jpereira431@gmail.com>
To: solr-user@lucene.apache.org
Message-ID: <1491837736392-4329184.post@n3.nabble.com>
Subject: Dynamic schema memory consumption
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
archived-at: Mon, 10 Apr 2017 15:23:05 -0000

Hello guys,

I manage a Solr cluster and I am experiencing some problems with dynamic
schemas.

The cluster has 16 nodes and 1500 collections with 12 shards per collection
and 2 replicas per shard. The nodes can be divided into 2 major tiers:
 - tier1 is composed of 12 machines with 4 physical cores (8 virtual), 32GB
RAM and 4TB SSD; these are used mostly for direct queries and data exports;
 - tier2 is composed of 4 machines with 20 physical cores (40 virtual),
128GB RAM and 4TB SSD; these are mostly for aggregation queries (facets).
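
For a sense of scale, the cluster figures above imply the following number
of Solr cores. This is only a rough sketch: it assumes "2 replicas per
shard" means two copies of each shard in total, and the per-node figure is
a plain average, not the actual placement.

```python
# Rough count of Solr cores (shard replicas) implied by the cluster figures.
# Assumption: "2 replicas per shard" means two copies of each shard in total.
collections = 1500
shards_per_collection = 12
replicas_per_shard = 2
nodes = 16

total_cores = collections * shards_per_collection * replicas_per_shard
# Plain average only; the real distribution across nodes is uneven.
avg_cores_per_node = total_cores / nodes

print(total_cores)         # 36000
print(avg_cores_per_node)  # 2250.0
```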

The problem I am experiencing is that when using dynamic schemas, the Solr
heap size rises dramatically. 

I have two tier2 machines (let's call them A and B) running one Solr instance
each with 96GB heap size, with 36 collections totaling 3TB of mainly
fixed-schema (55GB schemaless) data indexed in each machine, and the heap
consumption is on average 60GB (it peaks at around 80GB and drops to around
40GB after a GC run).

On the other tier2 machines (C and D) I was running one Solr instance on
each machine with 32GB heap size and 4 fixed-schema collections with about
725GB of data indexed in each machine, which took up about 12GB of heap.
Recently I added 46 collections to these machines with about 220GB of
data. In order to do this I was forced to raise the heap size to 64GB, and
after indexing everything the machines now have an average consumption of
48GB (!!!) (max ~55GB, ~37GB after a GC run).
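
To put the numbers above side by side, here is a small sketch of heap used
per GB of indexed data. It only uses the averages quoted in this message;
treating 3TB as 3000GB, and attributing all of the extra heap on C and D to
the 46 newly added collections, are assumptions.

```python
# Heap used per GB of indexed data, from the averages quoted above (all in GB).
# Assumption: 3TB is taken as 3000GB.
def heap_per_index_gb(heap_gb, index_gb):
    return heap_gb / index_gb

ab = heap_per_index_gb(60, 3000)             # machines A and B
cd_before = heap_per_index_gb(12, 725)       # C and D, 4 fixed-schema collections
cd_after = heap_per_index_gb(48, 725 + 220)  # C and D after adding 46 collections
# Assumption: all of the extra heap is due to the newly added 220GB.
marginal = (48 - 12) / 220

print(round(ab, 3))         # 0.02
print(round(cd_before, 3))  # 0.017
print(round(cd_after, 3))   # 0.051
print(round(marginal, 3))   # 0.164
```

Under those assumptions, the newly added data costs roughly ten times more
heap per GB than the fixed-schema data did on the same machines.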

I also noticed that when indexing fixed-schema data the CPU utilization is
dramatically lower. I have around 100 workers indexing fixed-schema
data with a CPU utilization rate of about 10%, while I have only one
worker for schemaless data with a CPU utilization rate of about 20%.

So, I have two big questions here:
1. Is this dramatic rise in resource consumption when using dynamic fields
"normal"?
2. Is there a way to lower the memory requirements? If so, how?

Thanks for your time!

