lucene-solr-user mailing list archives

From Pavel Micka <Pavel.Mi...@zoomint.com>
Subject Solr memory reqs for time-sorted data
Date Fri, 07 Sep 2018 14:39:03 GMT
Hi,

I found on the wiki (https://wiki.apache.org/solr/SolrPerformanceProblems#RAM) that the optimal
amount of RAM for Solr is equal to the index size. This is, let's say, the ideal case, where
everything fits in memory.

We plan to have a small installation with 2 nodes and 8 shards. The cluster will hold
100M documents, and we expect each document to take 5 kB in the index. With a fully in-memory
index, those two nodes would require ~500 GB of RAM in total, i.e. 2x 256 GB machines.
And those are really big machines... Is this calculation even correct
for new Solr versions?
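
A rough sketch of the arithmetic behind that estimate, in case it helps to check it. It assumes 5 kB means 5,000 bytes per document, an even spread of documents across nodes and shards, and no replicas; none of these is confirmed for the real setup, and the function name is only illustrative.

```python
# Back-of-envelope index size estimate (illustrative only).
# Assumptions: 5 kB = 5,000 bytes per indexed document, documents spread
# evenly over nodes and shards, no replicas counted.

def index_size_gb(num_docs: int, bytes_per_doc: int) -> float:
    """Return the estimated index size in decimal gigabytes."""
    return num_docs * bytes_per_doc / 1e9

NUM_DOCS = 100_000_000   # 100M documents
BYTES_PER_DOC = 5_000    # expected index footprint per document
NODES = 2
SHARDS = 8

total_gb = index_size_gb(NUM_DOCS, BYTES_PER_DOC)
per_node_gb = total_gb / NODES
per_shard_gb = total_gb / SHARDS

print(f"total: {total_gb:.1f} GB")
print(f"per node: {per_node_gb:.1f} GB")
print(f"per shard: {per_shard_gb:.1f} GB")
```

Under these assumptions the total comes to 500 GB, i.e. 250 GB per node, which is where the 2x 256 GB figure comes from.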

Our problem is also somewhat restricted: our data are time-based logs, and searches are
generally limited to the last 3 months, which will match, let's say, 10M documents. How will
this affect Solr memory requirements? Will we still need to have the whole inverted index
in memory? Or is there some internal optimization that ensures only part of it
needs to be in memory?

The questions:

1) Is the 500 GB memory requirement a correct assumption?

2) Will the fact that we have time-based logs, with the majority of accesses going to
recent data only, help?

3) Is there a best practice for reducing the RAM required by Solr?



Thanks in advance!

Pavel


Side note:
We were thinking about partitioning the data with Time Routed Aliases (TRA), but unfortunately
we need to ensure disaster recovery over a bad network connection, and TRA and Cross Data
Center Replication (CDCR) are not compatible: CDCR requires a static number of cores, while
TRA creates cores dynamically.

