lucene-solr-user mailing list archives

From Brent P <brent.pear...@gmail.com>
Subject High load, frequent updates, low latency requirement use case
Date Fri, 26 Aug 2016 02:51:43 GMT
I'm trying to set up a Solr Cloud cluster to support a system with the
following characteristics:

- It will be writing documents at a rate of approximately 500 docs/second,
  and running search queries at about the same rate.
- The documents are fairly small, with about 10 fields, most of which range
  in size from a simple int to a string that holds a UUID. There's a date
  field, and then three text fields that typically hold in the range of 350
  to 500 chars.
- Documents should be available for searching within 30 seconds of being
  added.
- We need an average search latency of 50 ms or faster.

We've been using DataStax Enterprise with decent results, but we're trying
to determine whether we can get more out of the latest version of Solr
Cloud. We originally chose DSE ~4 years ago, *I believe* because its
Cassandra-backed Solr provided redundancy/high-availability features that
weren't available at the time with straight Solr (I'm not even sure Solr
Cloud was available then).

We have 24 fairly beefy servers (96 CPU cores, 256 GB RAM, SSDs) for the
task, and I'm trying to figure out the best way to distribute the documents
into collections, cores, and shards.

If I can categorize a document into one of 8 "types", should I create 8
collections? Is that going to provide better performance than putting them
all into one collection and then using a filter query with the type field
when doing a search?
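
To make the two options concrete, here is a minimal sketch of the request each one would send. It only builds the query strings and does not contact Solr. The collection names (`docs`, `docs_<type>`), the field name `type`, and the value `invoice` are hypothetical placeholders, not taken from the actual schema.

```python
from urllib.parse import urlencode


def single_collection_query(text, doc_type):
    """One shared collection: restrict results by type with a filter query (fq)."""
    # "docs" and the "type" field are hypothetical names for illustration.
    params = {"q": text, "fq": f"type:{doc_type}", "wt": "json"}
    return "/solr/docs/select?" + urlencode(params)


def per_type_collection_query(text, doc_type):
    """One collection per type: the type is chosen by the collection name."""
    # "docs_<type>" is a hypothetical naming scheme for the 8 collections.
    params = {"q": text, "wt": "json"}
    return f"/solr/docs_{doc_type}/select?" + urlencode(params)


if __name__ == "__main__":
    print(single_collection_query("hello", "invoice"))
    print(per_type_collection_query("hello", "invoice"))
```

In the single-collection form, the `fq` clause is separate from the main `q` parameter, so Solr can cache it in the filter cache independently of the main query. Whether that is faster than eight separate collections is exactly the open question.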

What are the options/things to consider when deciding on the number of
shards for each collection? As far as I know, I don't choose the number of
Solr cores; that is just determined based on the replication factor (and
shard count?).
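
If that understanding is right (each replica of each shard is hosted as one core), the core count follows from simple arithmetic. The sketch below assumes replicas are spread evenly across nodes. The shard counts and replication factors are hypothetical examples, not recommendations.

```python
def total_cores(collections, shards, replication_factor):
    """Total cores, assuming every replica of every shard is one Solr core."""
    return collections * shards * replication_factor


def cores_per_node(collections, shards, replication_factor, nodes):
    """Average cores per node, assuming replicas are spread evenly."""
    return total_cores(collections, shards, replication_factor) / nodes


if __name__ == "__main__":
    nodes = 24  # the 24 servers mentioned above
    # Hypothetical layout A: one collection, 12 shards, 2 replicas per shard.
    print(cores_per_node(1, 12, 2, nodes))
    # Hypothetical layout B: eight collections, 3 shards each, 2 replicas per shard.
    print(cores_per_node(8, 3, 2, nodes))
```

Comparing layouts this way shows how splitting into 8 collections multiplies the number of cores each server has to host for the same shard count and replication factor.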

Some of the settings I'm using in my solrconfig that seem important:
<lockType>${solr.lock.type:native}</lockType>
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:30000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
</autoSoftCommit>
<useColdSearcher>true</useColdSearcher>
<maxWarmingSearchers>8</maxWarmingSearchers>

I've got the updateLog/transaction log enabled, as I think I read it's
required for Solr Cloud.

Are there any settings I should look at that affect performance
significantly, especially outside of the solrconfig.xml for each collection
(like Jetty configs, logging properties, etc.)?

How much impact do the <lib/> directives in the solrconfig have on
performance? Do they only take effect if I have something configured that
requires them, so that if one I need is missing, I'd get an error?

Any help will be greatly appreciated. Thanks!
-Brent
