lucene-solr-user mailing list archives

From "Pisarev, Vitaliy" <>
Subject Deciding how to correctly use Solr multicore
Date Sun, 09 Feb 2014 09:22:34 GMT

We are evaluating Solr usage in our organization and have come to the point where we are past
the functional tests and are now looking at choosing the best deployment topology.
Here are some details about the structure of the problem: The application deals with storing
and retrieving artifacts of various types. Artifacts are stored in Projects. Each project
can have hundreds of thousands of artifacts (total across all types), and our largest customers
have hundreds of projects (~300-800), though the vast majority have tens of projects (~30-100).

Core granularity
In terms of core granularity, a core per project seems sensible to me, as pushing
everything into a single core will probably be too much. The entities themselves will have a
special type field to distinguish them.
Moreover, not all of the projects may be active at a given time, so this allows
the indexes of inactive projects to remain dormant on disk.
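
To make the core-per-project idea concrete, here is a minimal sketch of building the
CoreAdmin CREATE request for a new project. The base URL, the project_<id> naming scheme
and the helper names are hypothetical. The loadOnStartup and transient parameters are, to
the best of my knowledge, accepted by CoreAdmin CREATE in Solr 4.x and are what would let
inactive cores stay unloaded; treat that as an assumption to verify against your version.

```python
from urllib.parse import urlencode


def core_name(project_id):
    # Hypothetical naming scheme: one core per project.
    return "project_%s" % project_id


def create_core_url(base_url, project_id, transient=True):
    """Build (but do not send) a CoreAdmin CREATE request URL for one project."""
    name = core_name(project_id)
    params = [
        ("action", "CREATE"),
        ("name", name),
        # Assumption: the instance directory is named after the core.
        ("instanceDir", name),
        # Assumption: transient cores are not loaded at startup and may be
        # unloaded again when idle, so inactive projects stay on disk.
        ("loadOnStartup", "false" if transient else "true"),
        ("transient", "true" if transient else "false"),
    ]
    return "%s/admin/cores?%s" % (base_url.rstrip("/"), urlencode(params))


if __name__ == "__main__":
    print(create_core_url("http://localhost:8983/solr", 42))
```

The URL is only constructed here; issuing it against each node (and handling failures) is
left out on purpose.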

Availability and synchronization
Our application is deployed on premise at our customers' sites, so we cannot go too crazy with
the amount of extra resources we demand from them, e.g. dedicated indexing servers. We pretty
much need to make do with what is already there.

For now, we are planning to use the DIH (DataImportHandler) to maintain the index. Each node
in the application cluster will have its own local index. When a project is created (or the
feature is enabled on an existing project), a core is created for it on each one of the nodes,
a full import is executed, and then a delta import is scheduled to run on each one of the nodes.
This gives us simplicity, but I am wondering about the performance and memory consumption costs.
Also, I am wondering whether we should use replication for this purpose. The requirement is for
the index to be updated once every 30 seconds: are delta imports designed for this?
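
As a rough illustration of what this plan implies per node, the sketch below builds the DIH
request URLs and estimates how many delta imports per second a node would have to start.
It assumes the handler is registered at /dataimport in each core's solrconfig.xml (the
conventional path, but configurable), that every project core is imported on the same
30-second cycle, and that the base URL and core names are placeholders.

```python
from urllib.parse import urlencode


def dih_url(base_url, core, command):
    """Build (but do not send) a DataImportHandler request URL for one core."""
    if command not in ("full-import", "delta-import"):
        raise ValueError("unsupported DIH command: %r" % command)
    # commit=true asks DIH to commit once the import finishes.
    params = [("command", command), ("commit", "true")]
    # Assumption: the handler is mapped to /dataimport in solrconfig.xml.
    return "%s/%s/dataimport?%s" % (base_url.rstrip("/"), core, urlencode(params))


def delta_imports_per_second(core_count, interval_seconds):
    """Average number of delta imports one node must start per second."""
    return core_count / float(interval_seconds)


if __name__ == "__main__":
    print(dih_url("http://localhost:8983/solr", "project_42", "delta-import"))
    # Project counts taken from the ranges quoted in this message.
    for cores in (30, 100, 300, 800):
        rate = delta_imports_per_second(cores, 30)
        print("%d cores -> %.1f delta imports/s per node" % (cores, rate))
```

With 800 project cores and a 30-second cycle this works out to roughly 27 delta imports
started per second on every node, each of which also queries the source database, so the
largest customers are the case worth stress testing first.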

I understand that this is a very complex problem in general. I tried to highlight the
most significant aspects and would appreciate some initial guidance. Note that we are planning
to execute performance and stress testing no matter what, but the assumption is that the topology
of the solution can be predetermined with the existing data.
