lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Solr 4.8.1 multiple client updates the same collection
Date Fri, 26 Jan 2018 19:20:31 GMT
On 1/26/2018 4:23 AM, Vincenzo D'Amore wrote:
> The first client does the following:
>
> 1. rollbacks all add/deletes made to the index since the last commit (in
> case previous client execution was completed unsuccessfully).
> 2. reads data from sql server
> 3. updates solr documents
> 4. manually commits
>
> And *important*, once a day, the first client deletes all the existing
> documents and reindex the entire collection from scratch.
>
> The second client is simpler, it manually commits after every atomic update.

The fact that one client is deleting everything and reindexing changes
the landscape dramatically.

Since I do not know anything about your setup, I'll make up a similar
scenario and describe what I see as the potential problems.

Let's say that this theoretical index contains one million documents.  A
full reindex of this index takes 2 hours and starts at midnight.  While
the reindex is happening, the first client doesn't do "normal" updates. 
The second client runs every ten minutes (x:00, x:10, etc), and is
completely unaware of what the first client is doing.

At 12:01 AM, the full delete has been applied to the uncommitted, "under
construction" version of the index, and the reindex has been running for
one minute.  Everything is fine: nothing has been committed yet, so
anyone searching still has the full index available.

At 12:10 AM, let's imagine that the second client is going to update one
document with the atomic update feature.  If the full reindex has
indexed that document, this will work, but if it hasn't, the atomic
update is going to fail.  For the purposes of this scenario, let's
assume that the atomic update succeeds, and the second client does its
commit.  When the second client's commit finishes, the visible index will
have a little over 80,000 documents in it instead of one million, because
that commit also exposes the full delete, and the reindex is only about
eight percent complete.  The same thing would happen if autoSoftCommit
is configured and gets triggered after an update.
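
The arithmetic behind that number can be sketched as follows.  This is
only an illustration of the made-up scenario: the function name is
hypothetical, and it assumes the reindex adds documents at a constant
rate, which a real reindex may not do.

```python
def visible_docs_after_commit(total_docs, reindex_minutes, minutes_elapsed):
    """Estimate how many documents searchers see if a commit lands
    partway through a delete-all followed by a full reindex.

    Assumes a constant indexing rate (an assumption for illustration).
    """
    done = min(minutes_elapsed, reindex_minutes)
    # Integer arithmetic avoids floating-point rounding surprises.
    return total_docs * done // reindex_minutes


# One million documents, two-hour reindex, commit at 12:10 AM.
print(visible_docs_after_commit(1_000_000, 120, 10))
```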

If the second client can be paused while the first client is reindexing,
and you don't configure autoSoftCommit, then everything will be fine.
But if the second client does its work while the reindex is underway,
its commits will expose a partial index to searchers until the reindex
finishes.
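
One way to picture "pausing" the second client is a shared lock that the
reindex holds for its whole run.  The sketch below is a minimal
single-process illustration; every name in it is hypothetical, and two
separate client programs would need some cross-process equivalent (a
lock file, a database row, or similar) rather than threading.Lock.

```python
import threading

# Hypothetical shared lock: held for the full duration of a reindex.
reindex_lock = threading.Lock()


def run_full_reindex(do_reindex):
    """First client: hold the lock while deleting and reindexing."""
    with reindex_lock:
        do_reindex()


def run_incremental_update(do_update):
    """Second client: skip this run if a reindex is in progress.

    Returns True if the update ran, False if it was skipped.
    """
    if not reindex_lock.acquire(blocking=False):
        return False
    try:
        do_update()
        return True
    finally:
        reindex_lock.release()
```

Skipping instead of blocking means the second client simply tries again
on its next scheduled run, which fits the every-ten-minutes schedule in
the scenario above.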

Separate side issue: The fact that your first client does rollbacks
could potentially roll back changes made by the second client, unless
you can guarantee that the second client will wait until the first
client is idle.
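
The reason is that commits and rollbacks in Solr apply to the whole
index, not to one client's changes.  The toy model below is not Solr
code, just a hypothetical in-memory stand-in (ToyIndex and its methods
are made up) that shows the effect when a rollback lands between the
second client's update and its commit.

```python
class ToyIndex:
    """Toy stand-in for an index with index-wide commit and rollback."""

    def __init__(self):
        self.committed = {}  # what searchers can see
        self.pending = {}    # uncommitted changes from ALL clients

    def add(self, doc_id, doc):
        self.pending[doc_id] = doc

    def commit(self):
        # Makes every pending change visible, whoever sent it.
        self.committed.update(self.pending)
        self.pending.clear()

    def rollback(self):
        # Discards every pending change, whoever sent it.
        self.pending.clear()


index = ToyIndex()
index.add("42", {"price": 10})  # second client sends an update
index.rollback()                # first client starts up and rolls back
index.commit()                  # second client commits, but nothing is left
print("42" in index.committed)
```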

Thanks,
Shawn

