lucene-dev mailing list archives

From Jan Høydahl (JIRA) <>
Subject [jira] Commented: (SOLR-1924) Solr's updateRequestHandler does not have a fast way of guaranteeing document delivery
Date Sun, 17 Oct 2010 23:13:22 GMT


Jan Høydahl commented on SOLR-1924:

In a multi-node environment, it would also be useful to track whether a batch has been
replicated to the slaves. If a master crashes, the feeding client may already have received
a callback saying that a batch is secured even though the batch was not yet replicated, i.e.
the only copy was on the now-crashed master. The master should be able to keep track of whether
at least one replica has fetched a given version of the index through the ReplicationHandler.
A client could then choose to act on the replication status instead of the persisted status.
The <STATUS> operation would then return an additional state:
<replicated count="1">fooBar0000</replicated>
<persisted count="2">fooBar0001 fooBar0002</persisted>
<pending count="1">fooBar0003</pending>
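
As a rough illustration of how a feeding client might act on such a response, here is a minimal Python sketch. It assumes the response is exactly the fragment shown above (which has no single root element, so the sketch wraps it), that batch IDs are whitespace-separated, and that each batch appears under exactly one state. The names parse_status and safe_to_discard are hypothetical and not part of Solr.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body, copied from the example in this comment.
STATUS_RESPONSE = """
<replicated count="1">fooBar0000</replicated>
<persisted count="2">fooBar0001 fooBar0002</persisted>
<pending count="1">fooBar0003</pending>
"""


def parse_status(fragment):
    """Return {state: [batch ids]} for a STATUS response fragment."""
    # The fragment has no single root element, so wrap it before parsing.
    root = ET.fromstring("<status>" + fragment + "</status>")
    states = {}
    for element in root:
        ids = (element.text or "").split()
        declared = int(element.get("count", len(ids)))
        if declared != len(ids):
            raise ValueError(
                "%s: count=%d but %d ids listed" % (element.tag, declared, len(ids))
            )
        states.setdefault(element.tag, []).extend(ids)
    return states


def safe_to_discard(states, require_replication=True):
    """List the batches the client no longer needs to keep for resending.

    Assumption: a replicated batch is also persisted, so it is always safe.
    Persisted-only batches count only if the client accepts that risk.
    """
    done = list(states.get("replicated", []))
    if not require_replication:
        done += states.get("persisted", [])
    return done


if __name__ == "__main__":
    states = parse_status(STATUS_RESPONSE)
    print(safe_to_discard(states))
    print(safe_to_discard(states, require_replication=False))
```

A client that wants protection against a master crash would keep its own copy of every batch until the batch shows up as replicated; a client that only cares about surviving a restart could release batches as soon as they are persisted.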

> Solr's updateRequestHandler does not have a fast way of guaranteeing document delivery
> --------------------------------------------------------------------------------------
>                 Key: SOLR-1924
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 1.4
>            Reporter: Karl Wright
> It is currently not possible, without performing a commit on every document, to use updateRequestHandler
> to guarantee delivery into the index of any document.  The reason is that whenever Solr is
> restarted, some or all documents that have not been committed yet are dropped on the floor,
> and there is no way for a client of updateRequestHandler to know which ones this happened to.
> I believe it is not even possible to write a middleware-style layer that stores documents
> and performs periodic commits on its own, because the update request handler never ACKs individual
> documents on a commit, but merely everything it has seen since the last time Solr bounced.
> So you have this potential scenario:
> - middleware layer receives document 1, saves it
> - middleware layer receives document 2, saves it
> Now it's time for the commit, so:
> - middleware layer sends document 1 to updateRequestHandler
> - Solr is restarted, dropping all uncommitted documents on the floor
> - middleware layer sends document 2 to updateRequestHandler
> - middleware layer sends COMMIT to updateRequestHandler, but Solr adds only document 2 to the index
> - middleware believes incorrectly that it has successfully committed both documents
> An ideal solution would be for Solr to separate the semantics of commit (the index building
> variety) from the semantics of commit (the 'I got the document' variety).  Perhaps this will
> involve a persistent document queue that will persist over a Solr restart.
> An alternative mechanism might be for updateRequestHandler to acknowledge specifically
> committed documents in its response to an explicit commit.  But this would make it difficult
> or impossible to use autocommit usefully in such situations.  The only other alternative is
> to require clients that need guaranteed delivery to commit on every document, with a considerable
> performance penalty.
> This ticket is related to LCF in that LCF is one of the clients that really needs some
> kind of guaranteed delivery mechanism.
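
The failure scenario in the quoted description can be reproduced with a small Python simulation. This is only a toy model under stated assumptions: ToySolr is a hypothetical stand-in that keeps uncommitted documents in memory and loses them on restart, and its commit returns a bare "OK" with no per-document acknowledgement. It is not Solr's API.

```python
class ToySolr:
    """Toy stand-in for Solr's update handling; not the real API."""

    def __init__(self):
        self.index = []        # committed documents, survive a restart
        self.uncommitted = []  # added but not yet committed, lost on restart

    def add(self, doc):
        self.uncommitted.append(doc)

    def restart(self):
        # Everything that was not committed is dropped on the floor.
        self.uncommitted.clear()

    def commit(self):
        self.index.extend(self.uncommitted)
        self.uncommitted.clear()
        # The reply acknowledges the commit, not individual documents.
        return "OK"


def run_scenario():
    """Replay the steps from the issue description."""
    solr = ToySolr()
    saved = ["doc1", "doc2"]  # the middleware has stored both documents

    solr.add("doc1")
    solr.restart()            # doc1 is silently lost here
    solr.add("doc2")
    ack = solr.commit()

    # With only a bare "OK", the middleware assumes everything it sent is in.
    believed_committed = list(saved) if ack == "OK" else []
    return believed_committed, solr.index


if __name__ == "__main__":
    believed, actual = run_scenario()
    print("middleware believes committed:", believed)
    print("actually in the index:", actual)
```

The gap between the two lists is the problem described above: the commit reply carries no information that would let the middleware notice that doc1 never reached the index.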

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
