lucene-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] [Commented] (SOLR-3215) We should clone the SolrInputDocument before adding locally and then send that clone to replicas.
Date Mon, 21 May 2012 19:21:43 GMT


Hoss Man commented on SOLR-3215:

bq. DistributedUpdateProcessor should come right before RunUpdateProcessor (or are you assuming
we might support random update processors in-between? Are there use cases for this?)

the main scenario i've seen/heard mentioned is the idea of processors that are computationally
cheap, but increase the size of the document significantly (e.g.: clone a big ass text field
and strip the html from the clone), so you want them to run after distrib (on every replica)
to minimize the amount of data sent over the wire.


bq. Intuitively, you expect that processors that run after the distrib processor will not
hit the document sent to replicas before the docs are sent to replicas - but it will.

to clarify (because i kept not understanding what the crux of the issue was here, so if i post
this comment i'll remember next time w/o needing to ask mark on IRC _again_): if we do *NOT*
clone the doc, there is a race condition where local processors executing after the distrib
processor may modify the documents before they are serialized and forwarded to one or more
replicas.
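
The race described above can be sketched without any Solr classes. This is a minimal, single-threaded simulation, not Solr's actual code: a plain `Map` stands in for `SolrInputDocument`, and `copyOf`, `stripHtmlLocally`, and `fieldsSentToReplica` are hypothetical names invented for this illustration. It fixes the unlucky ordering (local processor runs before the queued doc is serialized) to show what the replica would receive with and without a clone.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a Map plays the role of SolrInputDocument.
class CloneBeforeForward {

    /** Shallow copy; enough here because all values are immutable Strings. */
    static Map<String, Object> copyOf(Map<String, Object> doc) {
        return new LinkedHashMap<>(doc);
    }

    /** A local processor that runs after the distrib step and adds a derived field. */
    static void stripHtmlLocally(Map<String, Object> doc) {
        String body = (String) doc.get("body");
        doc.put("body_text", body.replaceAll("<[^>]*>", ""));
    }

    /**
     * Simulates: queue the doc for replicas, run the local processor,
     * and only then "serialize" the queued doc. Returns the field names
     * the replica would receive.
     */
    static List<String> fieldsSentToReplica(boolean cloneFirst) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("body", "<p>hello</p>");

        Map<String, Object> queued = cloneFirst ? copyOf(doc) : doc;
        stripHtmlLocally(doc); // local edit lands before serialization
        return new ArrayList<>(queued.keySet());
    }

    public static void main(String[] args) {
        System.out.println("no clone: " + fieldsSentToReplica(false)); // [id, body, body_text]
        System.out.println("clone:    " + fieldsSentToReplica(true));  // [id, body]
    }
}
```

Without the clone, the queued reference and the local doc are the same object, so the derived field leaks into what is forwarded. In a real system the ordering depends on thread timing, which is what makes it a race rather than a consistent behavior.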

one way to avoid this would be to stop treating the "local" replica as special, and instead
have distrib forward back to localhost (via HTTP) just like every other replica, and abort
the current request.

> We should clone the SolrInputDocument before adding locally and then send that clone to replicas.
> -------------------------------------------------------------------------------------------------
>                 Key: SOLR-3215
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>             Fix For: 4.0
>         Attachments: SOLR-3215.patch
>
> If we don't do this, the behavior is a little unexpected. You cannot avoid having other
> processors always hit documents twice unless we support using multiple update chains. We have
> another issue open that should make this better, but I'd like to do this sooner than that.
> We are going to have to end up cloning anyway when we want to offer the ability to not wait
> for the local add before sending to replicas.
>
> Cloning with the current SolrInputDocument, SolrInputField apis is a little scary - there
> is an Object to contend with - but it seems we can pretty much count on that being a primitive
> that we don't have to clone?
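
The concern about untyped `Object` field values can be illustrated with a conservative copy routine. This is only a sketch under stated assumptions, not the attached patch: a `Map` again stands in for the document, `copyValue` and `deepCopy` are hypothetical helpers, and the set of types treated as mutable (collections, `byte[]`, `Date`) is a guess at what might show up as a value. Anything else is assumed immutable and shared.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; a Map stands in for SolrInputDocument.
class DeepCopyFieldValues {

    /** Copies one value: known mutable types are duplicated, the rest are shared. */
    static Object copyValue(Object value) {
        if (value instanceof Collection) {
            List<Object> out = new ArrayList<>();
            for (Object v : (Collection<?>) value) {
                out.add(copyValue(v)); // recurse for multi-valued fields
            }
            return out;
        }
        if (value instanceof byte[]) {
            return ((byte[]) value).clone();
        }
        if (value instanceof Date) {
            return new Date(((Date) value).getTime());
        }
        return value; // String, Number, Boolean, null: assumed immutable
    }

    /** Copies every field, preserving field order. */
    static Map<String, Object> deepCopy(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            out.put(e.getKey(), copyValue(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        List<Object> tags = new ArrayList<>();
        tags.add("a");
        doc.put("tags", tags);

        Map<String, Object> copy = deepCopy(doc);
        tags.add("b"); // later local edit to the original
        System.out.println("original: " + doc.get("tags"));
        System.out.println("copy:     " + copy.get("tags"));
    }
}
```

If values really are always immutable, the fall-through branch handles everything and the copy is cheap; the extra branches only matter when a mutable value sneaks in.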
