lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3215) We should clone the SolrInputDocument before adding locally and then send that clone to replicas.
Date Tue, 22 May 2012 23:40:41 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281318#comment-13281318 ]

Hoss Man commented on SOLR-3215:
--------------------------------

bq. Cloning with the current SolrInputDocument, SolrInputField apis is a little scary - there
is an Object to contend with - but it seems we can pretty much count on that being a primitive
that we don't have to clone?
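A minimal plain-Java sketch of why the {{Object}} field values are the scary part, assuming a copy that duplicates only the container. This is not {{SolrInputDocument.clone()}} itself: the {{HashMap<String, Object>}} stand-in for a document and the {{shallowCopy}} helper are hypothetical, just to show what sharing a value means.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShallowCloneDemo {
    // Hypothetical helper: copies the field map only; the value objects are shared.
    static Map<String, Object> shallowCopy(Map<String, Object> doc) {
        return new HashMap<>(doc);
    }

    public static void main(String[] args) {
        Map<String, Object> original = new HashMap<>();
        original.put("id", "doc1");              // immutable String: safe to share
        List<String> tags = new ArrayList<>();
        tags.add("a");
        original.put("tags", tags);              // mutable List: shared with the copy

        Map<String, Object> copy = shallowCopy(original);

        // Replacing a value in the original map does not touch the copy...
        original.put("id", "doc1-local");
        // ...but mutating a shared mutable value leaks into the copy.
        tags.add("added-by-local-processor");

        System.out.println(copy.get("id"));      // doc1
        System.out.println(copy.get("tags"));    // [a, added-by-local-processor]
    }
}
```

Immutable values like {{String}} or {{Integer}} are safe to share, which is the "count on it being a primitive" assumption; anything mutable ends up aliased between the locally added doc and the copy headed to the replicas.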

bq. ... currently we have to offer this (local is special) as well though - it's a requirement
for our current 'super safe' mode that we add locally first.

Strawman suggestion: instead of using a simple {{SolrInputDocument.clone()}}, given the scariness
Miller mentioned about some processor maybe creating a field value that isn't Cloneable, what
if we instead:
 # use the JavaBinCodec to serialize the SolrInputDocument to a byte[]
 # hang onto that byte[] while doing the local add
 # then (re)use that byte[] in all of the requests to the remote replicas
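A rough sketch of the serialize-once idea, assuming a {{HashMap<String, Object>}} as a stand-in for the SolrInputDocument and plain {{java.io}} serialization swapped in for the JavaBinCodec so it runs without Solr on the classpath. {{serialize}}, {{deserialize}}, and {{distribute}} are hypothetical helpers, not SolrJ API; deserializing inside the loop stands in for each replica decoding the bytes it receives.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class SerializeOnceDemo {
    // Counts serializations, to show the cost is paid once rather than N times.
    static int serializations = 0;

    // Stand-in for serializing the doc with the JavaBinCodec (java.io used here instead).
    static byte[] serialize(HashMap<String, Object> doc) throws IOException {
        serializations++;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(doc);
        } // closing the stream flushes everything into 'bytes'
        return bytes.toByteArray();
    }

    // Stand-in for a replica decoding the payload it was sent.
    @SuppressWarnings("unchecked")
    static HashMap<String, Object> deserialize(byte[] payload)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            return (HashMap<String, Object>) in.readObject();
        }
    }

    // Hypothetical leader-side flow: serialize once, add locally, reuse the bytes for every replica.
    static List<HashMap<String, Object>> distribute(HashMap<String, Object> doc, int replicas)
            throws IOException, ClassNotFoundException {
        byte[] payload = serialize(doc);        // 1. serialize once, before local processing
        doc.put("local_only", "mutated");       // 2. the local add may modify the live doc...
        List<HashMap<String, Object>> received = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            received.add(deserialize(payload)); // 3. ...but every replica gets the original bytes
        }
        return received;
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, Object> doc = new HashMap<>();
        doc.put("id", "doc1");
        List<HashMap<String, Object>> replicas = distribute(doc, 3);
        System.out.println(serializations);                             // 1
        System.out.println(replicas.get(0).containsKey("local_only"));  // false
    }
}
```

The whole payload stays in memory as a byte[] while the local add runs and the replica copies are produced; that buffer is the extra RAM being traded for not re-serializing per replica.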

Not sure how easy reusing the byte[] would be given the existing SolrJ API, but...

 * Even if some field values aren't Cloneable primitives, they have to be serializable using
the codec, or we already have a bigger bug to worry about than the risk of concurrent
modifications to the object.
 * Bonus: we only pay the cost of serializing the SolrInputDocument once on the leader, not
N times for N replicas.
 * The only "downside" I can think of is that the leader has to buffer the whole doc in memory
as a byte[] instead of streaming it. But if we assume most shards will have more than one
replica, the trade-off seems worth it, to me anyway (optimize to use more RAM and save the
time/CPU spent serializing redundantly).

 
                
> We should clone the SolrInputDocument before adding locally and then send that clone to replicas.
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-3215
>                 URL: https://issues.apache.org/jira/browse/SOLR-3215
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>             Fix For: 4.0
>
>         Attachments: SOLR-3215.patch
>
>
> If we don't do this, the behavior is a little unexpected. You cannot avoid having other
> processors always hit documents twice unless we support using multiple update chains. We have
> another issue open that should make this better, but I'd like to do this sooner than that.
> We are going to have to end up cloning anyway when we want to offer the ability to not wait
> for the local add before sending to replicas.
> Cloning with the current SolrInputDocument, SolrInputField apis is a little scary - there
> is an Object to contend with - but it seems we can pretty much count on that being a primitive
> that we don't have to clone?

--
This message is automatically generated by JIRA.

        


