lucene-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3473) Distributed deduplication broken
Date Mon, 21 May 2012 16:30:41 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280264#comment-13280264 ]

Markus Jelsma commented on SOLR-3473:
-------------------------------------

That makes sense indeed.

To work around the problem of having to use the digest field as the ID, could it not simply
issue a deleteByQuery for the digest before adding the document? Would that cause significant
overhead on very large systems with many updates?

From Nutch's point of view, we would certainly want to avoid changing the ID from URL to digest.
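
The delete-before-add idea above can be sketched with a toy, in-memory model. This is only an illustration under stated assumptions: `ToyIndex`, `delete_by_query`, and `add_deduplicated` are hypothetical names, not Solr or SolrJ API, and MD5 merely stands in for whatever signature the deduplication processor computes. The URL stays the unique key and the digest is an ordinary field.

```python
import hashlib


class ToyIndex:
    """In-memory stand-in for a single index; NOT the Solr API."""

    def __init__(self):
        self.docs = {}  # unique key (the URL) -> stored document

    def delete_by_query(self, field, value):
        """Remove every document whose `field` equals `value`; return the count."""
        doomed = [key for key, doc in self.docs.items() if doc.get(field) == value]
        for key in doomed:
            del self.docs[key]
        return len(doomed)

    def add(self, doc):
        """Add or overwrite by unique key, as an index keyed on URL would."""
        self.docs[doc["id"]] = doc


def add_deduplicated(index, url, content):
    """Delete any document carrying the same digest, then add the new one."""
    # Assumption: MD5 of the content plays the role of the signature.
    digest = hashlib.md5(content.encode("utf-8")).hexdigest()
    removed = index.delete_by_query("digest", digest)
    index.add({"id": url, "digest": digest, "content": content})
    return removed


index = ToyIndex()
add_deduplicated(index, "http://example.org/a", "same body")
add_deduplicated(index, "http://example.org/b", "same body")   # replaces /a
add_deduplicated(index, "http://example.org/c", "other body")
print(sorted(index.docs))  # ['http://example.org/b', 'http://example.org/c']
```

In a distributed setup the delete would presumably have to reach every shard, since documents with the same digest but different URLs need not live on the same shard; that fan-out on each add is the overhead the question is about.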

> Distributed deduplication broken
> --------------------------------
>
>                 Key: SOLR-3473
>                 URL: https://issues.apache.org/jira/browse/SOLR-3473
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud, update
>    Affects Versions: 4.0
>            Reporter: Markus Jelsma
>             Fix For: 4.0
>
>
> Solr's deduplication via the SignatureUpdateProcessor is broken for distributed updates
> on SolrCloud.
> Mark Miller:
> {quote}
> Looking again at the SignatureUpdateProcessor code, I think that indeed this won't currently
> work with distrib updates. Could you file a JIRA issue for that? The problem is that we convert
> update commands into solr documents - and that can cause a loss of info if an update proc
> modifies the update command.
> I think the reason that you see a multiple values error when you try the other order
> is because of the lack of a document clone (the other issue I mentioned a few emails back).
> Addressing that won't solve your issue though - we have to come up with a way to propagate
> the currently lost info on the update command.
> {quote}
> Please see the ML thread for the full discussion: http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

