lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-3473) Distributed deduplication broken
Date Wed, 30 May 2012 23:33:23 GMT

     [ https://issues.apache.org/jira/browse/SOLR-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-3473:
---------------------------

    Attachment: SOLR-3473.patch

Updated patch to include my (meager) attempt at fixing the problem by making processAdd immediately
execute a deleteByQuery if the add includes an updateTerm.
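
As a rough illustration of that idea (not the code in the attached patch), here is a toy
single-node model in Python. The names ToyIndex, AddCommand, process_add, delete_by_query
and update_term are hypothetical stand-ins chosen for this sketch, and a plain dict replaces
the real index; the sketch assumes the update term is a (field, value) pair carrying the
document's signature.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class AddCommand:
    """Hypothetical stand-in for an add command: a document plus an optional update term."""
    doc: dict
    update_term: Optional[Tuple[str, str]] = None  # (field, value), e.g. the signature


@dataclass
class ToyIndex:
    """Single-node, in-memory index keyed by document id (illustration only)."""
    docs: dict = field(default_factory=dict)

    def delete_by_query(self, field_name: str, value: str) -> int:
        """Remove every document whose field equals value; return how many were removed."""
        doomed = [doc_id for doc_id, d in self.docs.items() if d.get(field_name) == value]
        for doc_id in doomed:
            del self.docs[doc_id]
        return len(doomed)

    def process_add(self, cmd: AddCommand) -> None:
        # If the add carries an update term, delete matching docs first,
        # so at most one document per signature survives on this node.
        if cmd.update_term is not None:
            self.delete_by_query(*cmd.update_term)
        self.docs[cmd.doc["id"]] = cmd.doc


index = ToyIndex()
for doc_id, sig in [("a", "s1"), ("b", "s1"), ("c", "s2")]:
    index.process_add(AddCommand({"id": doc_id, "sig": sig}, update_term=("sig", sig)))
```

On a single node this keeps one document per signature. The hard part in the distributed
case is that a document matching the update term may live on a different shard than the one
handling the add, which this toy model does not attempt to cover.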

I banged my head against a bunch of version mismatch errors to get the patch into its current
state, in which all the updates succeed but the query assertions in the test still fail,
indicating that docs with duplicate signatures are making it into the index.

On the up side: far fewer duplicates are making it into the index now than before the patch
(when docs would only be deleted from the node that got the initial request _if_ that node
happened to be a shard leader)...

bq. wrong number of deduped docs (added 68 total) expected:<7> but was:<10>
bq. wrong number of deduped docs (added 71 total) expected:<7> but was:<8>
bq. wrong number of deduped docs (added 70 total) expected:<7> but was:<9>

...so apparently there is still some tiny corner case code path where dups are sneaking in
(either that or the existing deleteByQuery code isn't reliable).

I'm fairly certain I'm out of my depth at this point.
                
> Distributed deduplication broken
> --------------------------------
>
>                 Key: SOLR-3473
>                 URL: https://issues.apache.org/jira/browse/SOLR-3473
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud, update
>    Affects Versions: 4.0
>            Reporter: Markus Jelsma
>             Fix For: 4.0
>
>         Attachments: SOLR-3473.patch, SOLR-3473.patch
>
>
> Solr's deduplication via the SignatureUpdateProcessor is broken for distributed updates
> on SolrCloud.
> Mark Miller:
> {quote}
> Looking again at the SignatureUpdateProcessor code, I think that indeed this won't currently
> work with distrib updates. Could you file a JIRA issue for that? The problem is that we convert
> update commands into solr documents - and that can cause a loss of info if an update proc
> modifies the update command.
> I think the reason that you see a multiple values error when you try the other order
> is because of the lack of a document clone (the other issue I mentioned a few emails back).
> Addressing that won't solve your issue though - we have to come up with a way to propagate
> the currently lost info on the update command.
> {quote}
> Please see the ML thread for the full discussion: http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html
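
The loss of information described in the quote above can be pictured with a small Python
sketch. This is not Solr's forwarding code: AddCommand, forward_as_document and receive are
hypothetical names, and the sketch simply assumes that only the document, not the surrounding
command, is sent to the next node.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AddCommand:
    """Hypothetical add command: the document plus state set by an update processor."""
    doc: dict
    update_term: Optional[Tuple[str, str]] = None


def forward_as_document(cmd: AddCommand) -> dict:
    """Serialize only the document, dropping everything else on the command."""
    return dict(cmd.doc)


def receive(doc: dict) -> AddCommand:
    """Rebuild a command on the receiving node from the forwarded document alone."""
    return AddCommand(doc=doc)


# A processor on the first node has set an update term on the command.
original = AddCommand({"id": "a", "sig": "s1"}, update_term=("sig", "s1"))
received = receive(forward_as_document(original))
```

The document fields arrive intact, but the update term set on the original command is gone
on the receiving side, so that node has nothing to drive a delete with.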


        
