lucene-dev mailing list archives

From "Mark Waddle (JIRA)" <j...@apache.org>
Subject [jira] Created: (SOLR-2200) DIH DocBuilder - Improve perf. on large delta deletes
Date Tue, 26 Oct 2010 15:52:23 GMT
DIH DocBuilder - Improve perf. on large delta deletes
-----------------------------------------------------

                 Key: SOLR-2200
                 URL: https://issues.apache.org/jira/browse/SOLR-2200
             Project: Solr
          Issue Type: Improvement
          Components: contrib - DataImportHandler
    Affects Versions: 1.4.1
            Reporter: Mark Waddle


In collectDelta, the procedure that collects the PKs of the documents to be updated or
deleted for an entity iterates over the entire deltaSet for every deleted document. This
is very expensive when updating and deleting millions of documents in one delta-import.
Since the comparison between deleted and delta rows is on the PK, let's build the deltaSet
as a HashMap instead of a HashSet to enable quick key lookups and remove the need for repeated
iterations.
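
A minimal sketch of the idea, not the actual DocBuilder code: the class and method names
below (DeltaLookupSketch, removeDeletedByScan, indexByPk, removeDeleted) are hypothetical,
rows are assumed to be Map<String, Object> instances, and keying the map on the PK's string
form is an assumption made for illustration. The first method mirrors the repeated scan
described above; the other two show the keyed lookup that would replace it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration only; none of these names come from the Solr code base.
class DeltaLookupSketch {

    // Current behaviour as described in the issue: for every deleted row,
    // walk the whole delta set looking for rows with the same PK value.
    // Cost grows with (deleted rows) x (delta rows).
    public static Set<Map<String, Object>> removeDeletedByScan(
            Set<Map<String, Object>> deltaSet,
            Iterable<Map<String, Object>> deletedRows,
            String pk) {
        Set<Map<String, Object>> result = new HashSet<>(deltaSet);
        for (Map<String, Object> deleted : deletedRows) {
            Object deletedKey = deleted.get(pk);
            result.removeIf(row -> Objects.equals(row.get(pk), deletedKey));
        }
        return result;
    }

    // Proposed direction: index the delta rows by PK once.
    // Assumption: the PK's string form identifies a row uniquely.
    public static Map<String, Map<String, Object>> indexByPk(
            Iterable<Map<String, Object>> rows, String pk) {
        Map<String, Map<String, Object>> byPk = new HashMap<>();
        for (Map<String, Object> row : rows) {
            byPk.put(String.valueOf(row.get(pk)), row);
        }
        return byPk;
    }

    // Each deleted row is then a single hash lookup instead of a full scan.
    // Cost grows with (deleted rows) + (delta rows).
    public static void removeDeleted(
            Map<String, Map<String, Object>> deltaByPk,
            Iterable<Map<String, Object>> deletedRows,
            String pk) {
        for (Map<String, Object> deleted : deletedRows) {
            deltaByPk.remove(String.valueOf(deleted.get(pk)));
        }
    }
}
```

Under these assumptions, a caller would build the map once with indexByPk(deltaRows, "id")
and then call removeDeleted(deltaByPk, deletedRows, "id"); what remains in the map is the
set of rows still to be updated.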

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

