lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Commented: (SOLR-2200) DIH DocBuilder - Improve perf. on large delta deletes
Date Tue, 26 Oct 2010 17:10:20 GMT


Robert Muir commented on SOLR-2200:

Mark, thanks for your contribution.

Seems like a no-brainer to me, and all tests pass with the patch.

I'd like to commit this unless anyone has objections.

> DIH DocBuilder - Improve perf. on large delta deletes
> -----------------------------------------------------
>                 Key: SOLR-2200
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4.1
>            Reporter: Mark Waddle
>         Attachments: SOLR-2200.patch
> collectDelta, the procedure that collects the PKs of the documents that should be
> updated or deleted for an entity, iterates over the entire deltaSet for every deleted
> document. This is very expensive when you are updating and deleting millions of
> documents in one delta-import.
> Considering that the comparison between deleted and delta is on the PK, let's build the
> deltaSet as a HashMap instead of a HashSet to enable quick key lookups and remove the
> need for repeated iterations.
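
For readers without the patch at hand, the idea in the description can be sketched roughly as follows. This is not the attached SOLR-2200.patch and not the actual DocBuilder code: the class name, both method names, and the row representation (one Map<String, Object> per row, PK compared via String.valueOf) are assumptions made only for illustration.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and row shape are assumptions, not DIH code.
class DeltaDeleteSketch {

    // Scan-based variant: every deleted row triggers a full pass over the
    // delta rows, so the cost grows as (deleted rows) x (delta rows).
    static Set<Map<String, Object>> removeDeletedByScan(
            Collection<Map<String, Object>> deltaRows,
            Collection<Map<String, Object>> deletedRows,
            String pk) {
        Set<Map<String, Object>> result = new HashSet<>(deltaRows);
        for (Map<String, Object> deleted : deletedRows) {
            String deletedKey = String.valueOf(deleted.get(pk));
            result.removeIf(row -> String.valueOf(row.get(pk)).equals(deletedKey));
        }
        return result;
    }

    // Keyed variant: index the delta rows by PK once, then each deleted row
    // is handled with a single hash lookup instead of a full iteration.
    static Map<String, Map<String, Object>> removeDeletedByKey(
            Collection<Map<String, Object>> deltaRows,
            Collection<Map<String, Object>> deletedRows,
            String pk) {
        Map<String, Map<String, Object>> deltaByKey = new HashMap<>();
        for (Map<String, Object> row : deltaRows) {
            deltaByKey.put(String.valueOf(row.get(pk)), row);
        }
        for (Map<String, Object> deleted : deletedRows) {
            deltaByKey.remove(String.valueOf(deleted.get(pk)));
        }
        return deltaByKey;
    }

    public static void main(String[] args) {
        Set<Map<String, Object>> delta = new HashSet<>();
        for (String id : new String[] {"1", "2", "3"}) {
            Map<String, Object> row = new HashMap<>();
            row.put("id", id);
            delta.add(row);
        }
        Map<String, Object> gone = new HashMap<>();
        gone.put("id", "2");
        Set<Map<String, Object>> deleted = new HashSet<>();
        deleted.add(gone);

        // Both variants keep the rows whose PK is not in the deleted set.
        System.out.println(removeDeletedByScan(delta, deleted, "id"));
        System.out.println(removeDeletedByKey(delta, deleted, "id").keySet());
    }
}
```

Both variants keep the same rows; the keyed one trades one extra pass to build the index for constant-time removal per deleted row, which is where the savings on very large delta-imports would come from.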

This message is automatically generated by JIRA.
