lucene-solr-user mailing list archives

From Dotan Cohen
Subject Re: How might one search for dupe IDs other than faceting on the ID field?
Date Wed, 31 Jul 2013 05:41:32 GMT
On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky wrote:
> The Solr SignatureUpdateProcessorFactory is designed to facilitate dedupe...
> any particular reason you did not use it?
> See:
> and

Actually, the guy who made the changes (a coworker) did in fact write
an alternative UpdateHandler. I've just noticed that there are a bunch
of dupes right now, though.

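import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.solr.core.SolrCore;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.DirectUpdateHandler2;
import org.apache.solr.util.RefCounted;

public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {

    public DiscoAPIUpdateHandler(SolrCore core) {
        super(core);
    }

    @Override
    public int addDoc(AddUpdateCommand cmd) throws IOException {

        // If overwrite is set to false we fall back to DirectUpdateHandler2.
        // This is done for debugging, to insert duplicates into Solr.
        if (!cmd.overwrite) return super.addDoc(cmd);

        // When using ref-counted objects you have to decrement the
        // ref count when you're done.
        // (The right-hand side was lost in the archive; core.getSearcher()
        // is assumed here.)
        RefCounted<SolrIndexSearcher> indexSearcher = core.getSearcher();

        // The idea: make an internal Lucene query and check whether
        // that id already exists.
        Term updateTerm = null;

        if (cmd.updateTerm != null) {
            updateTerm = cmd.updateTerm;
        } else {
            updateTerm = new Term("id", cmd.getIndexedId());
        }

        Query query = new TermQuery(updateTerm);
        TopDocs docs = indexSearcher.get().search(query, 2);

        if (docs.totalHits > 0) {
            // Index searcher is no longer needed.
            indexSearcher.decref();
            // Don't add the new document.
            return 0;
        }

        // Index searcher is no longer needed.
        indexSearcher.decref();

        // If I'm here then it's a new document.
        return super.addDoc(cmd);
    }
}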
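
One possible way a check-then-add handler like the one above could still let
dupes through (an assumption, not something confirmed for this index): if the
lookup runs against a regular searcher, it only sees documents that were
visible as of the last commit, so two adds of the same id that arrive between
commits would both pass the check. The sketch below is plain Java with no Solr
classes. CheckThenAddIndex, visibleIds, docs, addIfAbsent and countDocs are
hypothetical names used only to illustrate that gap.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an index whose searcher only sees committed docs.
class CheckThenAddIndex {
    // Ids visible to the "searcher", i.e. as of the last commit.
    private final Set<String> visibleIds = new HashSet<>();
    // Every stored document id, duplicates included.
    private final List<String> docs = new ArrayList<>();

    // Mirrors the check-then-add logic: skip the add if the id is already visible.
    boolean addIfAbsent(String id) {
        if (visibleIds.contains(id)) {
            return false; // looks like a dupe, do not add
        }
        docs.add(id);     // not visible yet, so it is treated as new
        return true;
    }

    // A commit makes everything added so far visible to later checks.
    void commit() {
        visibleIds.addAll(docs);
    }

    // Counts stored documents carrying the given id.
    int countDocs(String id) {
        int count = 0;
        for (String doc : docs) {
            if (doc.equals(id)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        CheckThenAddIndex index = new CheckThenAddIndex();
        index.addIfAbsent("doc-1");
        index.addIfAbsent("doc-1"); // same id, no commit in between: accepted
        index.commit();
        index.addIfAbsent("doc-1"); // rejected, the id is visible now
        System.out.println(index.countDocs("doc-1")); // prints 2
    }
}
```

In this toy model the duplicate only appears when the same id is added twice
without a commit in between. Whether that matches what happened here depends
on which searcher the handler actually obtains and on the commit settings.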



> And I give a bunch of examples in my book.

I anticipate the book with esteem!

Dotan Cohen
