lucene-solr-user mailing list archives

From Dotan Cohen <dotanco...@gmail.com>
Subject Re: How might one search for dupe IDs other than faceting on the ID field?
Date Wed, 31 Jul 2013 05:41:32 GMT
On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
<jack@basetechnology.com> wrote:
> The Solr SignatureUpdateProcessorFactory is designed to facilitate dedupe...
> any particular reason you did not use it?
>
> See:
> http://wiki.apache.org/solr/Deduplication
>
> and
>
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>

Actually, the guy who made the changes (a coworker) did in fact write
an alternative UpdateHandler. I've just noticed that there are a bunch
of dupes right now, though.

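import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.solr.core.SolrCore;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.DirectUpdateHandler2;
import org.apache.solr.util.RefCounted;

public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {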

    public DiscoAPIUpdateHandler(SolrCore core) {
        super(core);
    }

    @Override
    public int addDoc(AddUpdateCommand cmd) throws IOException {

        // if overwrite is set to false we'll use DirectUpdateHandler2;
        // this is done for debugging, to insert duplicates into Solr
        if (!cmd.overwrite) return super.addDoc(cmd);


        // when using ref-counted objects you have to decrement the
        // ref count when you're done
        RefCounted<SolrIndexSearcher> indexSearcher =
                this.core.getNewestSearcher(false);

        // the idea is like this: we'll make an internal Lucene query
        // and check if that id already exists

        Term updateTerm = null;


        if (cmd.updateTerm != null){
            updateTerm = cmd.updateTerm;
        } else {
            updateTerm = new Term("id",cmd.getIndexedId());
        }


        Query query = new TermQuery(updateTerm);
        TopDocs docs = indexSearcher.get().search(query,2);

        if (docs.totalHits>0){
            // index searcher is no longer needed
            indexSearcher.decref();
            // don't add the new document
            return 0;
        }

        // index searcher is no longer needed
        indexSearcher.decref();

        // if i'm here then it's a new document
        return super.addDoc(cmd);

    }

}
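
To make the three code paths in the handler above easier to follow, here is a toy model of its decision logic. It is only a sketch: it uses no Solr or Lucene classes, the class name AddDocPathsDemo and the in-memory list are hypothetical stand-ins for the index, and the return value of 1 for a successful add is an assumption of the model. It does show that an add with overwrite=false skips the existence check entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the handler's three code paths; no Solr classes involved. */
public class AddDocPathsDemo {
    // Hypothetical stand-in for the index: one entry per stored document id.
    private final List<String> indexedIds = new ArrayList<>();

    /** Returns 1 if the document was added, 0 if it was skipped. */
    public int addDoc(String id, boolean overwrite) {
        if (!overwrite) {
            indexedIds.add(id);   // debugging path: no existence check at all
            return 1;
        }
        if (indexedIds.contains(id)) {
            return 0;             // id already present: keep the old document
        }
        indexedIds.add(id);       // new id: add it
        return 1;
    }

    /** Number of stored documents carrying the given id. */
    public long count(String id) {
        return indexedIds.stream().filter(id::equals).count();
    }

    public static void main(String[] args) {
        AddDocPathsDemo index = new AddDocPathsDemo();
        index.addDoc("a", true);
        index.addDoc("a", true);   // skipped: "a" already exists
        System.out.println(index.count("a"));
        index.addDoc("a", false);  // bypasses the check
        System.out.println(index.count("a"));
    }
}
```

In this model the second count is 2, so any client that sends overwrite=false would be one place to look when hunting for the source of the dupes.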


> And I give a bunch of examples in my book.
>

I anticipate the book with esteem!

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
