lucene-java-user mailing list archives

From Paul Taylor
Subject Re: Can I use Lucene to retrieve a list of duplicates
Date Mon, 26 Feb 2007 16:25:11 GMT

I got it working before I saw your latest mail; the only problem is
that it doesn't look very efficient. This is my duplicate method (below).
The problem is that I have to enumerate through *every* term. This was
worse before, because I was only interested in terms that matched a
particular field (column) but had to enumerate through every term,
whatever field it was part of. So I recreated my index so that each
document contains only a row number field and a second field for the
value of the column. However, this means I am going to end up with a
number of different indexes, each solving a particular


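    public List<Integer> getDuplicates() {
        List<Integer> matches = new ArrayList<Integer>();
        try {
            IndexReader ir = ...;  // source of the reader is missing from the archived message
            TermEnum terms = ir.terms();
            // Walk every term in the index (loop conditions restored; the archive dropped them)
            while (terms.next()) {
                // A term that occurs in more than one document marks a duplicate value
                if (terms.docFreq() > 1) {
                    TermDocs termDocs = ir.termDocs(terms.term());
                    while (termDocs.next()) {
                        Document d = ir.document(termDocs.doc());
                        // ... the step that adds this row to matches is missing from the archive
                    }
                }
            }
        } catch (IOException ioe) {
            // ... handler body is missing from the archived message
        }
        return matches;
    }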
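
For what it's worth, here is a minimal sketch of how the scan could be restricted to one field instead of walking every term. It is not Lucene code: it models the term dictionary with a plain TreeMap, assuming (as I understand Lucene's term ordering) that terms sort by field first and then by text, so one seek reaches the first term of a field and the loop can stop as soon as the field changes. I believe IndexReader.terms(Term) offers the same kind of seek on a real index, but treat that as an assumption to check against the Javadoc. The class DuplicateSketch, its key() helper and the sample rows are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory stand-in for an inverted index; hypothetical, not the Lucene API.
class DuplicateSketch {

    // Hypothetical term key: field + NUL + text. Sorting these strings
    // orders terms by field first, then by text within the field.
    static String key(String field, String text) {
        return field + '\0' + text;
    }

    // Maps each term to the row numbers (document ids) that contain it.
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    void add(int row, String field, String text) {
        postings.computeIfAbsent(key(field, text), k -> new ArrayList<>()).add(row);
    }

    // Returns the rows whose value in the given field also occurs in another row.
    List<Integer> getDuplicates(String field) {
        List<Integer> matches = new ArrayList<>();
        String prefix = field + '\0';
        // Seek straight to the first term of the field rather than the first term overall.
        SortedMap<String, List<Integer>> tail = postings.tailMap(prefix);
        for (Map.Entry<String, List<Integer>> e : tail.entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // past the last term of this field, so stop scanning
            }
            if (e.getValue().size() > 1) { // plays the role of docFreq() > 1
                matches.addAll(e.getValue());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        DuplicateSketch index = new DuplicateSketch();
        index.add(0, "artist", "Nirvana");
        index.add(1, "artist", "Pixies");
        index.add(2, "artist", "Nirvana");
        index.add(0, "title", "Lithium");
        index.add(1, "title", "Debaser");
        index.add(2, "title", "Polly");
        System.out.println(index.getDuplicates("artist")); // prints [0, 2]
    }
}
```

With a seek like this, a single index holding several columns might be enough, since the scan never touches terms belonging to the other fields.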
