lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Query to find documents which contain the same value for a field, i.e. duplicate fields
Date Tue, 20 Dec 2011 19:38:56 GMT
So I had this code, which returns all documents where more than one
document has the same value for fieldName. The trouble is, I didn't
realise it could return documents that had been deleted, so I'm
wondering what an equivalent using queries would be.


public List<Integer> getDuplicates(int columnModelId)
{
    // Each column is indexed under a field named after its model id.
    String fieldName = String.valueOf(columnModelId);
    List<Integer> matches = new ArrayList<Integer>();
    if (AudioDataModel.getInstance().getRowCount() == 0)
    {
        return matches;
    }

    try
    {
        IndexReader ir = getIndexReader();
        // Position the enumeration at the first term at or after (fieldName, "").
        TermEnum terms = ir.terms(new Term(fieldName, ""));
        do
        {
            // The first term may already belong to another field, so check it too.
            if (terms.term() != null && terms.term().field().equals(fieldName))
            {
                // More than one document holds this value.
                if (terms.docFreq() > 1)
                {
                    TermDocs termDocs = ir.termDocs(terms.term());
                    while (termDocs.next())
                    {
                        Document d = ir.document(termDocs.doc());
                        matches.add(Integer.valueOf(d.getFieldable(ROW_NUMBER).stringValue()));
                    }
                }
            }
        }
        while (terms.next() && terms.term().field().equals(fieldName));
    }
    catch (IOException ioe)
    {
        MainWindow.logger.log(Level.WARNING, "DataIndexer.Problem searching for duplicates:" + ioe.getMessage(), ioe);
    }
    return matches;
}
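
One likely cause, as far as I understand the Lucene 3.x behaviour, is that
docFreq() still counts documents that have been deleted but not yet merged
away, so a value can look duplicated when only one live document holds it.
A possible fix is to count the live documents per value rather than trust
docFreq(). The sketch below models only that counting logic in plain Java.
Row and DuplicateFinder are hypothetical stand-ins invented for
illustration; they are not Lucene classes and not part of the original code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one indexed document (not a Lucene class).
final class Row {
    final int rowNumber;     // plays the role of the ROW_NUMBER field
    final String value;      // value of the field being checked
    final boolean deleted;   // true if the document has been deleted

    Row(int rowNumber, String value, boolean deleted) {
        this.rowNumber = rowNumber;
        this.value = value;
        this.deleted = deleted;
    }
}

// Hypothetical helper: reports rows whose value is shared by at least
// two live (non-deleted) rows.
final class DuplicateFinder {
    static List<Integer> findDuplicates(List<Row> rows) {
        // Group live row numbers by value, keeping first-seen order.
        Map<String, List<Integer>> liveRowsByValue =
                new LinkedHashMap<String, List<Integer>>();
        for (Row row : rows) {
            if (row.deleted) {
                continue; // deleted rows must not contribute to the count
            }
            List<Integer> group = liveRowsByValue.get(row.value);
            if (group == null) {
                group = new ArrayList<Integer>();
                liveRowsByValue.put(row.value, group);
            }
            group.add(row.rowNumber);
        }

        // Keep only the groups that contain more than one live row.
        List<Integer> matches = new ArrayList<Integer>();
        for (List<Integer> group : liveRowsByValue.values()) {
            if (group.size() > 1) {
                matches.addAll(group);
            }
        }
        return matches;
    }
}
```

Applied to the method above, the same idea would mean collecting the row
numbers from each term's TermDocs into a temporary list and adding them to
matches only if that list ends up with more than one entry.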
