lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Query to find documents which contain the same value for a field, i.e. duplicate fields
Date Thu, 22 Dec 2011 16:20:21 GMT
On 20/12/2011 19:38, Paul Taylor wrote:
> So I had this code, which returns all documents where more than one
> document has the same value for fieldName. The trouble is I didn't
> realise it could also return documents that had been deleted, so I'm
> wondering what an equivalent using queries would be.
>
>
> public List<Integer> getDuplicates(int columnModelId)
> {
>         // Declared as fieldName to match the references further down
>         String fieldName = String.valueOf(columnModelId);
>         List<Integer> matches = new ArrayList<Integer>();
>         if (AudioDataModel.getInstance().getRowCount() == 0)
>         {
>             return matches;
>         }
>
>         IndexReader ir;
>
>         try
>         {
>             ir = getIndexReader();
>             TermEnum terms = ir.terms(new Term(fieldName, ""));
>             do
>             {
>                 if (terms.term() != null)
>                 {
>                     if (terms.docFreq() > 1)
>                     {
>                         TermDocs termDocs = ir.termDocs(terms.term());
>                         while (termDocs.next())
>                         {
>                             Document d = ir.document(termDocs.doc());
>                             matches.add(new Integer(d.getFieldable(ROW_NUMBER).stringValue()));
>                         }
>                     }
>                 }
>             }
>             while (terms.next() && terms.term().field().equals(fieldName));
>         }
>         catch (IOException ioe)
>         {
>             MainWindow.logger.log(Level.WARNING,
>                 "DataIndexer.Problem searching for duplicates:" + ioe.getMessage(), ioe);
>         }
>         return matches;
> }
FYI, I stuck with this code but added an IndexReader.isDeleted() check to 
ensure each document was still valid.
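
For anyone reading this later, here is a rough sketch of the idea. As far 
as I know, in Lucene 3.x docFreq() can still count documents that have been 
deleted but not yet merged away, so it may be worth counting only the live 
documents for each term before reporting them. This is not Lucene code: the 
map stands in for the term/TermDocs enumeration and the set stands in for 
IndexReader.isDeleted(), and the class and method names (DuplicateSketch, 
liveDuplicates) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateSketch {

    /**
     * Hypothetical stand-in for the term enumeration.
     * postings: field value -> ids of documents containing it (may include deleted ids)
     * deleted:  ids of deleted documents (plays the role of isDeleted(docId))
     */
    static List<Integer> liveDuplicates(Map<String, List<Integer>> postings,
                                        Set<Integer> deleted) {
        List<Integer> matches = new ArrayList<Integer>();
        for (List<Integer> docs : postings.values()) {
            // Collect only the documents that are still live for this value
            List<Integer> live = new ArrayList<Integer>();
            for (Integer doc : docs) {
                if (!deleted.contains(doc)) {
                    live.add(doc);
                }
            }
            // Report the value only if more than one live document shares it
            if (live.size() > 1) {
                matches.addAll(live);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = new LinkedHashMap<String, List<Integer>>();
        postings.put("abbey road", Arrays.asList(0, 1, 2));
        postings.put("revolver", Arrays.asList(3, 4));
        postings.put("help", Arrays.asList(5));
        Set<Integer> deleted = new HashSet<Integer>(Arrays.asList(1, 4));

        // "revolver" has two postings but only one live document, so it is skipped
        System.out.println(liveDuplicates(postings, deleted)); // prints [0, 2]
    }
}
```

The point of filtering before the size check is that a value whose only 
other document was deleted is no longer treated as a duplicate.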

Paul
