lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Can I use Lucene to retrieve a list of duplicates
Date Mon, 26 Feb 2007 10:34:52 GMT
Hi,

Sorry, I don't see how I get access to TermEnums. So far I've created a
document per row: the first field holds the row id, then I have one
field per column, and I've checked that the index was created OK with some
search queries.
I now want to pass in a column to check and receive a list of all the
documents that contain a term in that column which is used by at
least one other document for that column (a duplicate term).

Thanks, Paul

Chris Hostetter wrote:
> : Thanks, this might do it, but do I need to know the terms beforehand? I
> : just want to return any terms with frequency more than one.
>
> no, TermEnum will let you iterate over all the terms ... you don't even
> need TermDocs if you just want the docFreq for each term (which would be 1
> if there are no duplicates)
>
> : Erick Erickson wrote:
> : > Sure, you can use the TermDocs/TermEnum classes. Basically, for a term
> : > (probably column value in your app) these let you quickly answer the
> : > question "which (and how many) documents does this term appear in".
> : > What you get is the Lucene doc id, which lets you fetch all the
> : > information about the documents you want.
> : >
> : > Erick
> : >
> : > On 2/23/07, Paul Taylor <paul_t100@fastmail.fm> wrote:
> : >
> : >     Hi, I have a Java Swing application with a table, and I was considering
> : >     using Lucene to index the data in the table. One task I'd like to do is
> : >     for the user to select 'Find Duplicate records for Column X'; then I would
> : >     filter the table to show only records where there is more than one with
> : >     the same value, i.e. a duplicate for that column. Is there a way to return
> : >     all the duplicates from a Lucene index?
> : >
> : >     thanks paul Taylor
> : >
> : >     ---------------------------------------------------------------------
> : >     To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> : >     For additional commands, e-mail: java-user-help@lucene.apache.org
> : >
> : >
> :
> :
>
>
>
> -Hoss
>
>
>
>   
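
The approach described in the quoted replies (walk every term for a column, keep the terms whose document frequency is greater than one, then collect the documents for those terms) can be sketched in plain Java. This is a minimal model and not Lucene code: the class `DuplicateFinder`, the method `findDuplicateRows`, and the row-id-to-value map are hypothetical names chosen for illustration. In Lucene 2.x the corresponding steps would, as far as I know, be `IndexReader.terms(Term)` to enumerate the terms of a field, `TermEnum.docFreq()` to get each term's document count, and `IndexReader.termDocs(Term)` to list the matching documents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFinder {

    /**
     * Returns the ids of all rows whose value for one column is shared by at
     * least one other row. The input maps row id -> column value and stands in
     * for the indexed field (an assumption made for this sketch).
     */
    public static List<Integer> findDuplicateRows(Map<Integer, String> columnValues) {
        // Term -> ids of the rows containing it. This plays the role of the
        // term dictionary plus postings that TermEnum/TermDocs expose.
        Map<String, List<Integer>> postings = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> row : columnValues.entrySet()) {
            postings.computeIfAbsent(row.getValue(), k -> new ArrayList<>())
                    .add(row.getKey());
        }

        List<Integer> duplicates = new ArrayList<>();
        for (List<Integer> rowIds : postings.values()) {
            // The list size is the "docFreq" of the term; 1 means the value is unique.
            if (rowIds.size() > 1) {
                duplicates.addAll(rowIds);
            }
        }
        return duplicates;
    }
}
```

The sketch treats each column value as a single term. In a real index this only holds if the column field is indexed without tokenization; with a tokenized field each word becomes its own term, and two rows sharing one word would be reported as duplicates.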

