lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Unique Fields
Date Wed, 12 Mar 2008 13:49:26 GMT
So, you're tokenizing the title field? If so, I don't understand how you
expect this to work. Would the titles "this is one order" and "is one order
this" be considered identical? Would capitalization matter? Punctuation?
Throwing all the terms of a title into a tokenized field and expecting some
magic to keep duplicates out is beyond the scope of Lucene; you'll have to
roll a customized solution.

For instance, index your title UN_TOKENIZED in a duplicate field (after
applying whatever massaging you want re: punctuation, spaces, etc.). Use
TermDocs/TermEnum on that field to detect duplicates. You won't search on
this field....
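
A rough sketch of the "massaging" step, in plain Java with no Lucene
dependency. The class name TitleKey, the method normalize, and the field name
"title_key" mentioned in the comments are made up for illustration, and the
rules (lowercase, punctuation dropped, whitespace collapsed, word order kept)
are only one possible answer to the questions above; adjust them to whatever
"identical" should mean for your titles.

```java
import java.util.Locale;

// Hypothetical helper: turns a raw title into the single string that would be
// indexed UN_TOKENIZED in a separate field (e.g. a field named "title_key").
final class TitleKey {
    private TitleKey() {}

    static String normalize(String title) {
        // Ignore capitalization.
        String lowered = title.toLowerCase(Locale.ROOT);
        // Replace every run of characters that are neither letters nor digits
        // (punctuation, whitespace) with a single space.
        String stripped = lowered.replaceAll("[^\\p{L}\\p{N}]+", " ");
        // Drop the leading/trailing space left by the replacement.
        return stripped.trim();
    }
}
```

With these rules "This is one order!" and "this  is ONE order" produce the
same key, while "is one order this" does not, because word order is preserved.
The key is what you would look up with TermDocs/TermEnum before adding the
document.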

Or create a hash of the title and index *that* in a separate field and check
against the hash with TermEnum/TermDocs. Or.....
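
The hash variant could look roughly like this, again in plain Java. The class
name TitleHash and the choice of SHA-256 are assumptions made for the sketch;
any stable digest would do. Apply it to the title after whatever massaging you
settled on, and index the resulting hex string as a single untokenized term.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: fixed-length key for an (already normalized) title.
final class TitleHash {
    private TitleHash() {}

    static String sha256Hex(String normalizedTitle) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(normalizedTitle.getBytes(StandardCharsets.UTF_8));
            // Hex-encode the digest so it can be stored as one term.
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }
}
```

The hash keeps the term short no matter how long the title is, at the cost of
not being able to read the title back from that field.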

But no, there's no magic that makes Lucene DWIM (Do What I Mean)...

Best
Erick

On Wed, Mar 12, 2008 at 2:01 AM, Ion Badita <ion.badita@searchcapital.net>
wrote:

> The "problem" is that my unique field is a title, many terms per field.
> I want to make an index with titles and I don't want to have duplicates.
>
> John
>
>
> Erick Erickson wrote:
> > You can easily find whether a term is in the index with
> > TermEnum/TermDocs (I think TermEnum is all you really need).
> >
> > Except, you'll probably also have to keep an internal map of IDs added
> > since the searcher was opened and check against that too.
> >
> > Best
> > Erick
> >
> > On Tue, Mar 11, 2008 at 11:04 AM, Ion Badita
> > <ion.badita@searchcapital.net> wrote:
> >
> >
> >> Hi,
> >>
> >> I want to create an index with one unique field.
> >> Before inserting a document I must be sure that the "unique field" is
> >> unique.
> >>
> >>
> >>
> >> John
> >>
