lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: How to avoid duplicate records in lucene
Date Mon, 21 Jul 2008 14:18:26 GMT
could you define duplicate? As far as I know, you don't
get the same (internal) doc id back more than once, so what
is a duplicate?

Best
Erick

On Mon, Jul 21, 2008 at 9:40 AM, Sebastin <sebasmtech@gmail.com> wrote:

>
> at the time search , while querying the data
> markrmiller wrote:
> >
> > Sebastin wrote:
> >> Hi All,
> >>
> >> Is there any possibility to avoid duplicate records in lucene  2.3.1?
> >>
> > I don't believe that there is a very high performance way to do this.
> > You are basically going to have to query the index for an id before
> > adding a new doc. The best way I can think of off the top of my head is
> > to batch - first check that ids in the batch are unique, then check all
> > ids in the batch against the IndexReader, then add the ones that are not
> > dupes. Of course all of your docs would have to be added through this
> > single choke point so that you knew other threads had not added that id
> > after the first thread had looked but before it added the doc.
> >
> > I think Mark H has you covered if getting the dupes out after are okay.
> >
> > - Mark
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/How-to-avoid-duplicate-records-in-lucene-tp18543588p18568862.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message