lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "steve" <st...@browsermedia.com>
Subject Re: How to get unique Hits using Multisearcher
Date Tue, 29 Jun 2004 14:05:23 GMT
Thanks for the quick reply.

I have played with this just a bit, but it seems there is no quick way to
get the Hits.hitsDocs vector in one swoop. The Hits class is final so I
cannot extend it, and there are no public accessors to get/set the hitDocs.
It seems I will need to loop through Hits.doc(int) and extract the hits one
at a time. Do you know another way to do this?


----- Original Message ----- 
From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Tuesday, June 29, 2004 9:29 AM
Subject: Re: How to get unique Hits using Multisearcher


> Steve,
>
> Answer to your last question: that's right.
> Lucene does not know what field you use as your PK (unique document
> identifier), so it cannot merge hits and remove duplicates.
>
> As for converting Vector to Set.... HashSet set = new HashSet();
> set.addAll(YourVectorInstance);
> Does that work for you?  Vector isA Collection.
>
> Otis
>
>
>
> --- steve <steve@browsermedia.com> wrote:
> > I saw a similar - but not identical - question asked earlier in the
> > archive
> > but no answer.
> >
> > I have 2 (or more)  indexes of web url's with intersecting hits. The
> > url's
> > are defined as keys in case that makes a difference. I am using
> > MultiSearcher to search multiple indexes, but I get hits repeated if
> > they
> > exist in both indexes. I am trying to get a set of all unique url's
> > among
> > the indexes.
> >
> > Can MultiSearcher be told not to repeat hits with duplicate "key"
> > values? Or
> > does it already do this indicating my Doc's are not defined properly?
> > As a
> > last resort, can someone recommend an efficient method to convert the
> > Vector
> > of hitDocs into a Set after the fact?
> >
> > FYI - as a test, I used MultiSearcher to search one index and it
> > found 45
> > hits. I then gave MultiSearher 2 Searchers pointing to the same
> > index, and
> > it found 90 hits. From this I concluded that MultiSearher merely adds
> > hits
> > to the Vector rather than looking for duplicates. Is that right?
> >
> > TIA,
> >
> > Steve B.
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message