lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: is there any way to find unique records ?
Date Tue, 21 Nov 2006 15:28:50 GMT
OK, I think I get it now. You're right that you probably don't want to
iterate over the Hits object, since Hits re-runs the search behind the scenes
once you go beyond the first 100 or so docs. That said, I don't know how big
your result sets are; if they are guaranteed to be small, this may not matter.
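
If the result sets really are small, the straightforward loop is enough. Here
is a sketch of it as a self-contained Java model rather than real Lucene code:
the list passed to the hypothetical uniqueCategories method stands in for the
values you would get from hits.doc(i).get("category") for each i below
hits.length().

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UniqueCategoriesNaive {

    // Each element stands in for hits.doc(i).get("category") of the i-th ranked hit
    // (a hypothetical model; no Lucene classes are used here).
    static Set<String> uniqueCategories(List<String> hitCategories) {
        // LinkedHashSet drops duplicates while keeping first-seen (rank) order.
        Set<String> unique = new LinkedHashSet<>();
        for (String category : hitCategories) {
            if (category != null) {   // a document may not have the field at all
                unique.add(category);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("phones", "accessories", "phones", "tablets", "accessories");
        System.out.println(uniqueCategories(hits)); // prints [phones, accessories, tablets]
    }
}
```

The LinkedHashSet keeps each category in the position of the best-ranked hit
that carries it; a TreeSet would give you the categories sorted instead.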

I'm guessing you want to implement a custom HitCollector. That has its own
cautions about calling, say, IndexReader.document(id) for each hit, so you
probably want to use a TermDocs object instead. seek(), skipTo() and doc() are
your friends. Still, I'd try the simple way of just calling
IndexReader.document(id) first, just to see whether the performance is
acceptable. Be sure you're looking at a truly representative data set, though
<G>...
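
Putting the HitCollector and TermDocs pieces together, one possible approach
(a sketch, not tested against a real index) is: collect the matching doc ids
into a BitSet, then walk the unique terms of the "category" field and keep each
term whose postings contain at least one collected doc, using skipTo() to jump
ahead instead of scanning. To keep it runnable without Lucene, the code below
models the pieces in plain Java: HitCollectorLike, BitSetCollector and
PostingsCursor are hypothetical stand-ins for HitCollector, a custom collector
and TermDocs, and the TreeMap plays the role of the term dictionary. Against a
real Lucene 2.x index you would instead pass a HitCollector subclass to
searcher.search(query, collector), enumerate terms with
reader.terms(new Term("category", "")) and position a reader.termDocs() on
each term with seek(term).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class UniqueCategoriesViaPostings {

    // Hypothetical interface shaped like Lucene's HitCollector.collect(int doc, float score).
    interface HitCollectorLike {
        void collect(int doc, float score);
    }

    // Remembers every matching doc id; the score is ignored because only membership matters.
    static final class BitSetCollector implements HitCollectorLike {
        final BitSet hits = new BitSet();

        @Override
        public void collect(int doc, float score) {
            hits.set(doc);
        }
    }

    // Hypothetical cursor over one term's ascending doc ids, mimicking TermDocs next()/doc()/skipTo().
    static final class PostingsCursor {
        private final int[] docs;
        private int pos = -1;

        PostingsCursor(int[] docs) { this.docs = docs; }

        boolean next() { return ++pos < docs.length; }

        int doc() { return docs[pos]; }

        // Moves past the current entry to the first doc >= target; false if the list runs out.
        boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (doc() < target);
            return true;
        }
    }

    // Walks the terms in sorted order (like a TermEnum) and keeps each one that has a hit.
    static List<String> uniqueCategories(TreeMap<String, int[]> categoryPostings, BitSet hits) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, int[]> term : categoryPostings.entrySet()) {
            PostingsCursor cursor = new PostingsCursor(term.getValue());
            if (!cursor.next()) continue;                 // term without postings
            while (true) {
                int d = cursor.doc();
                int h = hits.nextSetBit(d);               // first hit at or after this posting
                if (h < 0) break;                         // no hits left: category not in results
                if (h == d) {                             // posting is a hit: category present
                    result.add(term.getKey());
                    break;
                }
                if (!cursor.skipTo(h)) break;             // postings end before reaching hit h
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Pretend "+product-title:nokia +product-desc:nokia" matched docs 1, 4 and 7.
        BitSetCollector collector = new BitSetCollector();
        collector.collect(1, 0.9f);
        collector.collect(4, 0.7f);
        collector.collect(7, 0.5f);

        // Made-up "category" postings: term -> ascending doc ids.
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("accessories", new int[] {0, 4, 9});
        postings.put("laptops", new int[] {2, 3, 8});
        postings.put("phones", new int[] {1, 5, 7});

        System.out.println(uniqueCategories(postings, collector.hits)); // prints [accessories, phones]
    }
}
```

This never loads a stored document per hit, which is what makes it attractive
when the hit count is large and the number of distinct categories is modest.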

Hope this helps
Erick

On 11/21/06, Bhavin Pandya <bhavinp@rediff.co.in> wrote:
>
> Hi Erick,
>
> > If you're asking for a list of all the unique values for a particular field,
> > see TermDocs and/or TermEnum which will allow you to look at, say, all the
> > values stored for some field. A trick here is to seek (new Term("field",
> > ""));. By putting nothing in the value, you effectively enumerate them
> > all, something that I didn't find obvious.
>
> I think your solution above is very close to what I am looking for,
> but in a slightly different way...
> Here is what I am planning to do...
>
> Suppose my index has four fields "product-title", "product-desc",
> "category" and "FLAG" (the field FLAG has the value "true" for each and every
> doc in the index... it was added just for iteration purposes).
>
> At search time...
> query = +(product-title:nokia) +(product-desc:nokia)
> Hits hits = searcher.search(query);
> I want to fetch the unique "category" values from the above Hits object...
>
> But I don't want to iterate through the Hits object....
>
> Now, as per your suggestion, I can do something like this...
> TermDocs termDocs = reader.termDocs(new Term("FLAG", "true"));
> But it will return an enumeration of all the documents in the index... but I
> want an enumeration of only the documents that are relevant to "nokia"...
> How do I do that?
>
> Thanks
> - Bhavin pandya
>
>
> ----- Original Message -----
> From: "Erick Erickson" <erickerickson@gmail.com>
> To: <java-user@lucene.apache.org>; "Bhavin Pandya" <bhavinp@rediff.co.in>
> Sent: Tuesday, November 21, 2006 7:01 PM
> Subject: Re: is there any way to find unique records ?
>
>
> > I don't think I understand what "only unique records from a single field"
> > means. If it's a unique value in a field, there'll only be one document in
> > the hits object and there's no cost to iterating, so I doubt that's what
> > you mean.
> >
> > If you're asking for a list of all the unique values for a particular field,
> > see TermDocs and/or TermEnum which will allow you to look at, say, all the
> > values stored for some field. A trick here is to seek (new Term("field",
> > ""));. By putting nothing in the value, you effectively enumerate them
> > all, something that I didn't find obvious.
> >
> > If neither of these is close to the mark, perhaps you could provide more
> > detail.
> >
> > Best
> > Erick
> >
> > On 11/21/06, Bhavin Pandya <bhavinp@rediff.co.in> wrote:
> >>
> >> Hi,
> >> In Lucene, is there any way to find only the unique records from a single
> >> field...?
> >>
> >> Otherwise I have to iterate through Hits unnecessarily to find the
> >> unique ones...
> >>
> >> Please help..
> >>
> >> - Bhavin pandya
> >>
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
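
The empty-text trick quoted above works because terms are ordered by field
name and then by text, and "" sorts before any other text, so the first term
at or after (field, "") is that field's smallest value. The self-contained
model below shows this with a hypothetical Term class and a TreeSet standing in
for the term dictionary; in Lucene 2.x, reader.terms(new Term("category", ""))
should give you a TermEnum already positioned on that first term. Either way
you have to stop once the field name changes, because the enumeration runs on
into the following fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class EnumerateFieldValues {

    // Hypothetical stand-in for a Lucene Term: ordered by field name, then by text.
    static final class Term implements Comparable<Term> {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(Term o) {
            int byField = field.compareTo(o.field);
            return byField != 0 ? byField : text.compareTo(o.text);
        }
    }

    // Start at (field, ""), which sorts before every real value of that field,
    // and collect texts until the walk crosses into the next field.
    static List<String> uniqueValues(TreeSet<Term> dictionary, String field) {
        List<String> values = new ArrayList<>();
        for (Term t : dictionary.tailSet(new Term(field, ""), true)) {
            if (!t.field.equals(field)) break;   // walked past the last term of this field
            values.add(t.text);
        }
        return values;
    }

    public static void main(String[] args) {
        TreeSet<Term> dictionary = new TreeSet<>();
        dictionary.add(new Term("category", "phones"));
        dictionary.add(new Term("FLAG", "true"));
        dictionary.add(new Term("category", "accessories"));
        dictionary.add(new Term("product-title", "nokia"));
        dictionary.add(new Term("category", "laptops"));
        System.out.println(uniqueValues(dictionary, "category")); // prints [accessories, laptops, phones]
    }
}
```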
