lucene-java-user mailing list archives

From Anshum <ansh...@gmail.com>
Subject Re: how to use DuplicateFilter to get unique documents based on a fieldName
Date Fri, 05 Mar 2010 09:30:55 GMT
Hi Anish,
So am I getting something wrong here? You said "I have created a search
index on book Id, title, and author from a database of books which fall
under various categories.", so those are 3 fields, right?
1. How do you filter by doc type (i.e., genre) at search time? Do you even
need to do that? If yes, how?
2. If you're doing that, I'm assuming you're already indexing the genre
somehow. Right?
3. How about a multi-valued genre field (multiple Field objects with the
same field name going into the same doc)? This would let you store each book
as 1 doc with multiple genres instead of duplicate entries.
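
To make point 3 concrete, here is a rough plain-Java model of the idea. It
is not Lucene code: the class name, the {id, title, genre} row layout, and the
sample values are all made up for illustration. It only shows how rows that
repeat a book once per category collapse into one entry per Id with several
genre values, which is the shape a single doc with a multi-valued genre field
would have.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model, NOT Lucene API (names and row layout are assumptions):
// collapses one-row-per-category records into one entry per book Id that
// carries a multi-valued set of genres.
class MultiValuedGenreSketch {

    // Each row is {id, title, genre}, as it might come out of the database.
    static Map<String, Set<String>> genresById(List<String[]> rows) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (String[] row : rows) {
            // First time we see an Id, create its genre set; then add the genre.
            merged.computeIfAbsent(row[0], k -> new LinkedHashSet<>()).add(row[2]);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"isbn-1", "Book A", "fiction"});
        rows.add(new String[] {"isbn-1", "Book A", "comics"});
        rows.add(new String[] {"isbn-2", "Book B", "science"});
        // Two books remain, the first one with two genres.
        System.out.println(genresById(rows));
    }
}
```

With one entry per book, the hit count for a title or author search would
already be a count of books, so no deduplication would be needed at search
time.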

I'm still not sure if I've gotten the problem correctly, but I hope this is of
help!
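
If you keep one doc per category, the count issue in the quoted message below
comes down to keeping a single hit per Id value. The sketch below models that
in plain Java. It is not DuplicateFilter itself and uses no Lucene API; the
class name, the {docNumber, id} hit layout, and the sample values are
assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model, NOT Lucene API (names and hit layout are assumptions):
// keeps only the first hit for each Id value, so the size of the result is
// the number of distinct books rather than the number of matching docs.
class UniqueHitsSketch {

    // Each hit is {docNumber, id}; list order models the ranked results.
    static List<String[]> keepFirstPerId(List<String[]> hits) {
        Set<String> seen = new HashSet<>();
        List<String[]> unique = new ArrayList<>();
        for (String[] hit : hits) {
            // Set.add returns false when the Id was already seen.
            if (seen.add(hit[1])) {
                unique.add(hit);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String[]> hits = new ArrayList<>();
        hits.add(new String[] {"0", "isbn-1"}); // book listed under fiction
        hits.add(new String[] {"1", "isbn-1"}); // same book listed under comics
        hits.add(new String[] {"2", "isbn-2"});
        System.out.println("total hits: " + hits.size()
                + ", unique books: " + keepFirstPerId(hits).size());
    }
}
```

Note that this kind of deduplication can only work on values that are
actually searchable in the index, which matches what you found: the filter
started working once Id was no longer added with Field.Index set to NO.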

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............


On Fri, Mar 5, 2010 at 12:07 PM, anisha@ekkitab <anisha@ekkitab.com> wrote:

>
> Hi Zhangchi
>
>
> Thanks for your reply.
>
> We have about 3 million records (different ISBNs) in the database and
> slightly more documents than that, and we wouldn't want to do the deduping at
> indexing time, because one book (one ISBN) can be available under 2 or
> more categories (like fiction, comics & novels, science, etc.).
>
> We had actually applied the filter on the primary key, i.e. Id, and it wasn't
> working, so I was hoping for some sample code. But then we found out that
> the field on which we wanted the duplicate filter to be applied (Id)
> was not actually indexed when it was added to the document, i.e. Field.Index
> was set to NO. We changed this, repopulated the documents, and the filtering
> works now.
>
> Thanks for your time.
>
>
>
>
> zhangchi wrote:
> >
> >
> > I think you should check the index first, using lukeall to see if there
> > are duplicate books.
> >
> > On Thu, 04 Mar 2010 20:43:26 +0800, anisha@ekkitab <anisha@ekkitab.com>
> > wrote:
> >
> >>
> >> Hi there, could someone help me with the usage of DuplicateFilter? Here
> >> is my problem.
> >>
> >> I have created a search index on book Id, title, and author from a
> >> database of books which fall under various categories. Some books fall
> >> under more than one category. Now, when I issue a search, I get back 'X'
> >> books matching the search criteria, some of which are repeated, because
> >> those books are in different documents; that is the expected behaviour.
> >>
> >> I use TopFieldDocCollector.getTotalHits() to get the total count, but
> >> this includes the repeats mentioned above, so it is not the actual
> >> count. Hence, when I issue a search on title or author, I want to get a
> >> unique count / list of books. How do I use DuplicateFilter to achieve this?
> >>
> >> Please help
> >>
> >> Regards
> >> Anish
> >
> >
> > --
> > Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/how-to-use-DuplicateFilter-to-get-unique-documents-based-on-a-fieldName-tp27780251p27790391.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
>
>
