lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jake Mannix <jake.man...@gmail.com>
Subject Re: Top field count scoring across documents
Date Sun, 22 Nov 2009 17:20:24 GMT
Peter,

   You want to do a facet query.  This kind of functionality is not in
Lucene-core (sadly), but both Solr (the fully featured search application
built on Lucene) and bobo-browse (just a library, like Lucene itself) are
open-source and work with Lucene to provide faceting capabilities for you.

   -jake

On Sun, Nov 22, 2009 at 8:42 AM, Peter 4U <peter4u@hotmail.com> wrote:

>
> Hello Lucene Experts,
>
> I wonder if someone might be able to shed some insight on this interesting
> scoring question:
>
> The problem:
> Build a search query that will return [ordered] hits by the top number of
> occurences of field values across matched documents (or as close to this as
> possible).
> The built-in scoring is great for scoring number of hits within a document,
> but is there an efficient way to do this across the same field in a set of
> matched documents? (maybe scoring isn't the best way?)
>
> Example:
> Let's say you have an index containing book information. Each document has
> a 'title' field.
> Let's say the index contains 100 entries, with:
> 65 'title's containing the word 'tiger'
> 21 containing 'lion'
> 6 containing 'panther'
> 5 containing 'kitten'
> 3 containing 'slug'
>
> What would be the best way to build a query such that returned documents
> are ordered in this way:
> Rank    Value         Occurences
> ================================
> 1       tiger            65
> 2       lion             21
> 3       panther          6
> 4       kitten           5
> 5       slug             3
>
> I can, of course, build a standard query, traverse the returned documents
> and build such a list, but if the returned query had many 100,000's of hits,
> the performance would degrade linearly, particularly if only the 'Top 5' are
> actually required.
>
>
> One idea is to maintain a separate index with this information - the main
> problem with this is that you essentially need to know what you're searching
> for at index-time, which isn't ideal.
>
>
> Has anyone come across and solved this particular issue using Lucene?
>
> Many thanks,
> Peter
>
>
>
> _________________________________________________________________
> Add your Gmail and Yahoo! Mail email accounts into Hotmail - it's easy
> http://clk.atdmt.com/UKM/go/186394592/direct/01/
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message