lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Lamprecht <clampre...@gmail.com>
Subject Re: How to efficiently get # of search results, per attribute
Date Sun, 14 Nov 2004 00:46:51 GMT
Nader and Chuck,

Thanks for the responses, they're both helpful.  My index sizes will
begin on the order of 200,000 classes, and 20,000 instructors (and
much fewer departments), and grow over time to maybe a few million
classes.  Compared to some of the numbers I've seen on this mailing
list, my dataset is fairly small.  I think I'll not worry about
performance for now, until & unless it becomes an issue.

-Chris

On Sat, 13 Nov 2004 15:36:11 -0800, Chuck Williams <chuck@manawiz.com> wrote:
> My Lucene application includes multi-faceted navigation that does a more
> complex version of the below.  I've got 5 different taxonomies into
> which every indexed item is classified.  The largest of the taxonomies
> has over 15,000 entries while the other 4 are much smaller. For every
> search query, I determine the best small set of nodes from each taxonomy
> to present to the user as drill down options, and provide the counts
> regarding how many results fall under each of these nodes.  At present I
> only have about 25,000 indexed objects and usually no more than 1,000
> results from the initial query.  To determine the drill-down options and
> counts, I scan up to 1,000 results computing the counts for all nodes
> into which these results classify.  Then for each taxonomy I pick the
> best drill-down options available (orthogonal set with reasonable
> branching factor that covers all results) and present them with their
> counts.  If there are more than 1,000 results, I extrapolate the
> computed counts to estimate the actual counts on the entire set of
> results.  This is all done with a single index and a single search.
> 
> The total time required for performing this computation for the one
> large taxonomy is under 10ms, running in full debug mode in my ide.  The
> query response time overall is subjectively instantaneous at the UI
> (Google-speed or better).  So, unless some dimension of the problem is
> much bigger than mine, I doubt performance will be an issue.
> 
> Chuck
> 
> 
> 
>  > -----Original Message-----
>  > From: Nader Henein [mailto:nsh@bayt.net]
>  > Sent: Saturday, November 13, 2004 2:29 AM
>  > To: Lucene Users List
>  > Subject: Re: How to efficiently get # of search results, per
> attribute
>  >
>  > It depends on how many results they're looking through, here are two
>  > scenarios I see:
>  >
>  > 1] If you don't have that many records you can fetch all the results
> and
>  > then do a post parsing step the determine totals
>  >
>  > 2] If you have a lot of entries in each category and you're worried
>  > about fetching thousands of records every time, you can just have
>  > seperate indecies per category and search them in in parallel (not
>  > Lucene Parallel Search) and you can get up to 100 hits for each one
>  > (efficiency) but you'll also have the total from the search to
> display.
>  >
>  > Either way you can boost up speed using RamDirectory if you need
> more
>  > speed from the search, but whichever approach you choose I would
>  > recommend that you sit down and do some number crunching to figure
> out
>  > which way to go.
>  >
>  >
>  > Hope this helps
>  >
>  > Nader Henein
>  >
>  >
>  >
>  > Chris Lamprecht wrote:
>  >
>  > >I'd like to implement a search across several types of "entities",
>  > >let's say, classes, professors, and departments.  I want the user
> to
>  > >be able to enter a simple, single query and not have to specify
> what
>  > >they're looking for.  Then I want the search results to be
> something
>  > >like this:
>  > >
>  > >Search results for: "philosophy boyer"
>  > >
>  > >Found: 121 classes - 5 professors - 2 departments
>  > >
>  > ><search results here...>
>  > >
>  > >
>  > >I know I could iterate through every hit returned and count them up
>  > >myself, but that seems inefficient if there are lots of results.
> Is
>  > >there some other way to get this kind of information from the
> search
>  > >result set?  My other ideas are: doing a separate search each
> result
>  > >type, or storing different types in different indexes.  Any
>  > >suggestions?  Thanks for your help!
>  > >
>  > >-Chris
>  > >
>  >
> >---------------------------------------------------------------------
>  > >To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>  > >For additional commands, e-mail:
> lucene-user-help@jakarta.apache.org
>  > >
>  > >
>  > >
>  > >
>  > >
>  > >
>  >
>  >
> ---------------------------------------------------------------------
>  > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>  > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message