lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr's Filtering approaches
Date Thu, 10 Oct 2013 11:55:49 GMT
Well, my first question is why 50K groups are necessary, and
whether you can simplify that. How a user can manually
choose from among that many groups is "interesting". But
assuming they're all necessary, I can think of two things.

If the user can only select ranges, just put in filter queries
using ranges. Or possibly both ranges and individual entries,
as fq=group:[1A TO 10000A] OR group:(2B 45C 98Z) etc.
You need to be a little careful how you index these so
range queries work properly. In the above you'd miss
2A because the terms sort lexicographically, so you'd need to
store them in some zero-padded form that sorts like
0000001A, 0010000A and so on. You wouldn't need to show
that form to the user, just form your fq's in the app to
work with that form.
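
A minimal sketch of the padding step, assuming group ids are a run of
digits followed by an optional letter suffix and never exceed seven
digits. The helper names (pad_group_id, build_fq) and the width are
illustrative only, and the "group" field is assumed to be indexed in
the padded form:

```python
import re

PAD_WIDTH = 7  # assumption: no group id has more than 7 digits


def pad_group_id(group_id, width=PAD_WIDTH):
    """Zero-pad the numeric part so string order matches numeric order."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", group_id)
    if match is None:
        raise ValueError("unexpected group id format: %r" % group_id)
    digits, suffix = match.groups()
    if len(digits) > width:
        raise ValueError("group id %r exceeds pad width %d" % (group_id, width))
    return digits.zfill(width) + suffix


def build_fq(range_start, range_end, extras=()):
    """Build an fq value from one range plus optional individual ids."""
    clauses = ["group:[%s TO %s]" % (pad_group_id(range_start),
                                     pad_group_id(range_end))]
    if extras:
        padded = " ".join(pad_group_id(g) for g in extras)
        clauses.append("group:(%s)" % padded)
    return " OR ".join(clauses)
```

Note that a padded range such as [0000001A TO 0010000A] also matches
ids with other suffixes that sort inside it (0000002B, for example), so
the suffix may need its own field if that matters.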

If that won't work (you wouldn't want this to get huge), think
about a "post filter" that would only operate on documents that
had made it through the select, although how to convey which
groups the user selected to the post filter is an open
question.
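
To illustrate the idea only: the sketch below is plain Python over
in-memory dicts, not Solr's Java PostFilter interface. The document
layout and the function names (run_query, post_filter) are made up for
the example. The point is that the group check runs only on documents
that already matched the main query:

```python
def run_query(docs, term):
    """Stand-in for the main query: keep docs whose text contains term."""
    return [doc for doc in docs if term in doc["text"]]


def post_filter(candidates, selected_groups):
    """Check group membership only for docs that survived the main query."""
    selected = frozenset(selected_groups)  # O(1) membership tests
    return [doc for doc in candidates if doc["group"] in selected]
```

With this shape, the cost of the group check scales with the number of
hits rather than with the number of selected groups.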

Best,
Erick

On Wed, Oct 9, 2013 at 12:23 PM, David Philip
<davidphilipsheron@gmail.com> wrote:
> Hi All,
>
>     I have an issue in handling filters for one of our requirements and
> would like to get suggestions on the best approaches.
>
>
> *Use Case:*
>
> 1.  We have a list of groups, and the number of groups can increase up to >1
> million. Currently we have almost 90 thousand groups in the Solr search
> system.
>
> 2.  Just before the user hits search, he has the option to select the
> groups he wants to retrieve. [The distinct list of these group names for
> display is retrieved from another Solr index that has more information about
> groups.]
>
> *3. User Operation:*
> Say the user selects group 1A through group 10000A and searches for key:cancer.
>
>
> The current approach I was thinking of is: get the search results and filter
> the query by the list of group IDs selected by the user. But my concern is that
> when this list grows to >50k unique IDs, it can cause a lot of delay
> in getting search results. So I wanted to know whether there are different
> filtering approaches that I can try.
>
> One more approach I was considering, as suggested by my colleague, is an
> intersection:
> Get the group IDs selected by the user.
> Get the list of group IDs from the search results.
> Perform an intersection of both, and then get the entire result set for only
> those group IDs that intersected. Is this a better way? Can I use any caching
> technique in this case?
>
>
> - David.
