lucene-solr-user mailing list archives

From David Philip <davidphilipshe...@gmail.com>
Subject Re: Solr's Filtering approaches
Date Sat, 12 Oct 2013 05:57:09 GMT
The groups are pharmaceutical research experiments. The user is presented
with a graph view; he can select a region, and all the groups in that region
get included. The user can also modify the groups there, so we did not keep
the group information in the same Solr index but externalized it.
I looked at the post filter article. My understanding is that I simply have
to extend it as you did and include an implementation of
"isAllowed(acls[doc], groups)". This will filter the documents in the
collector, and finally this collector will be returned. Am I right?

      @Override
      public void collect(int doc) throws IOException {
        if (isAllowed(acls[doc], user, groups)) super.collect(doc);
      }
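
To make the delegating idea concrete, here is a minimal sketch that runs without Solr. HitCollector, GroupFilterCollector, ListCollector, the groupOfDoc array and the body of isAllowed are all hypothetical stand-ins invented for illustration; none of them are Solr or Lucene classes, and the thread does not say how isAllowed should work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a hit collector; not a Solr or Lucene class.
interface HitCollector {
    void collect(int doc);
}

// Wraps another collector and forwards only documents whose group was selected.
class GroupFilterCollector implements HitCollector {
    private final HitCollector delegate;
    private final String[] groupOfDoc;       // assumed: one group id per document, indexed by doc id
    private final Set<String> allowedGroups; // group ids selected by the user

    GroupFilterCollector(HitCollector delegate, String[] groupOfDoc, Set<String> allowedGroups) {
        this.delegate = delegate;
        this.groupOfDoc = groupOfDoc;
        this.allowedGroups = allowedGroups;
    }

    // Assumed check: a document passes if its group is among the selected ones.
    private boolean isAllowed(String group) {
        return group != null && allowedGroups.contains(group);
    }

    @Override
    public void collect(int doc) {
        if (isAllowed(groupOfDoc[doc])) {
            delegate.collect(doc); // pass the hit down the chain
        }
    }
}

// End of the chain: records every doc id it receives.
class ListCollector implements HitCollector {
    final List<Integer> docs = new ArrayList<>();

    @Override
    public void collect(int doc) {
        docs.add(doc);
    }
}
```

In Solr itself the same role would be played by a DelegatingCollector returned from a PostFilter's getFilterCollector method, with super.collect(doc) in place of delegate.collect(doc).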


Erick, I am interested to know whether I can extend any class that
returns only the bitset of the documents that match the search query. I
could then compute bitset1.and(bitset2OfGroups) and finally collect only
those documents to return to the user. How do I try this approach? Any
pointers on bitsets?
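
The intersection step on its own could look like the sketch below. It uses java.util.BitSet purely as a stand-in for whatever doc id set the search side provides; how to obtain the two bitsets from Solr is not shown, and the class and method names are made up for illustration.

```java
import java.util.BitSet;

class BitSetIntersection {
    // Returns a new BitSet holding the doc ids set in both inputs.
    // BitSet.and modifies its receiver in place, so work on a copy.
    static BitSet intersect(BitSet queryMatches, BitSet groupMatches) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(groupMatches);
        return result;
    }
}
```

Cloning first keeps the query bitset intact, which matters if it is cached and reused for other requests.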

Thanks - David




On Thu, Oct 10, 2013 at 5:25 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> Well, my first question is why 50K groups is necessary, and
> whether you can simplify that. How a user can manually
> choose from among that many groups is "interesting". But
> assuming they're all necessary, I can think of two things.
>
> If the user can only select ranges, just put in filter queries
> using ranges. Or possibly both ranges and individual entries,
> as fq=group:[1A TO 10000A] OR group:(2B 45C 98Z) etc.
> You need to be a little careful how you index these so
> range queries work properly, in the above you'd miss
> 2A because it's sorting lexicographically, you'd need to
> store in some form that sorts like 0000001A 010000A
> and so on. You wouldn't need to show that form to the
> user, just form your fq's in the app to work with
> that form.
>
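
A minimal sketch of the padding step on the application side. It assumes group ids are a run of digits followed by a suffix such as "A", and that 7 digits is wide enough; the class name, the width and the helper names are illustrative choices, not something stated in the thread.

```java
import java.util.Locale;

class GroupIdPadding {
    static final int WIDTH = 7; // assumed maximum number of digits

    // Turns "1A" into "0000001A" so that string order matches numeric order.
    static String pad(String groupId) {
        int i = 0;
        while (i < groupId.length() && Character.isDigit(groupId.charAt(i))) {
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("no leading digits: " + groupId);
        }
        long number = Long.parseLong(groupId.substring(0, i));
        String suffix = groupId.substring(i);
        return String.format(Locale.ROOT, "%0" + WIDTH + "d%s", number, suffix);
    }

    // Builds the range clause of a filter query over the padded form.
    static String rangeFilter(String from, String to) {
        return "group:[" + pad(from) + " TO " + pad(to) + "]";
    }
}
```

With this form, 2A sorts between 1A and 10000A as intended. A range over the padded strings also matches ids with other suffixes, such as 0000002B, so whether a plain range is enough depends on how the ids are structured.
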
> If that won't work (you wouldn't want this to get huge), think
> about a "post filter" that would only operate on documents that
> had made it through the select, although how to convey which
> groups the user selected to the post filter is an open
> question.
>
> Best,
> Erick
>
> On Wed, Oct 9, 2013 at 12:23 PM, David Philip
> <davidphilipsheron@gmail.com> wrote:
> > Hi All,
> >
> >     I have an issue in handling filters for one of our requirements and
> > would like to get suggestions for the best approaches.
> >
> >
> > *Use Case:*
> >
> > 1.  We have a list of groups, and the number of groups can increase to >1
> > million. Currently we have almost 90 thousand groups in the Solr search
> > system.
> >
> > 2.  Just before the user runs a search, he has the option to select the
> > number of groups he wants to retrieve. [The distinct list of these group
> > names for display is retrieved from another Solr index that has more
> > information about the groups.]
> >
> > *3.  User Operation:*
> > Say the user selects group 1A - group 10000A and searches for key:cancer.
> >
> >
> > The approach I am currently thinking of is: get the search results and
> > apply a filter query with the list of group ids selected by the user. My
> > concern is that when this list grows to >50k unique ids, it can cause a lot
> > of delay in getting search results. So I wanted to know whether there are
> > other filtering approaches that I can try.
> >
> > I was also thinking of one more approach, suggested by my colleague: an
> > intersection.
> > Get the group ids selected by the user.
> > Get the list of group ids from the search results.
> > Perform an intersection of both, and then get the entire result set for only
> > those group ids that intersected. Is this a better way? Can I use any caching
> > technique in this case?
> >
> >
> > - David.
>
