lucene-solr-user mailing list archives

From Ganesh M <ganesh.sudhakar....@gmail.com>
Subject Re: fq performance
Date Fri, 17 Mar 2017 06:46:53 GMT
Hi Shawn / Michael,

Thanks for your replies. I think you have understood my scenario exactly.

Initially, each document stored which users could access it, in fields
such as U1_s:true. If 100 users can access a document, it has 100 such
fields, one per user.
When U1 wants to see all of their documents, I query for all documents
where U1_s:true.

If user U5 is added to group G1, I have to take all the documents of
group G1 and set U5_s:true on each of them. For this, I have to re-index
all the documents in that group.

To avoid this, I am trying to store group information instead of user
information in the document, such as G1_s:true, G2_s:true. To query a
user's documents, I will first get all the groups of user U1, and then
query for all documents where G1_s:true OR G2_s:true OR G3_s:true ... This
way we don't need to re-index all the documents. But at query time I need
to OR together all the groups the user belongs to.
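
As a rough sketch of the query side of this scheme, the OR clause could be
built from the user's group list as below. The function name is
hypothetical, and it assumes every group Gn is stored as a field named
Gn_s with the value true, as described above.

```python
def group_filter(groups):
    """Build one filter string that matches documents in ANY of the groups.

    Assumes each group is indexed as a field named <group>_s with the
    value "true" (the naming used in this message).
    """
    if not groups:
        raise ValueError("the user belongs to no groups")
    # De-duplicate and sort so the same set of groups always produces
    # the same string.
    clauses = ["%s_s:true" % g for g in sorted(set(groups))]
    return " OR ".join(clauses)


print(group_filter(["G2", "G1", "G3"]))
```

Producing the same string for the same set of groups should also make it
more likely that a repeated filter can be served from a cache.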

For how many ORs can Solr return results in less than one second? Can I
pass hundreds of OR conditions in a Solr query? Will that affect
performance?

Please share your valuable inputs.

On Thu, Mar 16, 2017 at 6:04 PM Shawn Heisey <apache@elyograg.org> wrote:

> On 3/16/2017 6:02 AM, Ganesh M wrote:
> > We have 1 million of documents and would like to query with multiple fq
> > values.
> >
> > We have kept the access_control ( multi value field ) which holds
> > information about for which group that document is accessible.
> >
> > Now to get the list of all the documents of an user, we would like to
> > pass multiple fq values ( one for each group user belongs to )
> >
> >
> > q:somefiled:value&fq:access_control:g1&fq:access_control:g2&fq:access_control:g3&fq:access_control:g4&fq:access_control:g5...
> >
> > Like this, there could be 100 groups for an user.
>
> The correct syntax is fq=field:value -- what you have there is not going
> to work.
>
> This might not do what you expect.  Filter queries are ANDed together --
> *every* filter must match, which means that if a document that you want
> has only one of those values in access_control, or has 98 of them but
> not all 100, then the query isn't going to match that document.  The
> solution is one filter query that can match ANY of them, which also
> might run faster.  I can't say whether this is a problem for you or
> not.  Your data might be completely correct for matching 100 filters.
>
> Also keep in mind that there is a limit to the size of a URL that you
> can send into any webserver, including the container that runs Solr.
> That default limit is 8192 bytes, and includes the "GET " or "POST " at
> the beginning and the " HTTP/1.1" at the end (note the spaces).  The
> filter query information for 100 of the filters you mentioned is going
> to be over 2K, which will fit in the default, but if your query has more
> complexity than you have mentioned here, the total URL might not fit.
> There's a workaround to this -- use a POST request and put the
> parameters in the request body.
>
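
A minimal sketch of the two suggestions above (one filter that matches ANY
group, sent in a POST body) might look like the following. The host,
collection name, and helper name are assumptions for illustration; only
the access_control field comes from this thread. The request is built but
not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_select_request(base_url, query, groups, field="access_control"):
    """Build a POST request carrying q and a single OR-style fq."""
    # One fq that matches ANY of the groups, instead of one fq per group.
    fq = "%s:(%s)" % (field, " OR ".join(groups))
    body = urlencode({"q": query, "fq": fq}).encode("utf-8")
    # Supplying data= makes urllib use POST, so the parameters travel in
    # the request body rather than in the URL.
    return Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical endpoint; adjust host and collection to your setup.
req = build_select_request(
    "http://localhost:8983/solr/mycollection/select",
    "somefield:value",
    ["g%d" % i for i in range(1, 101)],
)
print(req.get_method(), len(req.data), "bytes in body")
# To actually send it: urllib.request.urlopen(req)
```
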
> > If we fire query with 100 values in the fq, whats the penalty on the
> > performance ? Can we get the result in less than one second for 1 million
> > of documents.
>
> With one million documents, each internal filter query result is 125000
> bytes -- the number of documents divided by eight.  That's 12.5 megabytes
> for 100 of them.  In addition, every time a filter is run, it must
> examine every document in the index to create that 125000 byte
> structure, which means that filters which *aren't* found in the
> filterCache are relatively slow.  If they are found in the cache,
> they're lightning fast, because the cache will contain the entire 125000
> byte bitset.
>
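
The one-bit-per-document estimate can be worked through as a quick
calculation. This is only a back-of-the-envelope sketch with a
hypothetical helper name; it treats every cached filter as a full bitset,
so it is closer to an upper bound than to a measurement of real heap use.

```python
def bitset_bytes(num_docs):
    """Bytes needed for one bit per document, rounded up to whole bytes."""
    return (num_docs + 7) // 8


docs = 1_000_000
per_filter = bitset_bytes(docs)
total_for_100 = per_filter * 100

print(per_filter, "bytes per cached filter")
print(total_for_100 / 1_000_000, "MB for 100 cached filters")
```
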
> If you make your filterCache large enough, it's going to consume a LOT
> of java heap memory, particularly if the index gets bigger.  The nice
> thing about the filterCache is that once the cache entries exist, the
> filters are REALLY fast, and if they're all cached, you would DEFINITELY
> be able to get results in under one second.  I have no idea whether the
> same would happen when filters aren't cached.  It might.  Filters that
> do not exist in the cache will be executed in parallel, so the number of
> CPUs that you have in the machine, along with the query rate, will have
> a big impact on the overall performance of a single query with a lot of
> filters.
>
> Also related to the filterCache, keep in mind that every time a commit
> is made that opens a new searcher, the filterCache will be autowarmed.
> If the autowarmCount value for the filterCache is large, that can make
> commits take a very long time, which will cause problems if commits are
> happening frequently.  On the other hand, a very small autowarmCount can
> cause slow performance after a commit if you use a lot of filters.
>
> My reply is longer and more dense than I had anticipated.  Apologies if
> it's information overload.
>
> Thanks,
> Shawn
>
>
