lucene-java-user mailing list archives

From Atul Bisaria <atul.bisa...@ericsson.com>
Subject RE: Increase search performance
Date Thu, 01 Feb 2018 11:31:16 GMT
Hi Adrien,

Thanks for your reply.

I have also tried testing with UsageTrackingQueryCachingPolicy, but did not observe a significant
change in either latency or throughput.

Since my search requires no scoring and must return the results in random order (the reason
for the custom Sort object), I have also explored writing a custom collector and could observe
quite a difference in the latency figures.

Let me know if this custom collector code has any flaws that I might be missing:

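import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.SimpleCollector;

// Collects every matching doc ID, then returns up to maxHitsRequired of them in random order.
class RandomOrderCollector extends SimpleCollector
{
        private final int maxHitsRequired;
        private int docBase;   // doc ID offset of the segment currently being collected

        private final List<Integer> matches = new ArrayList<Integer>();

        public RandomOrderCollector(int maxHitsRequired)
        {
                this.maxHitsRequired = maxHitsRequired;
        }

        @Override
        public boolean needsScores()
        {
                return false;   // scores are never read by this collector
        }

        @Override
        public void collect(int doc) throws IOException
        {
                // doc is segment-relative; adding docBase gives the index-wide doc ID
                matches.add(docBase + doc);
        }

        @Override
        protected void doSetNextReader(LeafReaderContext context) throws IOException
        {
                super.doSetNextReader(context);
                this.docBase = context.docBase;
        }

        public List<Integer> getHits()
        {
                Collections.shuffle(matches);
                // Use a local limit so that repeated calls do not shrink maxHitsRequired.
                int limit = Math.min(matches.size(), maxHitsRequired);

                return matches.subList(0, limit);
        }
}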
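
One point worth checking in the collector above is that it stores every matching doc ID
before shuffling, so memory grows with the number of matches. A minimal sketch of one
possible alternative, reservoir sampling (Algorithm R), follows in plain Java with no Lucene
dependency. The class ReservoirSampler and its methods offer, hits and seen are hypothetical
names introduced for illustration only; in a collector, offer would be called from collect
with docBase + doc.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of Lucene): keeps a uniform random sample of at most k IDs. */
final class ReservoirSampler {
    private final int k;
    private final Random random;
    private final List<Integer> sample;
    private long seen = 0;

    ReservoirSampler(int k, Random random) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be >= 0");
        }
        this.k = k;
        this.random = random;
        this.sample = new ArrayList<>(k);
    }

    /** Offer one doc ID; each ID offered so far stays in the sample with probability k / seen. */
    void offer(int docId) {
        seen++;
        if (sample.size() < k) {
            sample.add(docId);               // reservoir not yet full: always keep
        } else if (k > 0) {
            // Draw j uniformly from [0, seen); replace slot j only if it falls in the reservoir.
            long j = (long) (random.nextDouble() * seen);
            if (j < k) {
                sample.set((int) j, docId);
            }
        }
    }

    /** Returns a shuffled copy of the sample, so the order is random as well as the selection. */
    List<Integer> hits() {
        List<Integer> copy = new ArrayList<>(sample);
        Collections.shuffle(copy, random);
        return copy;
    }

    long seen() {
        return seen;
    }
}
```

This keeps at most k IDs in memory however many documents match, at the cost of one random
draw per match once the reservoir is full.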

Best Regards,
Atul Bisaria

-----Original Message-----
From: Adrien Grand [mailto:jpountz@gmail.com]
Sent: Wednesday, January 31, 2018 6:33 PM
To: java-user@lucene.apache.org
Subject: Re: Increase search performance

Hi Atul,


On Tue, Jan 30, 2018 at 16:24, Atul Bisaria <atul.bisaria@ericsson.com> wrote:

> 1.     Using ConstantScoreQuery so that scoring overhead is removed since
> scoring is not required in my search use case. I also use a custom
> Sort object which does not sort by score (see code below).
>

If you don't sort by score, then wrapping with a ConstantScoreQuery won't help as Lucene will
figure out scores are not needed anyway.


> 2.     Using query cache
>
>
>
> My understanding is that query cache would cache query results and
> hence lead to significant increase in performance. Is this understanding correct?
>

It depends on what you mean by performance. If you are optimizing for worst-case latency, then
the query cache might make things worse, because caching a query requires visiting
all matches, while query execution can sometimes just skip over non-interesting matches (e.g.
in conjunctions).

However, if you are looking at improving throughput, then the query cache's default policy of
caching queries that look reused usually helps.
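
To make the latency point concrete, here is a minimal sketch in plain Java of how a
conjunction can skip over most of a dense clause, whereas caching that clause would mean
visiting every one of its matches. ConjunctionSketch, Postings and countConjunction are
hypothetical names for illustration only; the binary search merely stands in for whatever
skipping the real postings iterators do, so treat it as an assumption rather than a
description of Lucene's implementation.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not Lucene code. */
final class ConjunctionSketch {

    /** A sorted list of doc IDs that counts how many advance() calls it served. */
    static final class Postings {
        final int[] docs;
        int pos = 0;
        int examined = 0;

        Postings(int... docs) {
            this.docs = docs;
        }

        /** Moves to the first doc >= target; returns Integer.MAX_VALUE when exhausted. */
        int advance(int target) {
            int idx = Arrays.binarySearch(docs, pos, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1;   // a negative result encodes the insertion point
            examined++;
            return pos < docs.length ? docs[pos] : Integer.MAX_VALUE;
        }
    }

    /** Leapfrog intersection: counts the docs that appear in both lists. */
    static int countConjunction(Postings a, Postings b) {
        int count = 0;
        int doc = a.advance(0);
        while (doc != Integer.MAX_VALUE) {
            int other = b.advance(doc);
            if (other == doc) {
                count++;
                doc = a.advance(doc + 1);      // both agree: record the match and move on
            } else {
                doc = a.advance(other);        // b jumped ahead: let a catch up
            }
        }
        return count;
    }
}
```

With a rare clause matching 3 documents and a dense clause matching 100,000, the
intersection only asks the dense clause to advance 3 times, while populating a cache entry
for the dense clause would have to walk all 100,000 of its matches.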


> I am using Lucene version 5.4.1 where query cache seems to be enabled
> by default (https://issues.apache.org/jira/browse/LUCENE-6784), but I
> am not able to see any significant change in search performance.
>




> Here is the code I am testing with:
>
>
>
> // using MMapDirectory
> DirectoryReader reader = DirectoryReader.open(directory);
>
> // IndexReader and IndexSearcher are created only once
> IndexSearcher searcher = new IndexSearcher(reader);
>
> searcher.setQueryCachingPolicy(QueryCachingPolicy.ALWAYS_CACHE);
>

Don't do that: it will cache every filter, which usually makes things slower for the
reason mentioned above. I would rather advise that you use an instance of UsageTrackingQueryCachingPolicy.