lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Increase search performance
Date Fri, 02 Feb 2018 08:12:12 GMT
If needsScores returns false on the collector, then scores won't be
computed.

Your prototype should work well.
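For what it's worth, the bit-mix trick mentioned further down in this thread (mixing the doc id with a per-request seed, as HPPC's BitMixer does) can be sketched in plain Java. This is only a sketch: it uses the murmur3 32-bit finalizer instead of BitMixer itself, and the DocIdMixer class name is made up for illustration:

```java
// Sketch: a deterministic per-request "random score" for a doc id.
// Mixes the doc id with a per-request seed using the murmur3 32-bit
// finalizer (the same idea as HPPC's BitMixer).
public class DocIdMixer {
    private final int seed;

    public DocIdMixer(int seed) {
        this.seed = seed;
    }

    // Returns a well-mixed value for docId: stable within one request
    // (fixed seed), different across requests (different seeds).
    public int mix(int docId) {
        int h = docId ^ seed;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        DocIdMixer request1 = new DocIdMixer(42);
        DocIdMixer request2 = new DocIdMixer(43);
        System.out.println(request1.mix(7) == request1.mix(7)); // true: stable per request
        System.out.println(request1.mix(7) == request2.mix(7)); // false: distinct seeds never collide
    }
}
```

Because every step of the mix (xor-shift, odd multiply) is invertible, two distinct doc ids under the same seed, or the same doc id under two distinct seeds, can never collide, yet the resulting order looks random.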

On Fri, Feb 2, 2018 at 04:46, Atul Bisaria <atul.bisaria@ericsson.com> wrote:

> Hi Adrien,
>
> Please correct me if I am wrong, but I believe using an extended
> IntComparator in a custom Sort object for randomization would still score
> documents (when using IndexSearcher.search(Query, int, Sort), for example).
>
> So I tried a custom collector with IndexSearcher.search(Query,
> Collector), where the custom collector does not score documents at all.
>
> I have refactored RandomOrderCollector to fix the memory usage problem as
> described below. Let me know if this looks ok now.
>
> class RandomOrderCollector extends SimpleCollector
> {
>         private int maxHitsRequired;
>         private int docBase;
>
>         private ScoreDoc[] matches;
>
>         private int numHits;
>
>         private Random random = new Random();
>
>         public RandomOrderCollector(int maxHitsRequired)
>         {
>                 this.maxHitsRequired = maxHitsRequired;
>                 this.matches = new ScoreDoc[maxHitsRequired];
>         }
>
>         public boolean needsScores()
>         {
>                 return false;
>         }
>
>         @Override
>         public void collect(int doc) throws IOException
>         {
>                 int absoluteDoc = docBase + doc;
>                 int randomScore = random.nextInt(); // assign a random
> score to each doc
>
>                 if(numHits < maxHitsRequired)
>                 {
>                         matches[numHits++] = new ScoreDoc(absoluteDoc,
> randomScore);
>                 }
>                 else
>                 {
>                         int index = random.nextInt(maxHitsRequired);
>                         if(matches[index].score < randomScore)
>                         {
>                                 matches[index] = new ScoreDoc(absoluteDoc,
> randomScore);
>                         }
>                 }
>         }
>
>         @Override
>         protected void doSetNextReader(LeafReaderContext context) throws
> IOException
>         {
>                 super.doSetNextReader(context);
>                 this.docBase = context.docBase;
>         }
>
>         public ScoreDoc[] getHits()
>         {
>                 // trim trailing nulls if fewer than maxHitsRequired docs matched
>                 return numHits < maxHitsRequired ?
> java.util.Arrays.copyOf(matches, numHits) : matches;
>         }
> }
>
> Best Regards,
> Atul Bisaria
>
> -----Original Message-----
> From: Adrien Grand [mailto:jpountz@gmail.com]
> Sent: Thursday, February 01, 2018 6:11 PM
> To: java-user@lucene.apache.org
> Subject: Re: Increase search performance
>
> Yes, this collector won't perform well if you have many matches, since
> memory usage is linear with the number of matches. A better option would be
> to extend e.g. IntComparator and implement getNumericDocValues by returning
> a fake NumericDocValues instance that e.g. does a bit mix of the doc id and
> a per-request seed (for instance, HPPC's BitMixer can do that:
> https://github.com/carrotsearch/hppc/blob/master/hppc/src/main/java/com/carrotsearch/hppc/BitMixer.java
> ).
>
> On Thu, Feb 1, 2018 at 12:31, Atul Bisaria <atul.bisaria@ericsson.com> wrote:
>
> > Hi Adrien,
> >
> > Thanks for your reply.
> >
> > I have also tried testing with UsageTrackingQueryCachingPolicy, but
> > did not observe a significant change in either latency or throughput.
> >
> > Given my specific search requirements of no scoring and returning the
> > search results in a random order (the reason for the custom Sort
> > object), I have also explored writing a custom collector and observed
> > quite a difference in latency figures.
> >
> > Let me know if this custom collector code has any loopholes that I
> > could be missing:
> >
> > class RandomOrderCollector extends SimpleCollector {
> >         private int maxHitsRequired;
> >         private int docBase;
> >
> >         private List<Integer> matches = new ArrayList<Integer>();
> >
> >         public RandomOrderCollector(int maxHitsRequired)
> >         {
> >                 this.maxHitsRequired = maxHitsRequired;
> >         }
> >
> >         public boolean needsScores()
> >         {
> >                 return false;
> >         }
> >
> >         @Override
> >         public void collect(int doc) throws IOException
> >         {
> >                 matches.add(docBase + doc);
> >         }
> >
> >         @Override
> >         protected void doSetNextReader(LeafReaderContext context)
> > throws IOException
> >         {
> >                 super.doSetNextReader(context);
> >                 this.docBase = context.docBase;
> >         }
> >
> >         public List<Integer> getHits()
> >         {
> >                 Collections.shuffle(matches);
> >                 int hits = Math.min(matches.size(), maxHitsRequired);
> >
> >                 return matches.subList(0, hits);
> >         }
> > }
> >
> > Best Regards,
> > Atul Bisaria
> >
> > -----Original Message-----
> > From: Adrien Grand [mailto:jpountz@gmail.com]
> > Sent: Wednesday, January 31, 2018 6:33 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Increase search performance
> >
> > Hi Atul,
> >
> >
> > On Tue, Jan 30, 2018 at 16:24, Atul Bisaria <atul.bisaria@ericsson.com> wrote:
> >
> > > 1.     Using ConstantScoreQuery so that scoring overhead is removed
> > > since scoring is not required in my search use case. I also use a custom
> > > Sort object which does not sort by score (see code below).
> > >
> >
> > If you don't sort by score, then wrapping with a ConstantScoreQuery
> > won't help as Lucene will figure out scores are not needed anyway.
> >
> >
> > > 2.     Using query cache
> > >
> > >
> > >
> > > My understanding is that the query cache would cache query results and
> > > hence lead to a significant increase in performance. Is this
> > > understanding correct?
> > >
> >
> > It depends on what you mean by performance. If you are optimizing for
> > worst-case latency, then the query cache might make things worse, due
> > to the fact that caching a query requires visiting all matches, while
> > query execution can sometimes just skip over non-interesting matches
> > (e.g. in conjunctions).
> >
> > However, if you are looking at improving throughput, then the query
> > cache's default policy of caching queries that look like they are
> > being reused usually helps.
> >
> >
> > > I am using Lucene version 5.4.1, where the query cache seems to be
> > > enabled by default
> > > (https://issues.apache.org/jira/browse/LUCENE-6784), but I am not
> > > able to see any significant change in search performance.
> > >
> > > Here is the code I am testing with:
> > >
> > >
> > >
> > > DirectoryReader reader = DirectoryReader.open(directory);      //using
> > > MMapDirectory
> > >
> > > IndexSearcher searcher = new IndexSearcher(reader); //IndexReader
> > > and IndexSearcher are created only once
> > >
> > > searcher.setQueryCachingPolicy(QueryCachingPolicy.ALWAYS_CACHE);
> > >
> >
> > Don't do that: it will always cache all filters, which usually makes
> > things slower for the reason mentioned above. I would rather advise
> > using an instance of UsageTrackingQueryCachingPolicy.
> >
>

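As a closing aside, the bounded-memory replacement scheme in the RandomOrderCollector above is close in spirit to classic reservoir sampling (Algorithm R), which keeps every match in the sample with equal probability while using the same O(maxHitsRequired) memory. Here is a Lucene-independent sketch; the ReservoirSampler name and API are made up for illustration:

```java
import java.util.Arrays;
import java.util.Random;

// Reservoir sampling (Algorithm R): keep a uniform random sample of k
// doc ids from a stream of unknown length, using O(k) memory -- the
// same bound as the collector's matches array.
public class ReservoirSampler {
    private final int[] reservoir;
    private final Random random;
    private int seen = 0; // number of doc ids collected so far

    public ReservoirSampler(int k, long seed) {
        this.reservoir = new int[k];
        this.random = new Random(seed);
    }

    // Called once per collected doc id.
    public void collect(int docId) {
        if (seen < reservoir.length) {
            reservoir[seen] = docId;        // fill phase
        } else {
            // this is the (seen + 1)-th item; keep it with probability k / (seen + 1)
            int j = random.nextInt(seen + 1);
            if (j < reservoir.length) {
                reservoir[j] = docId;
            }
        }
        seen++;
    }

    public int size() {
        return Math.min(seen, reservoir.length);
    }

    public int[] sample() {
        return Arrays.copyOf(reservoir, size());
    }

    public static void main(String[] args) {
        ReservoirSampler sampler = new ReservoirSampler(5, 123L);
        for (int doc = 0; doc < 1000; doc++) {
            sampler.collect(doc);
        }
        System.out.println(sampler.sample().length); // 5
    }
}
```

Unlike the replace-if-score-is-lower heuristic, Algorithm R is exactly uniform: after n documents, each one is in the reservoir with probability k/n.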