lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: Executing Collector's Collect method on more than one thread
Date Sun, 31 Jan 2016 18:08:52 GMT
Before thinking about threads at all, you might try speeding things up
in your implementation. In particular, your call to the top-level
docValues is going to be very slow. The way to speed this up is to switch
to the segment-level docValues at each segment switch. That way you avoid
the rather large overhead involved with top-level String docValues. Then I
would change your scorer to work directly with BytesRef rather than
converting to a UTF-8 String.
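
As a rough illustration of the two suggestions above, here is a minimal sketch in plain Java. It deliberately does not use the Lucene classes: SegmentValues is a hypothetical stand-in for one segment's doc values, scoreBytes is a hypothetical scorer that reads raw UTF-8 bytes, and setNextSegment is a made-up hook standing in for whatever per-segment callback your Lucene version provides. The point is only the shape: swap the values at each segment switch, look them up by the segment-local id, and never build a String.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class SegmentLevelSketch {

    /** Hypothetical stand-in for one segment's doc values, indexed by segment-local doc id. */
    public static final class SegmentValues {
        final int docBase;      // offset of this segment in the global doc id space
        final byte[][] values;  // raw UTF-8 bytes of the field, one entry per local doc id

        public SegmentValues(int docBase, String... terms) {
            this.docBase = docBase;
            this.values = new byte[terms.length][];
            for (int i = 0; i < terms.length; i++) {
                values[i] = terms[i].getBytes(StandardCharsets.UTF_8);
            }
        }
    }

    /** Hypothetical scorer that works on the bytes directly; here it parses ASCII digits. */
    static float scoreBytes(byte[] bytes) {
        int n = 0;
        for (byte b : bytes) {
            n = n * 10 + (b - '0');
        }
        return n;
    }

    /** Collector that keeps only the best hit, to keep the sketch short. */
    static final class Collector {
        private SegmentValues current;
        float bestScore = Float.NEGATIVE_INFINITY;
        int bestGlobalDoc = -1;

        // Called once per segment: switch to that segment's own values.
        void setNextSegment(SegmentValues segment) {
            current = segment;
        }

        // Called per hit with the segment-local id: no global lookup, no String conversion.
        void collect(int localDoc) {
            float score = scoreBytes(current.values[localDoc]);
            if (score > bestScore) {
                bestScore = score;
                // The global id is only needed when recording the hit.
                bestGlobalDoc = current.docBase + localDoc;
            }
        }
    }

    /** Runs the collector over every doc of every segment and returns the best global doc id. */
    public static int bestDoc(List<SegmentValues> segments) {
        Collector collector = new Collector();
        for (SegmentValues segment : segments) {
            collector.setNextSegment(segment);
            for (int i = 0; i < segment.values.length; i++) {
                collector.collect(i);
            }
        }
        return collector.bestGlobalDoc;
    }
}
```

In a real collector, the per-segment values would come from the segment's reader rather than from an in-memory array, and the global id (docBase plus local id) would go into the priority queue along with the score.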

Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Jan 31, 2016 at 9:13 AM, adfel70 <adfel70@gmail.com> wrote:

> I am using RankQuery to implement my application-specific scorer, which
> returns a score based on the value of a specific field (let's call it
> 'score_field') that is stored for every document.
> The RankQuery creates a collector, and for every collected docId I retrieve
> the value of score_field, calculate the score, and add the doc id to a
> priority queue:
>
> public class MyScorerrankQuery extends RankQuery {
>         ...
>
>         @Override
>         public TopDocsCollector getTopDocsCollector(int i,
>                         SolrIndexSearcher.QueryCommand cmd, IndexSearcher searcher) {
>                 ...
>                 return new MyCollector(...);
>         }
> }
>
> public class MyCollector extends TopDocsCollector {
>         MyScorer scorer;
>         SortedDocValues scoreFieldValues;
>
>         @Override
>         public void collect(int id) {
>                 int docID = docBase + id;
>                 // 1. Get the specific field from the doc using DocValues
>                 //    and calculate the score using my scorer.
>                 String value = scoreFieldValues.get(docID).utf8ToString();
>                 scorer.calcScore(value);
>                 // 2. Add docId and score (ScoreDoc object) to the PriorityQueue.
>         }
> }
>
> The problem is that calcScore may take ~20 ms per call, so if a query
> returns 100,000 docs, which is not unusual, query execution time will be
> roughly 33 minutes (100,000 x 20 ms = 2,000 s). Is there a way to
> parallelize the collector's logic, so that more than one thread calls
> calcScore simultaneously?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Executing-Collector-s-Collect-method-on-more-than-one-thread-tp4254269.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
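
If calcScore is still the bottleneck after the changes above, one possible shape for the parallel version asked about in the quoted message is to keep collect() cheap (record only the doc id and the field value) and run the expensive scoring afterwards on a thread pool. The sketch below is plain Java and is not taken from this thread or from any Solr API: Hit, ScoredDoc and topDocs are hypothetical names, the calcScore function is assumed to be thread-safe, and all hits are held in memory until scoring finishes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

public class DeferredScoringSketch {

    /** Hypothetical record of a collected hit: global doc id plus the value to score later. */
    public static final class Hit {
        final int doc;
        final String value;

        public Hit(int doc, String value) {
            this.doc = doc;
            this.value = value;
        }
    }

    /** Hypothetical (doc id, score) pair kept in the priority queue. */
    static final class ScoredDoc {
        final int doc;
        final double score;

        ScoredDoc(int doc, double score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Scores all hits on a fixed thread pool and returns the topN doc ids, best first. */
    public static List<Integer> topDocs(List<Hit> hits, ToDoubleFunction<String> calcScore,
                                        int topN, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one scoring task per hit; calcScore must be safe to call concurrently.
            List<Future<Double>> futures = new ArrayList<>();
            for (Hit hit : hits) {
                Callable<Double> task = () -> calcScore.applyAsDouble(hit.value);
                futures.add(pool.submit(task));
            }

            // Min-heap on score: the weakest of the current top N sits at the head.
            Comparator<ScoredDoc> byScore = Comparator.comparingDouble(s -> s.score);
            PriorityQueue<ScoredDoc> queue = new PriorityQueue<>(byScore);
            for (int i = 0; i < hits.size(); i++) {
                queue.add(new ScoredDoc(hits.get(i).doc, futures.get(i).get()));
                if (queue.size() > topN) {
                    queue.poll(); // drop the lowest score
                }
            }

            // Drain lowest-first, then reverse so the best doc comes first.
            List<Integer> result = new ArrayList<>();
            while (!queue.isEmpty()) {
                result.add(queue.poll().doc);
            }
            Collections.reverse(result);
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scoring", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scoring task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

At best this divides the scoring time by the number of threads that can actually run in parallel, so it complements the per-call speedups rather than replacing them.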
