lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: is multi-threads searcher feasible idea to speed up?
Date Tue, 28 Sep 2010 12:17:11 GMT
This is an excellent idea!

And, desperately needed.

It's high time Lucene took advantage of concurrency when running a
single query.  Machines have tons of cores these days!  (My dev box
has 24!)

Note that one simple way to do this is to use ParallelMultiSearcher: it
uses one thread per segment in your index.

But note that [perversely] this means that if your index is optimized
(a single segment) you get no concurrency gain!  So you have to create
your index w/ a carefully picked maxMergeDocs/MB to ensure you can use
concurrency.

I don't like having concurrency tied to index structure.  So a better
approach would be to have each thread pull its own Scorer for the same
query; each one then does a .advance to its "chunk" of the index and
iterates from there.  Then merge the PQs (priority queues) at the end,
just like MultiSearcher.
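
The chunked approach above can be sketched roughly as follows. This is
only an illustration, not Lucene code: the class ChunkedTopN, its Hit
type, and the sorted docs/scores arrays standing in for a Scorer are all
hypothetical, and advance() here is a plain binary search. It assumes
each thread gets an equal-sized doc id range [start, end), collects its
own top N in a priority queue, and the queues are merged at the end.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: sorted doc ids plus parallel scores stand in for a Scorer.
class ChunkedTopN {

    /** One matching document and its score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Weakest hit first: lower score, and on ties the larger doc id is weaker.
    static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int c = Float.compare(a.score, b.score);
        return c != 0 ? c : Integer.compare(b.doc, a.doc);
    };

    /** Index of the first posting whose doc id is >= target (binary search). */
    static int advance(int[] docs, int target) {
        int lo = 0, hi = docs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Keep at most n hits in pq, replacing the weakest when a better hit arrives. */
    static void offer(PriorityQueue<Hit> pq, Hit hit, int n) {
        if (n <= 0) return;
        if (pq.size() < n) {
            pq.add(hit);
        } else if (WEAKEST_FIRST.compare(hit, pq.peek()) > 0) {
            pq.poll();
            pq.add(hit);
        }
    }

    /** Collect the top n hits whose doc id falls in [startDoc, endDoc). */
    static PriorityQueue<Hit> collectChunk(int[] docs, float[] scores,
                                           int startDoc, int endDoc, int n) {
        PriorityQueue<Hit> pq = new PriorityQueue<>(WEAKEST_FIRST);
        for (int i = advance(docs, startDoc); i < docs.length && docs[i] < endDoc; i++) {
            offer(pq, new Hit(docs[i], scores[i]), n);
        }
        return pq;
    }

    /** Split [0, maxDoc) into one chunk per thread, search each, merge the queues. */
    static List<Hit> search(int[] docs, float[] scores, int maxDoc, int threads, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (maxDoc + threads - 1) / threads;
            List<Future<PriorityQueue<Hit>>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int start = t * chunk;
                final int end = Math.min(maxDoc, start + chunk);
                Callable<PriorityQueue<Hit>> task =
                    () -> collectChunk(docs, scores, start, end, n);
                parts.add(pool.submit(task));
            }
            PriorityQueue<Hit> merged = new PriorityQueue<>(WEAKEST_FIRST);
            for (Future<PriorityQueue<Hit>> part : parts) {
                for (Hit h : part.get()) offer(merged, h, n);
            }
            List<Hit> top = new ArrayList<>(merged);
            top.sort(WEAKEST_FIRST.reversed()); // best hit first
            return top;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the chunks are doc id ranges rather than segments, the number of
threads in this sketch does not depend on how the index was merged. Equal
doc id ranges do not guarantee equal work per thread, though, since the
matching docs may cluster in one range.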

Mike

On Tue, Sep 28, 2010 at 7:24 AM, Li Li <fancyerii@gmail.com> wrote:
> Hi all,
>    I want to reduce search time for my application. In a query, most
> of the time is spent reading posting lists (I/O on the .frq files) and
> calculating scores and collecting results (CPU, with a priority queue).
> The I/O is hard to optimize, or is already partly optimized by NIO. So
> I want to use multiple threads to make better use of the CPU. Of
> course, this may decrease QPS, but the response time will also
> decrease, which is what I want, because CPU is easier to obtain than a
> faster hard disk.
>    I read the search code roughly and found that modifying the search
> process is not an easy task. So I want to use some other, easier method.
>    One is to use Solr distributed search and dispatch documents to many
> shards. But because of the network overhead and the global IDF
> problem, it does not seem a good method for me.
>    Another is to modify the index structure and distribute the
> postings in the .frq files evenly.
>    e.g.   term1 -> doc1,doc2,doc3,doc4,doc5 in _1.frq
>    I create 2 indexes with
>            term1 -> doc1,doc3,doc5
>            term1 -> doc2,doc4
>    When searching, I create 2 threads with 2 PriorityQueues to
> collect the top N docs and then merge their results.
>    Is the 2nd idea feasible? Or does anyone have a related idea? Thanks.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>


