lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Scott <m.scott.ti...@gmail.com>
Subject Re: Searching documents on big index by using ParallelMultiSearcher is slow...
Date Wed, 04 Oct 2006 03:35:17 GMT
Hi,

 > Well, the first question is always "are you opening/closing your
 > IndexSearchers for each request on your remote machines?". This is 
always a
 > no-no. This is also a question for your single-searcher version.

Yes I know, each search slave (RMI server) have single instance
  of IndexSearcher and it's open once when RMI server starts.

 > What is your performance if you only go to one server? I'd start by 
finding

A performance on one server with FULL index (not divided by 10)
  is about 2500 ms.
On one server with splitted index (divided by 10) is about 50 ms.

And on ParallelMultiSearcher with 10 of remote searchable,
  each RemoteSearchable returns in about 50 - 100 ms,
  and ParallelMultiSearcher returns also 50 - 100 ms, because of
  threading.
but Hits Searcher.search(Query, Sort) responds in about 500 - 1000 ms.

I think that Searcher.search with Sort reads all of SortFields from
  IndexReader and it's bottleneck.

Are there results of high performance distributed Lucene with 
ParallelMultiSearcher?
Or need hadoop?

Erick Erickson wrote:
> Well, the first question is always "are you opening/closing your
> IndexSearchers for each request on your remote machines?". This is always a
> no-no. This is also a question for your single-searcher version.
> 
> What is your performance if you only go to one server? I'd start by finding
> out what happens when you forget all the ParallelMultiSearcher stuff, all
> the RMI stuff etc, and just see what your performance is on one of your
> index parts locally. Once that is answered, extend to RMI, then the
> Parallel...., at each step seeing if your performance degrades 
> unacceptably.
> That'll at least give you a clue what part of the process is the biggest
> problem.
> 
> And without knowing a LOT more about your searches, and your index, it's
> kind of hard to come up with solutions <G>....
> 
> Best
> Erick
> 
> On 10/3/06, Scott <m.scott.tiger@gmail.com> wrote:
>>
>> Hi,
>>
>> I have a question about ParallelMultiSearcher performance.
>>
>> I want to search documents on about 10 gigabytes of index.
>> (The index has 10,000,000 documents.)
>>
>> I get very slow performance using IndexSearcher with ONE index normally.
>> Then I tried to use ParallelMultiSearcher with 10 servers of remote
>> searchable.
>>
>> Index:
>> Each search slaves have 1/10 of index.
>> (ONE index divided to 10 servers.)
>>
>> Search slave:
>> Each search slaves start remote searchable RMI server,
>> and wait connecting from search master.
>>
>> Search master:
>> The search master use Naming.lookup() to get remote searchable.
>> Get 10 remote searchables from each search slaves and build
>> ParallelMultiSearcher.
>> Then search.
>>
>> Any solution?
>>
>> -- 
>> Scott
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 

-- 
Scott

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message