lucene-java-user mailing list archives

From Scott <m.scott.ti...@gmail.com>
Subject Re: Searching documents on big index by using ParallelMultiSearcher is slow...
Date Wed, 04 Oct 2006 13:30:02 GMT
Indeed, I am using a fairly complex Query (4 fields combined with OR).

My index has the fields Title, Sub-title, Content, and Author,
and I search all of them with one query, like a web search engine.

Thank you for the details about Weight.

So I need to avoid the remote calls to rewrite() and docFreq().
I'll try building the Hits object on each remote server, so that the
SearchMaster only collects the top N hits from each Hits and then sorts them.
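
A rough sketch of that merge step on the SearchMaster side. The names TopNMerge, ShardHit, and mergeTopN are hypothetical and are not Lucene classes. It also assumes that scores coming from different slaves are comparable, which may not hold if each slave computes docFreq() only over its own part of the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: none of these names are Lucene classes.
class TopNMerge {

    /** One hit as a search slave might report it (hypothetical type). */
    static final class ShardHit {
        final int shard;    // which slave produced the hit
        final int doc;      // document number local to that slave
        final float score;  // score computed on the slave

        ShardHit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Merges per-slave hit lists and returns the n best hits, highest score first. */
    static List<ShardHit> mergeTopN(List<List<ShardHit>> perShard, int n) {
        // Min-heap on score: the weakest hit kept so far sits at the head.
        PriorityQueue<ShardHit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((ShardHit h) -> h.score));
        for (List<ShardHit> hits : perShard) {
            for (ShardHit h : hits) {
                heap.offer(h);
                if (heap.size() > n) {
                    heap.poll(); // drop the current weakest hit
                }
            }
        }
        // Copy the survivors out and order them best-first.
        List<ShardHit> merged = new ArrayList<>(heap);
        merged.sort(Comparator.comparingDouble((ShardHit h) -> h.score).reversed());
        return merged;
    }

    public static void main(String[] args) {
        List<List<ShardHit>> perShard = new ArrayList<>();
        perShard.add(Arrays.asList(new ShardHit(0, 7, 0.9f), new ShardHit(0, 3, 0.4f)));
        perShard.add(Arrays.asList(new ShardHit(1, 5, 0.7f), new ShardHit(1, 9, 0.1f)));
        for (ShardHit h : mergeTopN(perShard, 3)) {
            System.out.println("shard=" + h.shard + " doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

Each slave only has to ship its own top N hits, so the master never handles more than (N * number of slaves) entries per query.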

I tested ParallelMultiSearcher performance.
It creates and starts its threads serially,
then waits for all of the threads to finish.
Because it is threaded, the searches run in parallel on the remote servers.
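
The start-all-then-wait pattern described above can be sketched as follows. This is not the ParallelMultiSearcher source: it swaps in java.util.concurrent's ExecutorService for Lucene's own threads, and FanOutSearch and searchAll are hypothetical names. Each task stands in for one remote search call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of fan-out and join; not Lucene code.
class FanOutSearch {

    /** Runs every task on its own thread, waits for all of them, returns results in task order. */
    static <T> List<T> searchAll(List<Callable<T>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<T> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            final int shard = i;
            tasks.add(() -> {
                Thread.sleep(50); // pretend each remote search takes about 50 ms
                return shard;
            });
        }
        long start = System.nanoTime();
        List<Integer> done = searchAll(tasks);
        double ms = (System.nanoTime() - start) / 1_000_000.0;
        System.out.println(done.size() + " searches finished in " + ms + " ms");
    }
}
```

With this shape the total wait is roughly that of the slowest task rather than the sum of all tasks, which is consistent with the threaded search itself returning quickly.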

I inserted debug code that measures elapsed time
into the methods below.

ParallelMultiSearcher.java:
public TopDocs search(Weight weight, Filter filter, int nDocs);
public TopFieldDocs search(Weight weight, Filter filter, int nDocs, Sort sort);

Searcher.java:
public Hits search(Query query, Filter filter);
public Hits search(Query query, Sort sort);
public Hits search(Query query, Filter filter, Sort sort);

The debug code is:
--------
long startTime = System.currentTimeMillis();
System.out.println("Start ClassNameHere search");

... original main routine ...

long endTime = System.currentTimeMillis();
// Dividing by 1000 yields seconds, so the "ms" label printed below is misleading.
float totalTime = (endTime - startTime) / 1000.0f;
System.out.println("End ClassNameHere search in " + totalTime + "ms");
--------
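
A variant of the timing wrapper that reports real milliseconds, as a minimal sketch. SearchTimer, elapsedMillis, and time are hypothetical helper names, and System.nanoTime() is swapped in for System.currentTimeMillis() because it is intended for measuring elapsed time.

```java
// Hypothetical timing helper; not part of Lucene.
class SearchTimer {

    /** Converts two System.nanoTime() readings into elapsed milliseconds. */
    static double elapsedMillis(long startNanos, long endNanos) {
        return (endNanos - startNanos) / 1_000_000.0;
    }

    /** Runs the task, prints its elapsed time in milliseconds, and returns that time. */
    static double time(String label, Runnable task) {
        System.out.println("Start " + label + " search");
        long start = System.nanoTime();
        task.run();
        double ms = elapsedMillis(start, System.nanoTime());
        System.out.println("End " + label + " search in " + ms + " ms");
        return ms;
    }

    public static void main(String[] args) {
        // Stand-in for the real search call.
        time("Demo", () -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
}
```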

The result is below.

Start Searcher search
Start ParallelMultiSearcher search
End ParallelMultiSearcher search in 0.049ms
End Searcher search in 0.449ms

I think the difference, 0.449 - 0.049 = 0.400 (in seconds, despite the "ms"
label, so about 400 ms), is the weight calculation.
I need some trick to reduce it.
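
A back-of-the-envelope estimate of that serial cost, using the (# of terms * # of remotes) rule from the message quoted below. Everything numeric here is an assumption for illustration: 4 terms (a one-word query expanded over the 4 fields), 10 remotes, 2 methods (rewrite() and docFreq()), and an invented 2 ms round trip that was not measured. RemoteCallEstimate is a hypothetical name.

```java
// Hypothetical estimate; the inputs are placeholders, not measurements.
class RemoteCallEstimate {

    /** Remote calls per method: (# of terms * # of remotes). */
    static int callsPerMethod(int terms, int remotes) {
        return terms * remotes;
    }

    /** Total time in ms if every call takes roundTripMs and no calls overlap. */
    static double serialMillis(int terms, int remotes, int methods, double roundTripMs) {
        return callsPerMethod(terms, remotes) * methods * roundTripMs;
    }

    public static void main(String[] args) {
        int terms = 4;            // assumed: one word searched in 4 fields
        int remotes = 10;         // 10 search slaves
        int methods = 2;          // rewrite() and docFreq()
        double roundTripMs = 2.0; // invented placeholder; measure the real RMI latency
        System.out.println(callsPerMethod(terms, remotes) + " calls per method, about "
                + serialMillis(terms, remotes, methods, roundTripMs) + " ms if made serially");
    }
}
```

The point is only the shape of the formula: the cost grows with both the number of terms and the number of remotes, so a query with more words multiplies it further.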

Haines, Ronald C. (LNG-DAY) wrote:
> Keep in mind, that depending on your queries (lots of terms, wildcards,
> date ranges), you can spend quite a bit of time during the 'Weight'
> calculation...this all happens pre-search.  During the Weight
> calculation, you will be making remote calls to the rewrite() and
> docFreq() methods.  There will be (# of terms * # of remotes) of these
> remote calls made for each of the above methods.
> 
> And, I think the ParallelMultiSearcher will make all of these calls
> serially before it starts to thread the search process.  I have found
> that this, serially, can account for quite a bit of the overall response
> time.
> 
> I too am interested in learning more about a large scale distributed
> Lucene model.
>  
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com] 
> Sent: Wednesday, October 04, 2006 7:33 AM
> To: java-user@lucene.apache.org
> Subject: Re: Searching documents on big index by using
> ParallelMultiSearcher is slow...
> 
> OK, you're now officially beyond my competence, so I'll have to wait for
> people who actually know <G>....
> 
> Although if I read your stats right, you're getting approximately 1 sec
> response time over 10M documents on a 10G index. That's not bad at all.
> What kind of response time do you need?
> 
> On 10/3/06, Scott <m.scott.tiger@gmail.com> wrote:
>> Hi,
>>
>>> Well, the first question is always "are you opening/closing your
>>> IndexSearchers for each request on your remote machines?". This is
>>> always a no-no. This is also a question for your single-searcher version.
>> Yes, I know. Each search slave (RMI server) has a single instance
>>   of IndexSearcher, and it is opened once when the RMI server starts.
>>
>>> What is your performance if you only go to one server? I'd start by
>>> finding
>>
>> Performance on one server with the FULL index (not divided by 10)
>>   is about 2500 ms.
>> On one server with the split index (divided by 10) it is about 50 ms.
>>
>> And with ParallelMultiSearcher over 10 remote searchables,
>>   each RemoteSearchable returns in about 50 - 100 ms,
>>   and ParallelMultiSearcher also returns in 50 - 100 ms, because of
>>   threading.
>> But Hits Searcher.search(Query, Sort) responds in about 500 - 1000 ms.
>>
>> I think that Searcher.search with Sort reads all of the SortFields from
>>   the IndexReader, and that is the bottleneck.
>>
>> Are there any results for high-performance distributed Lucene with
>> ParallelMultiSearcher?
>> Or do I need Hadoop?
>>
>> Erick Erickson wrote:
>>> Well, the first question is always "are you opening/closing your
>>> IndexSearchers for each request on your remote machines?". This is
>>> always a no-no. This is also a question for your single-searcher version.
>>>
>>> What is your performance if you only go to one server? I'd start by
>>> finding out what happens when you forget all the ParallelMultiSearcher
>>> stuff, all the RMI stuff etc, and just see what your performance is on
>>> one of your index parts locally. Once that is answered, extend to RMI,
>>> then the Parallel...., at each step seeing if your performance degrades
>>> unacceptably.
>>> That'll at least give you a clue what part of the process is the
>>> biggest problem.
>>>
>>> And without knowing a LOT more about your searches, and your index,
>>> it's kind of hard to come up with solutions <G>....
>>>
>>> Best
>>> Erick
>>>
>>> On 10/3/06, Scott <m.scott.tiger@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I have a question about ParallelMultiSearcher performance.
>>>>
>>>> I want to search documents on about 10 gigabytes of index.
>>>> (The index has 10,000,000 documents.)
>>>>
>>>> I get very slow performance using a normal IndexSearcher with ONE index.
>>>> Then I tried to use ParallelMultiSearcher with 10 remote searchable
>>>> servers.
>>>>
>>>> Index:
>>>> Each search slave has 1/10 of the index.
>>>> (ONE index divided across 10 servers.)
>>>>
>>>> Search slave:
>>>> Each search slave starts a remote searchable RMI server
>>>> and waits for a connection from the search master.
>>>>
>>>> Search master:
>>>> The search master uses Naming.lookup() to get the remote searchables.
>>>> It gets 10 remote searchables, one from each search slave, and builds
>>>> a ParallelMultiSearcher.
>>>> Then it searches.
>>>>
>>>> Any solution?
>>>>
>>>> --
>>>> Scott
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>> --
>> Scott

-- 
Scott
