lucene-dev mailing list archives

From "Michael Busch (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-893) Increase buffer sizes used during searching
Date Sat, 26 May 2007 07:23:16 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499290 ]

Michael Busch commented on LUCENE-893:
--------------------------------------

I ran some performance tests with the same setup I used for LUCENE-866:

- 1.2 GB index, optimized, compound format, documents from Wikipedia 
- 50,000 queries, each query has 3 AND terms, each term has a df>100,
  each query has one or more hits
- 2.8 GHz Xeon, 4 GB RAM, SCSI HD, Windows Server 2003

My test simply executes all 50k queries in a row and measures the 
overall time. I used the current trunk version patched with LUCENE-888
and LUCENE-866 and varied the buffer size of the cfs reader. 
Here are the results:  
 
 1 KB: Time: 51703 ms.
 2 KB: Time: 50672 ms.
 4 KB: Time: 50969 ms.
 8 KB: Time: 57047 ms.
16 KB: Time: 64547 ms.

It seems that it doesn't really matter if the buffer size is 1 KB, 2 KB,
or 4 KB. Above 4 KB the performance decreases significantly. 
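
The measurement described above (run every query once, in order, and
time the whole batch) can be sketched roughly as follows. This is only
an illustration, not the actual test code: BatchTimer and timeAllMillis
are hypothetical names, and each Runnable stands in for one search call.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness: each Runnable stands in for one search call.
class BatchTimer {
    /** Runs every task once, in order, and returns the total wall-clock time in ms. */
    static long timeAllMillis(List<Runnable> tasks) {
        long start = System.nanoTime();
        for (Runnable task : tasks) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        AtomicInteger executed = new AtomicInteger();
        // Placeholder workload; a real run would issue one query per task.
        Runnable placeholderQuery = executed::incrementAndGet;
        List<Runnable> tasks = Collections.nCopies(50_000, placeholderQuery);
        long elapsed = timeAllMillis(tasks);
        System.out.println(executed.get() + " tasks, Time: " + elapsed + " ms.");
    }
}
```

A single pass like this measures overall throughput only; it says
nothing about per-query variance or cache warm-up effects.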

Now the same test with a cfs reader buffer of 1 KB and varying buffer
sizes for the freq stream in SegmentTermDocs:

 1 KB: Time: 51875 ms.
 2 KB: Time: 46828 ms.
 4 KB: Time: 44500 ms.
 8 KB: Time: 50953 ms.
16 KB: Time: 64485 ms.

With 4 KB there is a performance improvement of 14%! But considering
the fact that this stream is cloned for every query term, I think
that 2 KB is the better choice, still a 10% improvement.
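
For reference, the percentages above follow from the freq stream
timings relative to the 1 KB baseline. A small sketch of that
arithmetic (BufferSpeedup and improvementPercent are illustrative
names, not part of Lucene):

```java
// Illustrative helper; the timings are the freq stream results listed above.
class BufferSpeedup {
    /** Percentage by which 'millis' is faster than 'baselineMillis' (negative = slower). */
    static double improvementPercent(long baselineMillis, long millis) {
        return 100.0 * (baselineMillis - millis) / baselineMillis;
    }

    public static void main(String[] args) {
        long baseline = 51875; // 1 KB freq stream buffer
        long[][] runs = { {2, 46828}, {4, 44500}, {8, 50953}, {16, 64485} };
        for (long[] run : runs) {
            // run[0] is the buffer size in KB, run[1] the measured time in ms
            System.out.printf("%2d KB: %+.1f%%%n", run[0], improvementPercent(baseline, run[1]));
        }
    }
}
```

This gives roughly 9.7% for 2 KB and 14.2% for 4 KB, matching the
rounded figures quoted above; the 16 KB run comes out negative, i.e.
slower than the baseline.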

Now I simply vary the readBufferSize for all buffered inputs:

 1 KB: Time: 51778 ms.
 2 KB: Time: 46172 ms.
 4 KB: Time: 49000 ms.
 8 KB: Time: 52187 ms.
16 KB: Time: 69562 ms.

Now the same test with 50k disjunction queries, 3 terms per query:

1 KB: Time: 288422 ms.
2 KB: Time: 259672 ms.
4 KB: Time: 279563 ms.

2 KB for all input buffers seems to be a good compromise. It's about
10% faster than 1 KB for both types of queries. 

Questions are:
- Can we afford the increased memory consumption?
- Is 2 KB also the best choice on other systems?
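
To get a feel for the memory question, here is a back-of-the-envelope
sketch based on the count given in the issue description
(# segments X # index files per segment). The segment and file counts
below are made-up example values, not measurements, and the estimate
assumes every open input holds one full buffer. It does not model the
extra clones created per query term.

```java
// Rough upper-bound estimate; BufferHeapEstimate is a hypothetical helper.
class BufferHeapEstimate {
    /** Heap bytes used if each of (segments * filesPerSegment) inputs holds one full buffer. */
    static long bufferBytes(long segments, long filesPerSegment, long bufferSizeBytes) {
        return segments * filesPerSegment * bufferSizeBytes;
    }

    public static void main(String[] args) {
        long segments = 20;        // assumed example value
        long filesPerSegment = 8;  // assumed example value
        for (long kb : new long[] {1, 2, 4}) {
            long bytes = bufferBytes(segments, filesPerSegment, kb * 1024);
            System.out.println(kb + " KB buffers: " + bytes + " bytes");
        }
    }
}
```

Under these assumptions the cost scales linearly with the buffer size,
so going from 1 KB to 2 KB doubles whatever the baseline is.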


> Increase buffer sizes used during searching
> -------------------------------------------
>
>                 Key: LUCENE-893
>                 URL: https://issues.apache.org/jira/browse/LUCENE-893
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>
> Spinoff of LUCENE-888.
> In LUCENE-888 we increased buffer sizes that impact indexing and found
> substantial (10-18%) overall performance gains.
> It's very likely that we can also gain some performance for searching
> by increasing the read buffers in BufferedIndexInput used by
> searching.
> We need to test performance impact to verify and then pick a good
> overall default buffer size, also being careful not to add too much
> overall HEAP RAM usage because a potentially very large number of
> BufferedIndexInput instances are created during searching
> (# segments X # index files per segment).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

