lucene-java-user mailing list archives

From "hui" <...@triplehop.com>
Subject RE: Sys properties Was: java.io.tmpdir as lock dir .... once again
Date Mon, 08 Mar 2004 15:34:59 GMT




Hi,

Here are the indexing performance test results for the two index formats.


1000 megahertz Intel Pentium III (2 installed)
32 kilobyte primary memory cache
256 kilobyte secondary memory cache

SCSI hard drive, 145.45 GB
RAM: 3 GB

Windows 2000 Advanced Server, Service Pack 2

JDK 1.4.0
JVM memory: 512 MB

Indexed files: 66100 local text files, around 400 MB in total

Index time:
The compound format is about 89 seconds (roughly 6.8%) slower.

compound format:
1389507 total milliseconds
non-compound format:
1300534 total milliseconds
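
The 89-second figure follows directly from the two totals above. As a quick
check, here is a minimal Java sketch of the arithmetic. The class and method
names are hypothetical and chosen for illustration only; just the two totals
come from the test run.

```java
// Hypothetical helper for illustration; only the two totals come from the test run.
class IndexTimeDiff {
    /** Extra time the compound format took, in milliseconds. */
    static long extraMillis(long compoundMs, long nonCompoundMs) {
        return compoundMs - nonCompoundMs;
    }

    /** Slowdown of the compound format relative to the non-compound run, in percent. */
    static double slowdownPercent(long compoundMs, long nonCompoundMs) {
        return 100.0 * extraMillis(compoundMs, nonCompoundMs) / nonCompoundMs;
    }

    public static void main(String[] args) {
        long compoundMs = 1389507L;    // compound format total
        long nonCompoundMs = 1300534L; // non-compound format total
        System.out.println("extra ms:   " + extraMillis(compoundMs, nonCompoundMs));
        System.out.println("slowdown %: " + slowdownPercent(compoundMs, nonCompoundMs));
    }
}
```

The difference is 88973 ms, i.e. about 89 seconds, or a little under 7% of the
non-compound indexing time.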

The index size is 85 MB with only 4 fields. The files are stored in the index.
The compound format index has only 3 files; the non-compound index has 13.
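
The 3 versus 13 file counts can be reproduced with a rough model of the index
layout. The sketch below is not taken from the test itself: it assumes the
Lucene 1.x layout of seven fixed files per segment, one norms file per indexed
field, one .cfs file per compound segment, and two index-level files
("segments" and "deletable"). It also assumes a single optimized segment, all
4 fields indexed, and no deletion or term vector files. The class and method
names are hypothetical.

```java
// Sketch only: the constants below are assumptions about the Lucene 1.x
// on-disk layout, not figures measured in this test.
class IndexFileCount {
    // Assumed fixed per-segment files: .fnm .fdx .fdt .tis .tii .frq .prx
    static final int FIXED_FILES_PER_SEGMENT = 7;
    // Assumed index-level files: "segments" and "deletable"
    static final int INDEX_LEVEL_FILES = 2;

    /** Files in a non-compound index: fixed files plus one norms file per indexed field. */
    static int nonCompoundFiles(int segments, int indexedFields) {
        return segments * (FIXED_FILES_PER_SEGMENT + indexedFields) + INDEX_LEVEL_FILES;
    }

    /** Files in a compound index: one .cfs file per segment. */
    static int compoundFiles(int segments) {
        return segments + INDEX_LEVEL_FILES;
    }

    public static void main(String[] args) {
        int segments = 1;      // optimized index, assumed to be a single segment
        int indexedFields = 4; // assumes all 4 fields are indexed
        System.out.println("non-compound files: " + nonCompoundFiles(segments, indexedFields));
        System.out.println("compound files:     " + compoundFiles(segments));
    }
}
```

Under these assumptions the model gives 13 and 3 files. Per segment, the ratio
is (7 + indexed fields) to 1, which is at least 8 with one indexed field and
would fit the "factor of eight or more" mentioned in the quoted message below.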

Search time (only the top 10 hits retrieved, no indexing running at the same
time, a single search thread, indices optimized and already opened):
I do not see a consistent difference between the two formats in this simple case.

compound format:
Query: iraq
4275 total within(ms) 110
Query: war
5728 total within(ms) 0
Query: iraq AND war
3182 total within(ms) 16

non-compound format:
Query: war
5728 total within(ms) 125
Query: iraq war
6821 total within(ms) 31
Query: iraq AND war
3182 total within(ms) 0



-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org] 
Sent: Thursday, March 04, 2004 11:54 AM
To: Lucene Users List
Subject: Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

hui wrote:
> Not yet. For the compound file format, when the files get bigger, if I add
> few new files frequently, the bigger files has to be updated. Will that
> affect lot on the search and produce heavier disk I/O compared with the
> traditional index format? It seems OS cache makes quite difference when the
> files not changed differently.

The compound format slows indexing performance slightly, but should not 
affect search performance much.  It radically reduces the number of file 
handles used when searching, by a factor of eight or more, depending on 
how many indexed fields you have.

Perhaps the compound format should be the default format in 1.4.  Can 
folks provide any benchmarks for how it affects performance?

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

