lucene-java-user mailing list archives

From Peter Carlson <>
Subject Performance benchmarks
Date Fri, 03 May 2002 13:50:57 GMT
Here are some performance numbers:

Java Version: 1.3_01
OS Version: Windows 2000
CPU (Type, Speed and Quantity): Pentium 4, 1.5 GHz, 1 CPU
RAM: 512 MB
Drive configuration (IDE, SCSI, RAID-1, RAID-5): IDE (single)
Number of source documents: 103009
Total filesize of source documents: 430MB
Average filesize of source documents (in KB/MB): 4.3KB
Source documents storage location (filesystem, DB, http,etc): Filesystem
File type of source documents: xml
Parser(s) used, if any: StandardAnalyzer
Number of Fields per document: 8
Time taken (in ms/s as an average of at least 3 indexing runs): 8387 sec
(about 140 min)
Time taken / 1000 docs indexed: 81 sec / 1000 docs
Notes (any special tuning/strategies):
I convert each document to a DOM and use XPath to extract the fields.
I also validate the data, checking that each document meets certain
criteria (for example, total size > 150 characters), and use a HashMap
to verify that there are no duplicates. Without these checks, indexing
is faster (about 60 sec / 1000 docs).
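The notes above describe a per-document pipeline: parse to a DOM, pull fields out with XPath, reject documents that are too short, and reject duplicates via a HashMap. Below is a minimal sketch of that pipeline, assuming a modern JDK (the javax.xml.xpath API it uses is not part of Java 1.3, which the benchmark ran on). The class name XmlFieldExtractor, the field names, the XPath expressions, and the choice of body text as the duplicate key are all hypothetical; the post does not say what the eight fields are or what counts as a duplicate. The Lucene indexing step itself is left as a comment.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlFieldExtractor {
    // Hypothetical field names and XPath expressions (not given in the post).
    private static final String[][] FIELDS = {
        {"id", "/doc/@id"},
        {"title", "/doc/title"},
        {"body", "/doc/body"}
    };
    // Threshold from the post: total size must exceed 150 characters.
    private static final int MIN_TOTAL_CHARS = 150;

    private final DocumentBuilder builder;
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    // Maps each duplicate key to the source it was first seen in.
    private final Map<String, String> seen = new HashMap<>();

    public XmlFieldExtractor() throws Exception {
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    /** Returns the extracted fields, or null if the document fails validation. */
    public Map<String, String> extract(String sourceName, String xml) throws Exception {
        // Parse the whole file into a DOM tree.
        Document dom = builder.parse(new InputSource(new StringReader(xml)));

        Map<String, String> fields = new LinkedHashMap<>();
        int totalChars = 0;
        for (String[] f : FIELDS) {
            // evaluate(String, Object) returns the string value, or "" if nothing matches.
            String value = xpath.evaluate(f[1], dom).trim();
            fields.put(f[0], value);
            totalChars += value.length();
        }

        // Validation: reject documents whose extracted text is too short.
        if (totalChars <= MIN_TOTAL_CHARS) {
            return null;
        }
        // Duplicate check (assumption: identical body text means duplicate).
        if (seen.putIfAbsent(fields.get("body"), sourceName) != null) {
            return null;
        }
        // At this point each entry would be added as a Field on a Lucene Document.
        return fields;
    }

    public static void main(String[] args) throws Exception {
        XmlFieldExtractor ex = new XmlFieldExtractor();
        String longBody = "lorem ipsum ".repeat(20);
        String first = "<doc id=\"1\"><title>First</title><body>" + longBody + "</body></doc>";
        String copy = "<doc id=\"2\"><title>Copy</title><body>" + longBody + "</body></doc>";
        String tiny = "<doc id=\"3\"><title>Tiny</title><body>short</body></doc>";
        System.out.println(ex.extract("a.xml", first) != null); // true: accepted
        System.out.println(ex.extract("b.xml", copy) != null);  // false: duplicate body
        System.out.println(ex.extract("c.xml", tiny) != null);  // false: under 150 chars
    }
}
```

Keying the HashMap on the full body text keeps each duplicate check a constant-time lookup, but it retains every accepted body in memory; across roughly 100,000 documents that could be a large fraction of the 430MB corpus, which matters on a 512MB machine. Hashing a smaller key, such as an id or a digest of the text, is one likely alternative.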

I hope this is helpful.
