lucene-java-user mailing list archives

From "Kelvin Tan" <kel...@relevanz.com>
Subject Re: Performance benchmarks
Date Sat, 04 May 2002 09:02:12 GMT
Great, Peter. I've posted a new set of attributes based on your submission
and Otis' feedback. Let me think about the best way to consolidate these
numbers and stick them somewhere accessible for all.
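One way to consolidate submissions like this is to record only the raw figures (document count, total time, total size) and derive the comparison columns from them, so every entry uses the same arithmetic. Below is a minimal Java sketch of that idea. The class and method names are hypothetical, not an agreed submission format, and it assumes 1 MB = 1024 KB.

```java
/**
 * Hypothetical record of one benchmark submission. Stores the raw figures
 * and derives the comparison columns from them.
 */
final class BenchmarkMetrics {
    private final long documentCount;
    private final double totalSeconds;   // wall-clock indexing time
    private final double totalMegabytes; // total size of source documents

    BenchmarkMetrics(long documentCount, double totalSeconds, double totalMegabytes) {
        if (documentCount <= 0 || totalSeconds <= 0) {
            throw new IllegalArgumentException("document count and time must be positive");
        }
        this.documentCount = documentCount;
        this.totalSeconds = totalSeconds;
        this.totalMegabytes = totalMegabytes;
    }

    /** Indexing time per 1000 documents, in seconds. */
    double secondsPer1000Docs() {
        return totalSeconds / documentCount * 1000.0;
    }

    /** Average source document size in KB, assuming 1 MB = 1024 KB. */
    double averageKilobytes() {
        return totalMegabytes * 1024.0 / documentCount;
    }

    /** Indexing throughput in documents per second. */
    double docsPerSecond() {
        return documentCount / totalSeconds;
    }

    public static void main(String[] args) {
        // Figures from Peter Carlson's submission quoted below.
        BenchmarkMetrics peter = new BenchmarkMetrics(103009, 8387, 430);
        System.out.printf("%.1f sec / 1000 docs%n", peter.secondsPer1000Docs());
        System.out.printf("%.2f KB average%n", peter.averageKilobytes());
        System.out.printf("%.1f docs/sec%n", peter.docsPerSecond());
    }
}
```

With Peter's figures this gives roughly 81.4 sec per 1000 docs, an average of about 4.27 KB per document, and about 12.3 docs/sec, in line with the 81 sec and 4.3KB he reports.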

----- Original Message -----
From: "Peter Carlson" <carlson@bookandhammer.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Friday, May 03, 2002 9:50 PM
Subject: Performance benchmarks


> Some performance numbers
>
> Java Version: 1.3_01
> OS Version: Windows 2000
> CPU (Type, Speed and Quantity): Pentium 4, 1.5 GHz, 1 CPU
> RAM: 512 MB
> Drive configuration (IDE, SCSI, RAID-1, RAID-5): IDE (single)
> Number of source documents: 103009
> Total filesize of source documents: 430MB
> Average filesize of source documents (in KB/MB): 4.3KB
> Source documents storage location (filesystem, DB, http,etc): Filesystem
> File type of source documents: xml
> Parser(s) used, if any: StandardAnalyzer
> Number of Fields per document: 8
> Time taken (in ms/s as an average of at least 3 indexing runs): 8387 sec (139 min)
> Time taken / 1000 docs indexed: 81 sec / 1000 docs
> Notes (any special tuning/strategies):
> I convert each document to a DOM, and use xpath to get the fields.
> I perform validation on the data and make sure that it meets certain
> criteria like total size > 150 characters, and verify there are no
> duplicates using a HashMap. Without these checks, the indexing goes faster
> (about 60 seconds/1000 docs).
>
>
> I hope this is helpful.
> --Peter
>
>
> --
> To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
>
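Peter's notes describe a per-document pipeline: parse the XML into a DOM, pull the fields out with XPath, reject documents whose total size is 150 characters or less, and drop duplicates via a hash map. A minimal sketch of that pipeline in modern Java follows. It is not Peter's code: his setup ran on Java 1.3, which predates the standard javax.xml.xpath API (added in Java 5). The class name, the article/title/body schema, and the choice of the id attribute as the duplicate key are all hypothetical, and the hand-off to Lucene is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

/** Hypothetical pre-indexing step: XML -> DOM -> XPath fields -> checks. */
final class XmlFieldExtractor {
    // Hypothetical schema: the real field names and paths are not given.
    private static final Map<String, String> FIELD_XPATHS = new LinkedHashMap<>();
    static {
        FIELD_XPATHS.put("id", "/article/@id");
        FIELD_XPATHS.put("title", "/article/title");
        FIELD_XPATHS.put("body", "/article/body");
    }

    // Documents must contain more than this many characters across all fields.
    private static final int MIN_TOTAL_CHARS = 150;

    private final DocumentBuilder builder;
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    // Dedup key -> source file where it was first seen (assumes "id" is the key).
    private final Map<String, String> seen = new HashMap<>();

    XmlFieldExtractor() throws Exception {
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    /**
     * Returns the extracted fields, or null if the document is too short
     * or duplicates one already accepted.
     */
    Map<String, String> extract(String sourceName, String xml) throws Exception {
        Document dom = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> fields = new LinkedHashMap<>();
        int totalChars = 0;
        for (Map.Entry<String, String> e : FIELD_XPATHS.entrySet()) {
            // XPath.evaluate(String, Object) returns the string value of the match.
            String value = xpath.evaluate(e.getValue(), dom).trim();
            fields.put(e.getKey(), value);
            totalChars += value.length();
        }
        // Size check first, so rejected documents never claim a dedup key.
        if (totalChars <= MIN_TOTAL_CHARS) {
            return null;
        }
        if (seen.putIfAbsent(fields.get("id"), sourceName) != null) {
            return null; // duplicate
        }
        return fields; // ready to be copied into Lucene fields
    }

    public static void main(String[] args) throws Exception {
        XmlFieldExtractor extractor = new XmlFieldExtractor();
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            body.append('x');
        }
        String xml = "<article id=\"a1\"><title>T</title><body>" + body + "</body></article>";
        System.out.println(extractor.extract("a.xml", xml) != null); // true (accepted)
        System.out.println(extractor.extract("b.xml", xml) != null); // false (duplicate id)
    }
}
```

A HashMap keeps each duplicate check to an expected constant-time lookup, though the map grows with every accepted document (on the order of 100K keys for a run this size). Running the size check before the duplicate check means a document rejected as too short never reserves its key.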

