lucene-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Benchmarking results
Date Tue, 04 Apr 2006 10:09:23 GMT
RESULTS A: 'body' neither stored nor vectorized
===========================================================================
configuration               avg secs       max memory consumed
---------------------------------------------------------------------------
Lucene / JVM 1.4               50.14               79 MB
Lucene / JVM 1.5               51.86               93 MB
KinoSearch / Perl 5.8.8        70.25               29 MB
KinoSearch / Perl 5.8.6        83.43               31 MB


RESULTS B: 'body' stored and vectorized
===========================================================================
configuration               avg secs       max memory consumed
---------------------------------------------------------------------------
KinoSearch / Perl 5.8.8        76.01               29 MB
Lucene / JVM 1.4               86.70              178 MB
KinoSearch / Perl 5.8.6        88.79               31 MB
Lucene / JVM 1.5               89.28              147 MB
Plucene / Perl 5.8.6         2014.00*            skipped


DISCUSSION
===========================================================================

1) Lucene performs better than KinoSearch when there is less data to
be stored (Results A), while KinoSearch does better when there is a
lot of data to be stored (Results B).  This may be because Lucene
rewrites the stored field data and the term vector data whenever
segments are merged, while KinoSearch writes that data only once
(twice if you count the copy into the compound file: KinoSearch only
supports the compound file format, which we've disabled in Lucene for
the sake of speed).  It probably also helps that KinoSearch stores
term vector data alongside the stored field data in the .fdx file.
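
A rough way to see the effect described in point 1 is a toy merge
model.  The sketch below is a simplification under stated assumptions,
not Lucene 1.9.1's actual IndexWriter logic: documents are flushed in
segments of maxBufferedDocs (1000, as above), and whenever mergeFactor
segments pile up on one level they are merged into a single segment on
the next level.  mergeFactor is assumed to be 10 (Lucene's default; the
benchmark's own setting isn't stated in this message).  MergeCostModel
and documentWrites are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (an assumption, not Lucene's real merge code): every flush or
// merge rewrites all stored data of the documents it touches, so the total
// number of "document writes" shows how often stored fields and term
// vectors get copied.
final class MergeCostModel {

    // Total document writes when flushing maxBufferedDocs at a time and
    // merging every mergeFactor same-level segments into one.
    static long documentWrites(int totalDocs, int maxBufferedDocs, int mergeFactor) {
        List<List<Integer>> levels = new ArrayList<>(); // doc counts of segments on each level
        long writes = 0;
        int remaining = totalDocs;
        while (remaining > 0) {
            int flushed = Math.min(maxBufferedDocs, remaining);
            remaining -= flushed;
            writes += flushed;                      // first write, when the buffer is flushed
            addSegment(levels, 0, flushed);
            // Cascade: a full level is merged into one segment on the next level up.
            int level = 0;
            while (level < levels.size() && levels.get(level).size() >= mergeFactor) {
                int merged = 0;
                for (int docs : levels.get(level)) {
                    merged += docs;
                }
                levels.get(level).clear();
                writes += merged;                   // the merge rewrites every document in it
                addSegment(levels, level + 1, merged);
                level++;
            }
        }
        return writes;
    }

    private static void addSegment(List<List<Integer>> levels, int level, int docs) {
        while (levels.size() <= level) {
            levels.add(new ArrayList<>());
        }
        levels.get(level).add(docs);
    }

    public static void main(String[] args) {
        int docs = 19043; // documents indexed from Reuters-21578 in these runs
        // maxBufferedDocs = 1000 as in the post; mergeFactor = 10 is assumed.
        System.out.println("write-once:      " + docs);
        System.out.println("toy merge model: " + documentWrites(docs, 1000, 10));
    }
}
```

Under those assumptions main() reports 19043 writes for a write-once
scheme versus 38086 for the toy merge model, i.e. each document's stored
data is written about twice.  The model leaves out any final optimize()
pass and the details of Lucene's in-RAM buffering, so treat the ratio
as an illustration only.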

2) The memory consumed by Lucene is due to the generous value (1000)  
assigned to maxBufferedDocs, which is critical for indexing  
performance.  KinoSearch's memory consumption is primarily dependent  
on the mem_threshold argument to the KinoSearch::Util::SortExternal  
constructor, which isn't accessible from the public API at present.   
Increasing this from the default of 16 MB to 256 MB improves speed by  
another 15% or so.
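
The SortExternal source isn't shown here, but its name and the
mem_threshold argument point to the usual external-sort pattern: buffer
items until a memory threshold is reached, sort the buffer and spill it
as a "run", then merge all runs at the end.  The sketch below is a
hypothetical Java illustration of that general technique, not
KinoSearch's code; ThresholdExternalSort is an invented name, runs are
kept in memory instead of temp files, and memory is approximated as 2
bytes per character.  A larger threshold yields fewer, longer runs and
a cheaper final merge, which is one plausible reason a bigger
mem_threshold helps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical threshold-driven external sort. NOT KinoSearch::Util::SortExternal:
// sorted runs stay in memory here, where a real implementation would spill
// them to temp files.
final class ThresholdExternalSort {
    private final long memThreshold;                           // flush a run once this many bytes are buffered
    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private final List<List<String>> runs = new ArrayList<>(); // stand-ins for sorted run files

    ThresholdExternalSort(long memThreshold) {
        this.memThreshold = memThreshold;
    }

    // Adds one item; sorts and flushes the buffer as a run when the threshold is hit.
    void feed(String item) {
        buffer.add(item);
        bufferedBytes += 2L * item.length(); // rough estimate: 2 bytes per char
        if (bufferedBytes >= memThreshold) {
            flushRun();
        }
    }

    private void flushRun() {
        if (buffer.isEmpty()) {
            return;
        }
        Collections.sort(buffer);
        runs.add(new ArrayList<>(buffer));
        buffer.clear();
        bufferedBytes = 0;
    }

    int runCount() {
        return runs.size();
    }

    // Flushes the final partial run, then k-way merges all runs with a min-heap.
    List<String> finish() {
        flushRun();
        // Heap entries are {runIndex, offsetInRun}, ordered by the string they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<int[]>(
                (a, b) -> runs.get(a[0]).get(a[1]).compareTo(runs.get(b[0]).get(b[1])));
        for (int r = 0; r < runs.size(); r++) {
            heap.add(new int[] {r, 0}); // flushRun never stores an empty run
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<String> run = runs.get(top[0]);
            out.add(run.get(top[1]));
            if (top[1] + 1 < run.size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] terms = {"reuters", "acquisition", "wheat", "crude", "dollar", "grain", "trade", "oil"};
        // Toy thresholds in bytes, echoing the 16 MB vs 256 MB comparison above.
        for (long threshold : new long[] {16, 256}) {
            ThresholdExternalSort sorter = new ThresholdExternalSort(threshold);
            for (String term : terms) {
                sorter.feed(term);
            }
            List<String> sorted = sorter.finish();
            System.out.println("threshold " + threshold + ": " + sorter.runCount() + " runs, " + sorted);
        }
    }
}
```

main() reports 4 runs with the 16-byte threshold and 1 run with the
256-byte threshold; both produce the same sorted output.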

3) The difference between Perl 5.8.8 and 5.8.6 probably has less to
do with the version number and more to do with the fact that the
5.8.6 install has threads enabled, while the 5.8.8 install does not.
The 5.8.6 install is the Perl that Apple ships with OS X 10.4.  The
5.8.8 install was compiled from source, accepting all of the
Configure script's suggestions/defaults except the two pertaining to
installation location.
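
To put a number on the gap discussed in point 3, the averages from the
two tables can be compared directly.  PerlBuildGap is a hypothetical
helper; it only does the arithmetic and says nothing about whether
threads are actually the cause.

```java
import java.util.Locale;

// Hypothetical helper: how much slower the Perl 5.8.6 runs were than the
// Perl 5.8.8 runs, using the "avg secs" figures from Results A and B.
final class PerlBuildGap {

    // Percentage by which slowerSecs exceeds fasterSecs.
    static double slowdownPercent(double fasterSecs, double slowerSecs) {
        return (slowerSecs / fasterSecs - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT,
                "Results A: %.1f%% slower", slowdownPercent(70.25, 83.43)));
        System.out.println(String.format(Locale.ROOT,
                "Results B: %.1f%% slower", slowdownPercent(76.01, 88.79)));
    }
}
```

It prints 18.8% for Results A and 16.8% for Results B, so the 5.8.6
install trails by a fairly consistent margin in both configurations.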

4) While Plucene is written in pure Perl and KinoSearch is written in  
Perl and C/XS, there are also substantial algorithmic differences  
between them.  These have been covered in depth elsewhere.

METHODOLOGY
===========================================================================

Source code for the experiment can be found at
<http://www.rectangular.com/svn/kinosearch/trunk/t/benchmarks/>.  The
tests were run using Subversion repository revision 762.

The test corpus was Reuters-21578, Distribution 1.0.  Reuters-21578  
is available from David D. Lewis' professional home page, currently:

     http://www.research.att.com/~lewis

The times for KinoSearch and Lucene are 5-run averages.  OS X is a
busy operating system, which injects some noise into the results.
It's crucial that the iterations run one right after another: a run
that immediately follows another is often faster, but a lag of even
a few seconds between them can slow the second run.  (Presumably
this is due to cache reassignment.)  Therefore, the same command was
issued on the command line 6 times, separated by semicolons.  The
first iteration was discarded, and the remaining 5 were averaged.
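
The averaging rule above (six back-to-back runs, discard the first,
average the remaining five) can be reproduced from the output in the
RAW DATA section.  RunAverager is a hypothetical helper, not part of
the repository's benchmark code; it assumes each result line contains
a "SECS: <number>" field, as the LuceneIndexer and
kinosearch_indexer.plx output does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: applies the discard-first-then-average rule to
// captured indexer output.
final class RunAverager {
    // Matches the "SECS: <n>" field printed by the indexers.
    private static final Pattern SECS = Pattern.compile("SECS:\\s*(\\d+(?:\\.\\d+)?)");

    // Pulls the elapsed seconds out of each line; lines without SECS are skipped.
    static List<Double> parseSeconds(List<String> lines) {
        List<Double> secs = new ArrayList<>();
        for (String line : lines) {
            Matcher m = SECS.matcher(line);
            if (m.find()) {
                secs.add(Double.parseDouble(m.group(1)));
            }
        }
        return secs;
    }

    // Drops the first (warm-up) iteration and returns the mean of the rest.
    static double averageDiscardingFirst(List<Double> secs) {
        if (secs.size() < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        double sum = 0.0;
        for (double s : secs.subList(1, secs.size())) {
            sum += s;
        }
        return sum / (secs.size() - 1);
    }

    public static void main(String[] args) {
        // The six Lucene / JVM 1.4 runs for Results A, copied from RAW DATA.
        List<String> output = Arrays.asList(
                "Java Lucene 1.9.1 DOCS: 19043 SECS: 50.99",
                "Java Lucene 1.9.1 DOCS: 19043 SECS: 50.42",
                "Java Lucene 1.9.1 DOCS: 19043 SECS: 50.08",
                "Java Lucene 1.9.1 DOCS: 19043 SECS: 49.54",
                "Java Lucene 1.9.1 DOCS: 19043 SECS: 50.48",
                "Java Lucene 1.9.1 DOCS: 19043 SECS: 50.18");
        double avg = averageDiscardingFirst(parseSeconds(output));
        System.out.println(String.format(Locale.ROOT, "avg secs: %.2f", avg));
    }
}
```

For those six runs it prints "avg secs: 50.14", matching the Lucene /
JVM 1.4 entry in Results A.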

The maximum memory consumption was measured during separate auxiliary
runs (not included in the averages), using the crude method of
eyeballing the RPRVT column in the output of top.

* The sole Plucene figure isn't an average; it's a single run, as
there wasn't time to perform multiple runs.

HARDWARE
===========================================================================

     PowerBook G4 17" 1.67 GHz
     Mac OS X 10.4.5
     1.5 GB RAM
     Seagate 5400 rpm, 100 GB ATA HD


SOFTWARE
===========================================================================

Lucene 1.9.1
KinoSearch 0.09_03
Plucene 1.24

JVM 1.4.2_09
JVM 1.5.0_02
Apple's Perl 5.8.6 (shipped with OS X 10.4)
Perl 5.8.8 from source


RAW DATA
===========================================================================

slothbear:~/Desktop/ks/t/benchmarks marvin$ javac -d . indexers/LuceneIndexer.java
slothbear:~/Desktop/ks/t/benchmarks marvin$ java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.99
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.42
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.08
Java Lucene 1.9.1 DOCS: 19043 SECS: 49.54
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.48
Java Lucene 1.9.1 DOCS: 19043 SECS: 50.18
slothbear:~/Desktop/ks/t/benchmarks marvin$ javac15 -d . indexers/LuceneIndexer.java
Note: indexers/LuceneIndexer.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
slothbear:~/Desktop/ks/t/benchmarks marvin$ java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer;
Java Lucene 1.9.1 DOCS: 19043 SECS: 52.26
Java Lucene 1.9.1 DOCS: 19043 SECS: 51.91
Java Lucene 1.9.1 DOCS: 19043 SECS: 52.19
Java Lucene 1.9.1 DOCS: 19043 SECS: 51.80
Java Lucene 1.9.1 DOCS: 19043 SECS: 51.23
Java Lucene 1.9.1 DOCS: 19043 SECS: 52.19
slothbear:~/Desktop/ks/t/benchmarks marvin$ vim indexers/LuceneIndexer.java
slothbear:~/Desktop/ks/t/benchmarks marvin$ javac -d . indexers/LuceneIndexer.java
slothbear:~/Desktop/ks/t/benchmarks marvin$ java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer; java -server -Xmx500M LuceneIndexer
Java Lucene 1.9.1 DOCS: 19043 SECS: 87.50
Java Lucene 1.9.1 DOCS: 19043 SECS: 87.42
Java Lucene 1.9.1 DOCS: 19043 SECS: 86.29
Java Lucene 1.9.1 DOCS: 19043 SECS: 86.74
Java Lucene 1.9.1 DOCS: 19043 SECS: 86.11
Java Lucene 1.9.1 DOCS: 19043 SECS: 86.96
slothbear:~/Desktop/ks/t/benchmarks marvin$ javac15 -d . indexers/LuceneIndexer.java
Note: indexers/LuceneIndexer.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
slothbear:~/Desktop/ks/t/benchmarks marvin$ java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer; java15 -server -Xmx500M LuceneIndexer;
Java Lucene 1.9.1 DOCS: 19043 SECS: 90.43
Java Lucene 1.9.1 DOCS: 19043 SECS: 90.52
Java Lucene 1.9.1 DOCS: 19043 SECS: 90.06
Java Lucene 1.9.1 DOCS: 19043 SECS: 89.69
Java Lucene 1.9.1 DOCS: 19043 SECS: 87.87
Java Lucene 1.9.1 DOCS: 19043 SECS: 88.24
slothbear:~/Desktop/ks/t/benchmarks marvin$ perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx;
KinoSearch 0.09_03 DOCS: 19043  SECS: 87.20
KinoSearch 0.09_03 DOCS: 19043  SECS: 82.55
KinoSearch 0.09_03 DOCS: 19043  SECS: 82.38
KinoSearch 0.09_03 DOCS: 19043  SECS: 81.86
KinoSearch 0.09_03 DOCS: 19043  SECS: 87.79
KinoSearch 0.09_03 DOCS: 19043  SECS: 82.52
slothbear:~/Desktop/ks/t/benchmarks marvin$ vim indexers/kinosearch_indexer.plx
slothbear:~/Desktop/ks/t/benchmarks marvin$ perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx; perl -Mblib indexers/kinosearch_indexer.plx;
KinoSearch 0.09_03 DOCS: 19043  SECS: 88.16
KinoSearch 0.09_03 DOCS: 19043  SECS: 87.70
KinoSearch 0.09_03 DOCS: 19043  SECS: 92.67
KinoSearch 0.09_03 DOCS: 19043  SECS: 87.32
KinoSearch 0.09_03 DOCS: 19043  SECS: 88.35
KinoSearch 0.09_03 DOCS: 19043  SECS: 87.92
slothbear:~/Desktop/ks/t/benchmarks marvin$ cd ~/Desktop/ks588/t/benchmarks/
slothbear:~/Desktop/ks588/t/benchmarks marvin$ /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx
KinoSearch 0.09_03 DOCS: 19043  SECS: 69.67
KinoSearch 0.09_03 DOCS: 19043  SECS: 70.44
KinoSearch 0.09_03 DOCS: 19043  SECS: 72.87
KinoSearch 0.09_03 DOCS: 19043  SECS: 69.94
KinoSearch 0.09_03 DOCS: 19043  SECS: 69.16
KinoSearch 0.09_03 DOCS: 19043  SECS: 68.82
slothbear:~/Desktop/ks588/t/benchmarks marvin$ vim indexers/kinosearch_indexer.plx
slothbear:~/Desktop/ks588/t/benchmarks marvin$ /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx; /usr/local/perl588/bin/perl -Mblib indexers/kinosearch_indexer.plx
KinoSearch 0.09_03 DOCS: 19043  SECS: 87.58
KinoSearch 0.09_03 DOCS: 19043  SECS: 75.17
KinoSearch 0.09_03 DOCS: 19043  SECS: 75.86
KinoSearch 0.09_03 DOCS: 19043  SECS: 75.05
KinoSearch 0.09_03 DOCS: 19043  SECS: 78.55
KinoSearch 0.09_03 DOCS: 19043  SECS: 75.41
slothbear:~/Desktop/ks588/t/benchmarks marvin$ cd ~/Desktop/ks/t/benchmarks/
slothbear:~/Desktop/ks/t/benchmarks marvin$ perl indexers/plucene_indexer.plx; perl indexers/plucene_indexer.plx; perl indexers/plucene_indexer.plx; perl indexers/plucene_indexer.plx; perl indexers/plucene_indexer.plx;
Plucene 1.24 DOCS: 19043  SECS: 2013.70
^C
Couldn't get lock at indexers/plucene_indexer.plx line 56
^C
^C
slothbear:~/Desktop/ks/t/benchmarks marvin$





