lucene-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: Benchmarking results
Date Mon, 10 Apr 2006 05:11:26 GMT
Hello,

I have discovered a serious bug in the LuceneIndexer benchmarking  
app.  All tests have been rerun, and the new numbers reflect a 13-15%  
improvement for Lucene.  I apologize for having reported bad data.

Here are some of the new results, both with and without the bug, so
that you can see how the numbers were affected.  They were prepared
using revision 779 of the Subversion repository.

RESULTS A: 'body' neither stored nor vectorized
===========================================================================
configuration             truncated mean secs (6 reps)   max memory (1 rep)
---------------------------------------------------------------------------
Lucene / JVM 1.4                  43.68                         79 MB
Lucene / JVM 1.5                  44.95                         93 MB
Lucene / JVM 1.4 with bug         49.63                         79 MB
Lucene / JVM 1.5 with bug         50.93                         92 MB

RESULTS B: 'body' stored and vectorized
===========================================================================
configuration             truncated mean secs (6 reps)   max memory (1 rep)
---------------------------------------------------------------------------
Lucene / JVM 1.4                  71.96                        118 MB
Lucene / JVM 1.5                  73.81                        214 MB
Lucene / JVM 1.4 with bug         84.73                        182 MB
Lucene / JVM 1.5 with bug         88.96                        199 MB
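
The message does not say how the truncated mean in these tables was
computed.  As a minimal sketch, assuming the common definition that
discards the fastest and slowest runs before averaging the rest, it
could look like the following.  The class name, method name, and the
trim parameter are hypothetical and not taken from the benchmark code.

```java
import java.util.Arrays;

public class TruncatedMeanSketch {
    // Mean of the samples that remain after dropping the `trim` smallest
    // and the `trim` largest values. Assumed definition, not confirmed
    // by the original benchmark.
    static double truncatedMean(double[] times, int trim) {
        if (trim < 0 || times.length <= 2 * trim) {
            throw new IllegalArgumentException("not enough samples to trim");
        }
        double[] sorted = times.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        double sum = 0.0;
        for (int i = trim; i < sorted.length - trim; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2 * trim);
    }
}
```

With 6 reps and trim set to 1, this would average the middle 4 timings,
which keeps a single outlier run from skewing the reported figure.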

The bug was in buildFileList() and resulted in a bogus list of  
filepaths.  KinoSearch and Plucene were indexing 19043 documents once  
each.  Lucene was indexing 22 documents over and over, about 900  
times each.

   // Return a lexically sorted list of all article files from all subdirs.
   static String[] buildFileList () throws Exception {
     File[] articleDirs = corpusDir.listFiles();
     Vector filePaths = new Vector();
     for (int i = 0; i < articleDirs.length; i++) {
       File[] articles = articleDirs[i].listFiles();
       for (int j = 0; j < articles.length; j++) {
         String path = articles[i].getPath();   // <-- BUG: should be j, not i
         if (path.indexOf("article") == -1)
           continue;
         filePaths.add(path);
       }
     }
     Collections.sort(filePaths);
     return (String[])filePaths.toArray(new String[filePaths.size()]);
   }
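
For comparison, here is a sketch of what the corrected method might look
like.  It is not the code that was committed.  It assumes Java 7 or
later, takes corpusDir as a parameter instead of reading a static field,
and adds null checks the original does not have.  The class name is
hypothetical.  Iterating with for-each removes the loop indices, so the
i/j mix-up cannot happen.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FileListSketch {
    // Return a lexically sorted list of all article files from all subdirs.
    static String[] buildFileList(File corpusDir) throws IOException {
        File[] articleDirs = corpusDir.listFiles();
        if (articleDirs == null) {
            // listFiles() returns null if the path is not a readable directory
            throw new IOException("Cannot list directory: " + corpusDir);
        }
        List<String> filePaths = new ArrayList<>();
        for (File dir : articleDirs) {
            File[] articles = dir.listFiles();
            if (articles == null) {
                continue;   // plain file at the top level, not a subdir
            }
            for (File article : articles) {
                String path = article.getPath();
                // Same filter as the original: keep paths containing "article"
                if (!path.contains("article")) {
                    continue;
                }
                filePaths.add(path);
            }
        }
        Collections.sort(filePaths);
        return filePaths.toArray(new String[0]);
    }
}
```

A cheap guard against this class of bug is to check that the returned
list contains no duplicate paths before starting a benchmark run.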

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


