lucene-java-user mailing list archives

From Sean Tong <st...@jamasoftware.com>
Subject Is indexing much slower in 3.5.0 than in 2.4.1 for Wikipedia data?
Date Mon, 12 Dec 2011 07:54:21 GMT
Hi,

We plan to upgrade the Lucene library in our application from 2.4.1 to 3.5.0. I have been
running the benchmark tests that come with Lucene. To my surprise, I found that indexing
in 3.5.0 is significantly slower than in 2.4.1 for the Wikipedia data.

Attached is the algorithm file for the tests. The tests used the default Lucene settings for
flush memory size and merge factor, and 512M of memory was allocated to the tasks. The test
machine is a 64-bit Windows 7 machine with an Intel Core i7.

The command:
%ant -Dtask.alg=conf/wikipedia-default.alg -Dtask.mem=512M run-task

Here are the test results:

Lucene 2.4.1

     [java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)

     [java] Operation       round flush mrg   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] MAddDocs_200000     0 16.00  10        1       200000      1,609.1      124.29    89,218,496    241,631,232
     [java] MAddDocs_200000 -   1 16.00  10 -  -   1 -  -  200000 -  - 1,746.4 -  - 114.52 - 102,365,864 -  241,762,304
     [java] MAddDocs_200000     2 16.00  10        1       200000      1,566.8      127.65    69,428,144    174,194,688

Lucene 2.9.4

     [java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)

     [java] Operation       round flush mrg   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] MAddDocs_200000     0 16.00  10        1       200000     1,046.49      191.12    82,676,152    139,657,216
     [java] MAddDocs_200000 -   1 16.00  10 -  -   1 -  -  200000 -   1,165.35 -  - 171.62 - 119,364,128 -  156,762,112
     [java] MAddDocs_200000     2 16.00  10        1       200000     1,245.86      160.53    50,361,760    137,625,600

Lucene 3.5.0

     [java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)

     [java] Operation       round flush mrg   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] MAddDocs_200000     0 16.00  10        1       200000       676.48      295.65    70,917,592    129,695,744
     [java] MAddDocs_200000 -   1 16.00  10 -  -   1 -  -  200000 -  -  626.13 -  - 319.42 -  50,329,552 -   94,240,768
     [java] MAddDocs_200000     2 16.00  10        1       200000       687.68      290.83    57,732,640     92,864,512


The indexing speed with 2.4.1 is about 2.3x the speed with 3.5.0. Did I miss any settings
or configurations?
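
For anyone who wants to check the ratio, it can be recomputed from the rec/s column of the three reports. The following is only a minimal sketch in Python; the helper names (`mean_ratio`, `per_round_ratios`) are made up for illustration and are not part of the Lucene benchmark package. The numbers are copied from the MAddDocs_200000 rows above. Comparing round by round, 2.4.1 comes out between roughly 2.3x and 2.8x faster than 3.5.0.

```python
# rec/s values copied from the MAddDocs_200000 rows of the reports above,
# one entry per round (0, 1, 2).
REC_PER_SEC = {
    "2.4.1": [1609.1, 1746.4, 1566.8],
    "2.9.4": [1046.49, 1165.35, 1245.86],
    "3.5.0": [676.48, 626.13, 687.68],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def mean_ratio(baseline, candidate, data=REC_PER_SEC):
    """Mean rec/s of `baseline` divided by mean rec/s of `candidate`."""
    return mean(data[baseline]) / mean(data[candidate])


def per_round_ratios(baseline, candidate, data=REC_PER_SEC):
    """Ratio of rec/s for each round, pairing round i with round i."""
    return [b / c for b, c in zip(data[baseline], data[candidate])]


if __name__ == "__main__":
    # Compare the 2.4.1 baseline against each newer version.
    for version in ("2.9.4", "3.5.0"):
        ratios = ", ".join(f"{r:.2f}x" for r in per_round_ratios("2.4.1", version))
        print(f"2.4.1 vs {version}: mean {mean_ratio('2.4.1', version):.2f}x, per round {ratios}")
```

With only one run per round, these ratios are a rough indication rather than a precise measurement.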

Thanks,

Sean


