lucene-dev mailing list archives

From Andi Vajda
Subject Re: lucene 1.3 RC3 compiled with gcj
Date Thu, 04 Dec 2003 01:19:34 GMT

I ran more significant benchmarks today, though not without problems.

I first tried to index the JDK API doc HTML files under api/java/util using
IndexHTML. The problem with this test is that it uses threads, which is
risky with gcj (and IndexHTML creates lots of threads instead of reusing
them from a pre-allocated pool).
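
To illustrate what reusing threads from a pre-allocated pool could look like,
here is a minimal sketch. It is not IndexHTML's code: the class name
PooledIndexing and the method runAll are hypothetical, the per-file work is a
placeholder, and java.util.concurrent only arrived with Java 5, so this
assumes a newer JVM than the ones tested here.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one task per file, executed by a fixed set of threads.
public class PooledIndexing {
    // Runs one task per file on at most poolSize worker threads and returns
    // the names of the threads that actually executed tasks.
    public static Set<String> runAll(List<String> files, int poolSize) {
        Set<String> workers = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (String file : files) {
            pool.execute(() -> {
                // Placeholder for parsing and indexing `file`.
                workers.add(Thread.currentThread().getName());
            });
        }
        pool.shutdown(); // accept no new tasks, let the queued ones finish
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return workers;
    }
}
```

With 250 files and a pool size of 4, at most four threads are ever created,
however many files are processed.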

I then tried to index the same files using IndexFiles. The advantage is that
IndexFiles is single-threaded. I modified IndexFiles to pick up only .html
files (a hack).
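
The actual change to IndexFiles is not shown here. A hack of this kind might
look roughly like the sketch below, which walks a directory tree and keeps
only files whose names end in .html. The class HtmlOnlyWalker and its methods
are hypothetical names for illustration, and the indexing call itself is left
out.

```java
import java.io.File;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of a single-threaded, .html-only directory walk.
public class HtmlOnlyWalker {
    // True for file names ending in ".html", ignoring case.
    public static boolean isHtml(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".html");
    }

    // Recursively appends every .html file under root to out.
    public static void collect(File root, List<File> out) {
        if (root.isDirectory()) {
            File[] children = root.listFiles();
            if (children == null) {
                return; // directory could not be read
            }
            for (File child : children) {
                collect(child, out);
            }
        } else if (isHtml(root.getName())) {
            out.add(root); // a real indexer would add a document here
        }
    }
}
```

Calling collect on the api/java/util directory would fill the list with the
HTML files to index and skip everything else.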

There are 250 HTML files under $JAVA_HOME/docs/api/java/util, about
6108 KB of HTML text in total.

The first test: org.apache.lucene.demo.IndexHTML compiled with gcj -O2:

   - on win2k/cygwin (downloaded yesterday), the test crashes with 'too many
     threads' and dumps core. (gcc/gcj 3.3.1).

   - on linux (redhat 9) it runs (gcc/gcj 3.3.2) (!)

   - on OS X 10.3.1, it bus errors when running gc (boehm-gc) after cleaning
     up a thread, apparently (gcc/gcj 3.3.2).

   Running threads with gcj still seems a little too close to the edge,
   at least on some platforms.

The second test: org.apache.lucene.demo.IndexFiles with java and gcj:

   - on linux (redhat 9) amd athlon xp 2400+ 2ghz 1gb:
     . running with jdk 1.4.2_02   : 21589 ms
     . compiled with gcj 3.3.2 -O2 : 18828 ms

   - on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
     . running with java 1.4.1_01-99 : 20379 ms
     . compiled with gcj 3.3.2 -O2   : 17842 ms
     . running clucene 0.8.9's demo  :  9930 ms

   So, on a more representative set of files, the performance difference
   between gcj lucene and java lucene is not so large anymore. clucene is
   still way faster.
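
   Working the timings above through: gcj takes about 13% less time than the
   JDK on linux and about 12% less on OS X, while clucene is about 2.05x
   faster than java and about 1.8x faster than gcj on the powerbook. The
   small sketch below just encodes that arithmetic; the class and method
   names are made up for illustration.

```java
// Hypothetical helper for comparing two elapsed times given in milliseconds.
public class BenchmarkRatios {
    // Percentage of the baseline time saved by the candidate.
    public static double percentFaster(long baselineMs, long candidateMs) {
        return 100.0 * (baselineMs - candidateMs) / baselineMs;
    }

    // How many times faster the candidate is than the baseline.
    public static double speedup(long baselineMs, long candidateMs) {
        return (double) baselineMs / candidateMs;
    }
}
```

   For example, percentFaster(21589, 18828) gives the linux figure and
   speedup(20379, 9930) gives the clucene versus java figure.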

Why am I doing this?

   I'm looking for a way to use lucene from python (not jython). The python
   port of lucene, lupy, is incomplete and 100% python, so is quite slooooow.
   The remaining contenders are hence lucene compiled with gcj as a native
   library and clucene, lucene's C++ port. Both can be easily wrapped by

   + clucene is faster
   - clucene, a C++ port of 1.2, is behind lucene and has its own set of bugs
   + gcj lucene makes lucene usable by non-java processes such as python
   + gcj lucene is current with the latest lucene developments
   - gcj lucene is flaky when threading is used

Currently, I'm leaning towards using gcj lucene.


On Wed, 3 Dec 2003, Doug Cutting wrote:

> Andi Vajda wrote:
> > Yes, my sample was pitifully small last night. I intend to do another test
> > today with larger files. Using the JDK docs is a great idea.
> I forgot to add: these are great benchmarks to have.  You're doing
> Lucene a service by providing them.  Thanks!
> Doug
