lucene-dev mailing list archives

From Andi Vajda <a...@osafoundation.org>
Subject Re: lucene 1.3 RC3 compiled with gcj
Date Thu, 04 Dec 2003 21:30:59 GMT

All true. My interest in this is to find out whether a gcj-compiled lucene
would be usable, not so much to run exact benchmarks.
But since you asked, here are more numbers I got running bigger datasets.
Beyond seeing the same performance trends comparing java and gcj lucene,
I'll leave further interpretation and appreciation to you.

  First test: IndexFiles of $JAVA_HOME/docs/api/java, about 2343 files,
              totalling 61588kb of HTML text.

    - on linux (redhat 9) amd athlon xp 2400+ 2ghz 1gb:
      . running with jdk 1.4.2_02   : 115423 ms
      . compiled with gcj 3.3.2 -O2 : 106350 ms

    - on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
      . running with java 1.4.1_01-99 : 168185 ms
      . running with gcj 3.3.2 -O2    : 148491 ms
      . running clucene 0.8.9's demo  :  82521 ms

  Second test: IndexFiles of $JAVA_HOME/docs/api, about 6572 files,
               totalling 156420kb of HTML text.

    - on linux (redhat 9) amd athlon xp 2400+ 2ghz 1gb:
      . running with jdk 1.4.2_02   : 343222 ms
      . compiled with gcj 3.3.2 -O2 : 243918 ms

    - on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
      . running with java 1.4.1_01-99 : 400656 ms
      . running with gcj 3.3.2 -O2    : 403452 ms
      . running clucene 0.8.9's demo  : 169264 ms (maxed out at 5000 files)
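
For readers who want the relative figures, here is a minimal sketch that works out the percent change of the gcj run against the JVM run for each dataset and platform. The timings are copied from the two tests above; the dictionary layout and the function names are illustrative assumptions, not part of the original benchmark. clucene is left out because its second run stopped at 5000 files and is not directly comparable.

```python
# Timings in milliseconds, copied from the two tests above.
# The layout and key names are illustrative only.
TIMINGS_MS = {
    ("api/java", "linux"): {"jdk 1.4.2_02": 115423, "gcj 3.3.2 -O2": 106350},
    ("api/java", "mac os x"): {"java 1.4.1_01-99": 168185, "gcj 3.3.2 -O2": 148491},
    ("api", "linux"): {"jdk 1.4.2_02": 343222, "gcj 3.3.2 -O2": 243918},
    ("api", "mac os x"): {"java 1.4.1_01-99": 400656, "gcj 3.3.2 -O2": 403452},
}


def gcj_change_percent(jvm_ms, gcj_ms):
    """Percent change of the gcj time relative to the JVM time.

    A negative value means the gcj build finished sooner.
    """
    return 100.0 * (gcj_ms - jvm_ms) / jvm_ms


def report():
    """Return (dataset, platform, percent change) for every test above."""
    rows = []
    for (dataset, platform), runs in TIMINGS_MS.items():
        gcj_ms = runs["gcj 3.3.2 -O2"]
        # The only other entry in each group is the JVM run.
        jvm_ms = next(ms for name, ms in runs.items() if not name.startswith("gcj"))
        rows.append((dataset, platform, round(gcj_change_percent(jvm_ms, gcj_ms), 1)))
    return rows


if __name__ == "__main__":
    for dataset, platform, pct in report():
        print(f"{dataset:9s} {platform:9s} gcj vs java: {pct:+.1f}%")
```

These are single runs, so the percentages only indicate a trend, not a measured speedup.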

Andi..

On Thu, 4 Dec 2003, Otis Gospodnetic wrote:

> Regarding this type of benchmark in general, I feel that even something
> like api/java/util does not provide a big enough collection to let
> various JVM tricks kick in.  Why not use IndexFiles for the whole api
> directory, at least?
> Also, the JVM can be tuned several different ways.  Setting min/max
> heap size, picking different GC algorithms, etc. are some of them.  One
> would have to experiment with different settings, really.
>
>
> What I really wanted to ask in this message was about lupy.  You said
> it's slow.  If possible, could you include lupy in your benchmark, too?
>
> Thanks for this info!
> Otis
>
>
> --- Andi Vajda <andi@osafoundation.org> wrote:
> >
> > I ran more significant benchmarks today but not without problems.
> >
> > I tried to index the JDK api doc HTML files under api/java/util using
> > IndexHTML. The problem with this test is that it uses threads, which is
> > risky with gcj (and IndexHTML creates lots of threads instead of reusing
> > them from a pre-allocated pool).
> >
> > I then tried to index the JDK api doc HTML files under api/java/util
> > using IndexFiles. The advantage is that IndexFiles is single threaded. I
> > modified IndexFiles to only pick up .html files (a hack).
> >
> > There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
> > 6108kb of HTML text.
> >
> > The first test: org.apache.lucene.demo.IndexHTML compiled with gcj -O2:
> >
> >    - on win2k/cygwin (downloaded yesterday), the test crashes with
> >      'too many threads' and dumps core. (gcc/gcj 3.3.1).
> >
> >    - on linux (redhat 9) it runs (gcc/gcj 3.3.2) (!)
> >
> >    - on OS X 10.3.1, it bus errors when running gc (boehm-gc) after
> >      cleaning up a thread, apparently (gcc/gcj 3.3.2).
> >
> >    Running threads with gcj still seems a little too close to the edge
> >    apparently, at least on some platforms.
> >
> > The second test: org.apache.lucene.demo.IndexFiles with java and gcj:
> >
> >    - on linux (redhat 9) amd athlon xp 2400+ 2ghz 1gb:
> >      . running with jdk 1.4.2_02   : 21589 ms
> >      . compiled with gcj 3.3.2 -O2 : 18828 ms
> >
> >    - on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
> >      . running with java 1.4.1_01-99 : 20379 ms
> >      . running with gcj 3.3.2 -O2    : 17842 ms
> >      . running clucene 0.8.9's demo  :  9930 ms
> >
> >    So, on a more significant set of files, the performance difference
> >    between gcj lucene and java lucene is not so significant anymore.
> >    clucene is still way faster.
> >
> > Why am I doing this?
> >
> >    I'm looking for a way to use lucene from python (not jython). The
> >    python port of lucene, lupy, is incomplete and 100% python, so is
> >    quite slooooow. The remaining contenders are hence lucene compiled
> >    with gcj as a native library and clucene, lucene's C++ port. Both can
> >    be easily wrapped by python.
> >
> > pros/cons:
> >    + clucene is faster
> >    - clucene, a C++ port of 1.2, is behind lucene and has its own set
> >      of bugs
> >    + gcj lucene makes lucene usable by non java processes such as python
> >    + gcj lucene is current with the latest lucene developments
> >    - gcj lucene is flaky when threading is used
> >
> > Currently, I'm leaning towards using gcj lucene.
> >
> > Andi..
> >
> >
> > On Wed, 3 Dec 2003, Doug Cutting wrote:
> >
> > > Andi Vajda wrote:
> > > > Yes, my sample was pitifully small last night. I intend to do
> > > > another test today with larger files. Using the JDK docs is a
> > > > great idea.
> > >
> > > I forgot to add: these are great benchmarks to have.  You're doing
> > > Lucene a service by providing them.  Thanks!
