lucene-dev mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: lucene 1.3 RC3 compiled with gcj
Date Thu, 04 Dec 2003 11:10:10 GMT
Regarding this type of benchmark in general, I feel that even something
like api/java/util is not a big enough collection to let the various
JVM tricks kick in.  Why not run IndexFiles over the whole api
directory, at least?

Also, the JVM can be tuned in several different ways: setting the
min/max heap size, picking different GC algorithms, etc.  One would
really have to experiment with different settings.
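
To make the tuning point concrete, here is a minimal sketch of a timing
harness. It is not part of the Lucene demo: the class name IndexTimer and
the placeholder workload are made up for illustration, and a real run
would call the indexer where the placeholder is. It prints the heap limits
the JVM was started with and times several runs, so the effect of warm-up
and of options such as -Xms/-Xmx can be compared between invocations.

```java
public class IndexTimer {

    /** Runs the task once and returns the elapsed wall-clock time in ms. */
    public static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** Reports the current and maximum heap size of this JVM, in MB. */
    public static String heapSummary() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return "heap: total=" + (rt.totalMemory() / mb) + "MB, max="
                + (rt.maxMemory() / mb) + "MB";
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());

        // Placeholder workload (an assumption for this sketch); a real
        // benchmark would invoke the indexing code here instead.
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 200_000; i++) {
                sb.append(i % 10);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("workload did nothing");
            }
        };

        // Repeat the run so that later iterations reflect a warmed-up JVM.
        for (int run = 1; run <= 5; run++) {
            System.out.println("run " + run + ": " + timeMillis(work) + " ms");
        }
    }
}
```

Something like "java -Xms256m -Xmx256m IndexTimer" would pin the heap
size for one experiment; the sizes are arbitrary examples, and GC
selection flags differ between JVM versions, so check the documentation
of the JVM at hand.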


What I really wanted to ask in this message was about lupy.  You said
it's slow.  If possible, could you include lupy in your benchmark, too?

Thanks for this info!
Otis


--- Andi Vajda <andi@osafoundation.org> wrote:
> 
> I ran more significant benchmarks today but not without problems.
> 
> I tried to index the JDK api doc HTML files under api/java/util using
> IndexHTML. The problem with this test is that it uses threads which is
> risky with gcj. (and IndexHTML creates lots of threads instead of
> reusing them from a pre-allocated pool).
> 
> I then tried to index the JDK api doc HTML files under api/java/util
> using IndexFiles. The advantage is that IndexFiles is single threaded.
> I modified IndexFiles to only pick up .html files (a hack).
> 
> There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
> 6108kb of HTML text.
> 
> The first test: org.apache.lucene.demo.IndexHTML compiled with gcj -O2:
> 
>    - on win2k/cygwin (downloaded yesterday), the test crashes with
>      'too many threads' and dumps core. (gcc/gcj 3.3.1).
> 
>    - on linux (redhat 9) it runs (gcc/gcj 3.3.2) (!)
> 
>    - on OS X 10.3.1, it bus errors when running gc (boehm-gc) after
>      cleaning up a thread, apparently (gcc/gcj 3.3.2).
> 
>    Running threads with gcj still seems a little too close to the edge
>    apparently, at least on some platforms.
> 
> The second test: org.apache.lucene.demo.IndexFiles with java and gcj:
> 
>    - on linux (redhat 9) amd athlon xp 2400+ 2ghz 1gb:
>      . running with jdk 1.4.2_02   : 21589 ms
>      . compiled with gcj 3.3.2 -O2 : 18828 ms
> 
>    - on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
>      . running with java 1.4.1_01-99 : 20379 ms
>      . running with gcj 3.3.2 -O2    : 17842 ms
>      . running clucene 0.8.9's demo  :  9930 ms
> 
>    So, on a more significant set of files, the performance difference
>    between gcj lucene and java lucene is not so significant anymore.
>    clucene is still way faster.
> 
> Why am I doing this?
> 
>    I'm looking for a way to use lucene from python (not jython). The
>    python port of lucene, lupy, is incomplete and 100% python, so is
>    quite slooooow. The remaining contenders are hence lucene compiled
>    with gcj as a native library and clucene, lucene's C++ port. Both
>    can be easily wrapped by python.
> 
> pros/cons:
>    + clucene is faster
>    - clucene, a C++ port of 1.2, is behind lucene and has its own set
>      of bugs
>    + gcj lucene makes lucene usable by non java processes such as python
>    + gcj lucene is current with the latest lucene developments
>    - gcj lucene is flaky when threading is used
> 
> Currently, I'm leaning towards using gcj lucene.
> 
> Andi..
> 
> 
> On Wed, 3 Dec 2003, Doug Cutting wrote:
> 
> > Andi Vajda wrote:
> > > Yes, my sample was pitifully small last night. I intend to do
> > > another test today with larger files. Using the JDK docs is a
> > > great idea.
> >
> > I forgot to add: these are great benchmarks to have.  You're doing
> > Lucene a service by providing them.  Thanks!
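
For reference, the IndexFiles hack mentioned above (only picking up .html
files) could look roughly like the sketch below. This is not Andi's actual
change, which was not posted; the class HtmlFileCollector and its methods
are hypothetical names, and the indexing step itself is left out. It only
walks a directory tree, keeps the .html files, and reports their count and
total size, which is how figures such as "250 files, 6108kb" can be
checked before a benchmark run.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HtmlFileCollector {

    /** True if the file name ends in .html, ignoring case. */
    public static boolean isHtml(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".html");
    }

    /** Recursively adds every .html file under 'file' to 'out'. */
    public static void collect(File file, List<File> out) {
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children == null) {
                return; // directory could not be read; skip it
            }
            for (File child : children) {
                collect(child, out);
            }
        } else if (file.isFile() && isHtml(file.getName())) {
            out.add(file);
        }
    }

    public static void main(String[] args) {
        // The root directory is taken from the command line; "." is
        // only a default for this sketch.
        File root = new File(args.length > 0 ? args[0] : ".");
        List<File> files = new ArrayList<>();
        collect(root, files);

        long bytes = 0;
        for (File f : files) {
            bytes += f.length();
        }
        System.out.println(files.size() + " .html files, "
                + (bytes / 1024) + " kb");
    }
}
```

Pointing it at $JAVA_HOME/docs/api/java/util, or at the whole api
directory, gives the size of the collection a benchmark would cover.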


