lucene-ruby-dev mailing list archives

From Max Nickel <...@oss-institute.org>
Subject rise
Date Thu, 07 Jul 2005 15:48:15 GMT
Hi all,
Miles Barr and Erik Hatcher posted on my webby, and since I wanted to get
in contact with you sooner or later anyway, I'm doing it now :).
Originally I wanted to wait until I had some more quality code, but
well...
(So if you read something like "it's working" or "I ported", please don't
take this too literally ;))

So let me introduce myself first: I'm Max Nickel and I'm working on a
project I called Rise, which tries to be a Ruby implementation of Lucene.
I just read some of the recent posts on this mailing list, and it seems
that you are concentrating your efforts on getting it done with SWIG, so
I don't know if what I did will be of much use to you.
I took a different approach and first tried a pure Ruby implementation.
This was rise-0.1.1, which you can also get on rise.rubyforge.org or from
my outdated Arch repo. At that stage everything was still very buggy and
nowhere near what you could call working, but I had enough working to see
that pure Ruby is simply unacceptably slow (I expected this to happen
anyway).
So at this point I decided to port some of the more performance-critical
parts to C. I know that this might not be the best approach when you care
about portability or deployment, but I felt it was necessary if you want
to do something other than indexing your address book.
Right now I have ported the following classes, either completely or in
part as mixins:
FS/RAM-IO, Tokenizers up to LowerCaseTokenizer, Term, TermBuffer, Token,
QuickSort, HeapSort, TermInfosWriter#add and #write,
DocumentWriter#writePostings and #addPosition, and SegmentTermEnum +
some helper classes.

The C implementations don't use any headers other than ruby.h or
rubyio.h (sys/stdlib.h is needed only once, in fsio.c), so everywhere
Ruby compiles, Rise should compile too.
Also, nearly all classes except the IO ones aren't pure C, but make use
of Ruby's C functions like rb_ivar_*, rb_funcall etc.

As I wrote in an email to Miles Barr earlier, here are some very rough
indexing stats:
Indexing /usr/src/linux of a recent 2.6.12 kernel takes, on my machine,
with Lucene ~4 minutes
with Rise in pure Ruby > 60 minutes
with my current Rise/C impl ~20 minutes.

The current state is unfortunately broken, since somewhere in my recent
changes I made some stupid mistake and keep getting "Docs out of
order" exceptions when merging segments. I haven't had much time on my
hands lately to hunt this bug, but I hope it will be the last major one
before the 0.1.2 release (except that the searching side is broken, as it
hasn't been updated to the changes I made yet).
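
For readers unfamiliar with this error, here is a minimal Ruby sketch of
the kind of check that produces it during a merge. It is not taken from
Rise or from the Lucene sources: the method name merge_postings, the
DocsOutOfOrderError class and the hash layout of a segment are all
assumptions made for illustration. The idea is that each segment's
document numbers are shifted by a running base offset, and the merged
list for a term must be strictly increasing.

```ruby
# Hypothetical sketch; names and data layout are illustrative only.
class DocsOutOfOrderError < StandardError; end

# segments: array of hashes with
#   :max_doc => number of documents in the segment
#   :docs    => segment-local doc numbers for one term, expected ascending
def merge_postings(segments)
  merged   = []
  base     = 0   # offset added to segment-local doc numbers
  last_doc = -1  # last global doc number written so far

  segments.each do |seg|
    seg[:docs].each do |doc|
      global = base + doc
      # Merged postings must be strictly increasing.
      if global <= last_doc
        raise DocsOutOfOrderError,
              "Docs out of order (#{global} <= #{last_doc})"
      end
      merged << global
      last_doc = global
    end
    base += seg[:max_doc]
  end

  merged
end
```

Under these assumptions, the error would show up if a segment's postings
were written unsorted or if the base offset were computed incorrectly,
which is one plausible place to start looking.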

Since I was tired of GNU Arch's UI and switched to monotone, you also
can't get my current sources. But once I have managed to set up my local
server, I'll let you know.

kind regards,
/max


