lucene-ruby-dev mailing list archives

From Erik Hatcher
Subject Re: rise
Date Sat, 09 Jul 2005 01:32:07 GMT
Max - Welcome!!!

I'm literally sitting on the edge of my seat anxious for a viable  
Ruby Lucene, so I applaud your efforts.

I'm very keen on the GCJ/SWIG approach so that the Ruby version can  
stay in sync with the Java version simply by running the build  
process, very much like PyLucene does.  I began a native Ruby port  
once upon a time myself (rucene and rubylucene at RubyForge, with  
very little code but some basic file I/O actually out there) and I  
dropped it once I saw PyLucene and how well it performed.

Hopefully you'd be interested in assisting with the nascent effort  
under way here, or if you come up with something on your own and  
would like to contribute it to Apache to live alongside Java Lucene,  
we'd welcome it.


On Jul 7, 2005, at 11:48 AM, Max Nickel wrote:

> Hi all,
> Miles Barr and Erik Hatcher posted on my webby, and since I wanted to
> get in contact with you sooner or later anyway, I'm doing it now :).
> Originally I wanted to wait until I had some more quality code, but
> well...
> (So if you read something like "it's working" or "I ported", please
> don't take this too literally ;))
> So let me introduce myself first: I'm Max Nickel and I'm working on a
> project I called Rise, which tries to be a Ruby implementation of
> Lucene.
> I just read some of the recent posts on this mailing list, and it
> seems that you are concentrating your efforts on getting it done with
> SWIG, so I don't know if what I did will be of much use to you.
> I took a different approach and first tried a pure Ruby
> implementation.
> This was rise-0.1.1, which you can also get on or on my
> outdated Arch repo. At this stage everything was still very buggy and
> nowhere near what you could call working, but I had enough working to
> see that pure Ruby is simply unacceptably slow (I expected this to
> happen anyway).
> So at this point I decided to port some of the parts that matter most
> for performance to C. I know that this might not be the best
> approach when you care about portability or deployment, but I felt
> that it was necessary if you want to do something other than
> indexing your address book.
> Right now I have ported the following classes, either completely or
> in part as mixins:
> FS/RAM-IO, Tokenizers up to LowerCaseTokenizer, Term, TermBuffer,
> Token, QuickSort, HeapSort, TermInfosWriter#add and #write,
> DocumentWriter#writePostings and #addPosition, and SegmentTermEnum +
> some helper classes.
> The C implementations don't use any headers other than ruby.h or
> rubyio.h (only once sys/stdlib.h is needed in fsio.c), so everywhere
> Ruby compiles, Rise should compile as well.
> Also, nearly all classes except the IO ones aren't pure C, but make
> use of Ruby's C functions like rb_ivar_*, rb_funcall, etc.
> As I wrote in an email to Miles Barr earlier, here are some very rough
> indexing stats:
> Indexing /usr/src/linux of a recent 2.6.12 kernel takes on my machine
> with Lucene ~4 minutes
> with Rise in pure Ruby > 60 minutes
> with my current Rise/C impl ~20 minutes.
> The current status is unfortunately broken, since somewhere in my
> recent changes I made some stupid mistake and keep getting
> "Docs out of order" exceptions when merging segments. I haven't had
> much time on my hands lately to hunt this bug, but I hope it will be
> the last major one before the 0.1.2 release (except that the searching
> side is broken, as it isn't updated to the changes I made yet).
> Since I was tired of GNU Arch's UI and switched to monotone, you also
> can't get my current sources. But when I manage to set up my local
> server I'll let you know.
> kind regards,
> /max
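
For readers unfamiliar with the LowerCaseTokenizer named in the quoted
list of ported classes: Java Lucene documents it as splitting text at
non-letters and lowercasing each token. Below is a minimal pure-Ruby
sketch of that behaviour. The names Token and lower_case_tokens are
made up for illustration and are not taken from Rise or from the
Lucene sources.

```ruby
# Hypothetical sketch; Token and lower_case_tokens are illustrative names,
# not taken from the Rise or Lucene sources.
Token = Struct.new(:text, :start_offset, :end_offset)

# Split text at non-letter characters and lowercase every token,
# recording character offsets into the original string.
def lower_case_tokens(text)
  tokens = []
  text.scan(/\p{L}+/) do
    m = Regexp.last_match
    tokens << Token.new(m[0].downcase, m.begin(0), m.end(0))
  end
  tokens
end

lower_case_tokens("Hello, Ruby-Lucene 2005!").each do |t|
  puts "#{t.text} [#{t.start_offset}, #{t.end_offset})"
end
```

Digits and punctuation are dropped because only runs of letters count
as tokens. A per-character loop like this is the kind of hot path the
quoted message describes moving to C.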

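On the "Docs out of order" exception mentioned in the quoted message:
in Java Lucene's frequency file, document numbers for a term are
stored as deltas from the previous document, so postings have to be
written in ascending document order. The sketch below shows that
encoding with the ordering check. It is a simplified illustration, not
Rise's code: encode_postings is a hypothetical name, and the VInt byte
encoding is left out.

```ruby
# Hypothetical sketch of delta-encoded postings for a single term.
# postings is an array of [doc_number, term_frequency] pairs.
def encode_postings(postings)
  out = []
  last_doc = 0
  postings.each_with_index do |(doc, freq), i|
    # A delta must never be negative, so documents have to arrive in
    # strictly ascending order after the first one.
    if doc < 0 || (i > 0 && doc <= last_doc)
      raise ArgumentError, "docs out of order (#{doc} <= #{last_doc})"
    end
    code = (doc - last_doc) << 1
    if freq == 1
      out << (code | 1)      # low bit set: frequency is exactly 1
    else
      out << code << freq    # low bit clear: frequency follows
    end
    last_doc = doc
  end
  out
end

# Once in document 7, three times in document 11.
p encode_postings([[7, 1], [11, 3]])  # prints [15, 8, 3]
```

When segments are merged, each segment's document numbers are shifted
by a base offset so they stay ascending across the merged index. A
wrong offset or a wrong iteration order would trip a check like this
one, which is one plausible place to look, though not a diagnosis of
the bug described above.
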