lucene-dev mailing list archives

From karl wettin <ka...@snigel.dnsalias.net>
Subject Re: lucene webcrawler/dbms indexing framework
Date Fri, 02 Apr 2004 12:29:27 GMT
On Thu, 1 Apr 2004 14:59:33 -0800 (PST)
Woolly Mammoth <dewoollyone@yahoo.com> wrote:

> Hi All,
> 	I have seen some discussion in the past around LARM & other web
> crawler indexing code, but not much output. I have started a project on
> SF http://sourceforge.net/projects/knine, and have committed some
> initial framework code to CVS (despite the front page saying there are
> no commits...). I haven't done a release yet, mainly because I need to
> check licensing & am also having some trouble getting PDFBox to extract
> all fields from docs. If anyone has time to help/review, that would be
> great. I wanted to try & license it Apache-style for contributors & GPL
> for others; does anyone know about this?
> 
> The real goal of this is an easy-to-deploy Lucene implementation that is
> also scalable & flexible for customisation.
> I will be moving all the currently hardcoded indexing rules into
> config files asap, then hopefully building a mgmt interface over the
> files & indexing process.

I'm also working on such a project. It works quite nicely, but I have not
yet released any code. There is some information and a UML class diagram
describing the core at <http://snigel.dnsalias.net/snigelwiki/Egdelon>.

If you are interested in taking a closer look, let me know.



-- 

karl

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

