lucene-dev mailing list archives

From karl wettin <>
Subject Re: lucene webcrawler/dbms indexing framework
Date Fri, 02 Apr 2004 12:29:27 GMT
On Thu, 1 Apr 2004 14:59:33 -0800 (PST)
Woolly Mammoth <> wrote:

> Hi All,
> 	I have seen some discussion in the past around LARM & other web
> crawler indexing code, but not much output. I have started a project on
> SF, and have committed some initial framework code to CVS (despite the
> front page saying there are no commits...). I haven't done a release
> yet, mainly because I need to check licensing & am also having some
> trouble getting PDFBox to get all fields in docs. If anyone has time to
> help/review, that would be great. I wanted to try & license as Apache
> style for contributors & GPL for others; does anyone know about this?
> The real goal of this is an easy to deploy lucene implementation, but
> also scalable & flexible for customisation.
> I will be putting all the currently hardcoded indexing rules into
> config files asap.. - then hopefully getting a mgmt interface over the
> files & indexing process

I'm also working on such a project. It works quite nicely, but I have not
yet released any code. There is some information and a UML class diagram
describing the core at <>.

If you are interested in taking a closer look, let me know.


