lucene-dev mailing list archives

From Woolly Mammoth <dewoolly...@yahoo.com>
Subject lucene webcrawler/dbms indexing framework
Date Thu, 01 Apr 2004 22:59:33 GMT
Hi All,
	I have seen some discussion in the past about LARM and other web
crawler indexing code, but not much output. I have started a project on
SourceForge, http://sourceforge.net/projects/knine, and have committed
some initial framework code to CVS (despite the front page saying there
are no commits). I haven't done a release yet, mainly because I need to
check licensing and am also having some trouble getting PDFBox to
extract all fields in documents. If anyone has time to help or review,
that would be great. I would like to license it Apache-style for
contributors and GPL for others; does anyone know about this kind of
arrangement?

The real goal of this is an easy-to-deploy Lucene implementation that is
also scalable and flexible to customise.
I will be moving all the currently hardcoded indexing rules into
config files as soon as possible, and then hopefully adding a management
interface over the files and the indexing process.

Thanks,
Dave




