Subject: Re: Proposal for Lucene
From: "Andrew C. Oliver"
To: Lucene Developers List
Date: 24 Feb 2002 11:42:08 -0500

On Fri, 2002-02-08 at 05:26, Manfred Schäfer wrote:
> Hi,
>
> I would suggest two sub-projects:
>

I think "packages" would be a more appropriate description; I wouldn't
call them "subprojects", so to speak.

> 1. Crawler - retrieving docs, wherever they are.....
>
> 2. DocumentHandler - extract text, create appropriate fields etc.
>

+1, that's what I was getting at in the proposal about DocumentFactory etc.

> The second is a layer on top of Lucene.
> The first is an autonomous package, which
> should be nicely integrated with Lucene/Document-Handler, but should also be
> usable for other projects.
>

hummm... I'm not entirely sure I'd go that far. Well encapsulated, for sure,
but how usable it is by other projects is up to them, not us...

> I've included my code to show you what I've done. It isn't too useful yet,
> because it is integrated in our product, but you can get the idea. Actually I've
> written two things:
>
> 1: A robot for crawling a remote server via HTTP and writing all the data to
> the local filesystem, then importing it into our db and
> (at the same time) replacing all links with internal links. So we could emulate
> a web site from this crawled data!
> [com.synformation.script.utilities.importtool]
>

I looked through this! Great stuff! Do you own this code? Are you able
to donate it to Lucene (APL and all)? It looks like a great starting
point. We'd have to do some refactoring, but it looks pretty dern good to
me. I haven't tried running it, just skimmed through.

> 2: (I've rewritten some of the code from 1 for that, so this is much cleaner) A
> customer needs a tool for importing local mini-websites on the file system via
> an applet, sending them to the web server and importing them as described in point 1. I've
> tried to write it in a way that it could include the functionality of point 1
> (retrieving via HTTP), but that is mostly untested.
> [com.synformation.script.utilities.fileimport]
>

My brain didn't parse that..

> I don't say that you (we) should use this. But I think it's time to come to
> more concrete plans. I'm interested in helping with the crawler.
>

If you're able to donate it (legally), I kinda think there is a lot here.
It of course needs to be refactored to meet some of the objectives we've
outlined, but a darn good starting point IMHO!
>
> Regards,
>
> manfred

--
http://www.superlinksoftware.com
http://jakarta.apache.org - port of Excel/Word/OLE 2 Compound Document
format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
- fix java generics!

The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh