lucene-dev mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Proposal for Lucene
Date Sat, 04 May 2002 02:48:16 GMT
Note that I will also be putting some web crawler code in the sandbox
soon.  The code is from Clemens, who posted a few messages recently.

Good, let's see some refactoring!

Otis


--- "Andrew C. Oliver" <acoliver@apache.org> wrote:
> Hi Manfred/Kelvin (whose name I saw on a lot of this),
> 
> I'm back on the on cycle and I was about to commit this stuff so we
> could start refactoring. I've got it building, all set up, and ready.
> But I wanted to make sure that you're still okay with it.
> 
> Once I get it in lucene-sandbox we can start refactoring it and adding
> the new features.
> 
> Are we good to go?  Let me know and then we can watch the CVS commit
> messages fly into lucene-sandbox...
> 
> Thanks,
> 
> -Andy
> 
> On Fri, 2002-02-08 at 05:26, Manfred Schäfer wrote:
> > Hi,
> > 
> > I would suggest two sub-projects:
> > 
> > 1. Crawler - retrieving docs, wherever they are.
> > 
> > 2. DocumentHandler - extracting text, creating appropriate fields, etc.
> > 
> > The second is a layer on top of Lucene. The first is an autonomous
> > package, which should be nicely integrated with
> > lucene/Document-Handler, but should also be usable for other projects.
> > 
> > I've included my code to show you what I've done. It isn't too useful
> > yet, because it is integrated in our product, but you can get the
> > idea. Actually, I've written two things:
> > 
> > 1: A robot for crawling a remote server via HTTP and writing all the
> > data to the local filesystem, then importing it into our db and (at
> > the same time) replacing all links with internal links. So we could
> > emulate a web site from this crawled data!
> > [com.synformation.script.utilities.importtool]
> > 
> > 2: (I've rewritten some of the code from 1 for that, so this is much
> > cleaner.) A customer needs a tool for importing local mini-websites
> > from the file system via an applet, sending them to the web server,
> > and importing them as described in point 1. I've tried to write it in
> > a way that it could include the functionality of point 1 (retrieving
> > via HTTP), but that is mostly untested.
> > [com.synformation.script.utilities.fileimport]
> > 
> > I'm not saying that you (we) should use this. But I think it's time to
> > come to more concrete plans. I'm interested in helping with that for
> > the crawler.
> > 
> > 
> > Best regards,
> > 
> > manfred
> > 
> > 
> > 
> > 
> > ----
> > 
> 
> > --
> > To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
> -- 
> http://www.superlinksoftware.com
> http://jakarta.apache.org/poi - port of Excel/Word/OLE 2 Compound Document format to java
> http://developer.java.sun.com/developer/bugParade/bugs/4487555.html - fix java generics!
> The avalanche has already started. It is too late for the pebbles to vote.
> -Ambassador Kosh
> 
> 
> 

