lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: HTML parser
Date Sat, 20 Apr 2002 14:10:04 GMT
Laura,

Search the lucene-user and lucene-dev archives for things like:
crawler
spider
spindle
lucene sandbox

Spindle is something you may want to look at, as is MoJo (not mentioned
on lucene lists, use Google).

Otis

> Has anyone solved the problem of recursively spidering web pages?

> > >While trying to research the same thing, I found the following...here's a
> > >good example of link extraction.....
> > 
> > Try http://www.quiotix.com/opensource/html-parser
> > 
> > It's easy to write a Visitor which extracts the links; should take about ten
> > lines of code.
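
For readers of the archive, here is a rough sketch of the two ideas discussed in this thread: extracting links from a page and following them recursively without visiting a page twice. It does not use the quiotix html-parser Visitor API or Spindle; as a stand-in it uses a simplified regular expression from the JDK, which will miss unquoted or oddly formatted href attributes. The class name LinkSpider, the methods extractLinks and crawl, the fetch function, and the maxPages limit are all hypothetical names chosen for illustration.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkSpider {
    // Simplified pattern (an assumption, not a full HTML parser):
    // matches a quoted href value inside an <a ...> start tag.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the absolute URLs of the links found in html, in document order. */
    public static List<String> extractLinks(String html, URI base) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            int hash = href.indexOf('#');
            if (hash >= 0) {
                href = href.substring(0, hash); // drop the fragment
            }
            if (href.isEmpty()) {
                continue; // pure in-page anchor such as "#top"
            }
            try {
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // malformed href: skip it rather than abort the crawl
            }
        }
        return links;
    }

    /**
     * Breadth-first crawl from start. fetch maps a URL to its HTML, or null
     * if the page cannot be retrieved. Returns every URL that was attempted,
     * in visiting order, stopping after maxPages URLs.
     */
    public static Set<String> crawl(String start, Function<String, String> fetch, int maxPages) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // already seen: this check is what prevents loops
            }
            String html = fetch.apply(url);
            if (html == null) {
                continue;
            }
            for (String link : extractLinks(html, URI.create(url))) {
                if (!visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }
}
```

In a real spider the fetch function would download the page over HTTP, and the crawl would normally be restricted to one host; each fetched page could then be handed to a Lucene indexer. A proper HTML parser, such as the one linked above, is a better choice than the regular expression for anything beyond a quick experiment.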

