lucene-java-user mailing list archives

From Chris Opler <chrisop...@free.fr>
Subject Re: JSP Parser class wanted
Date Sun, 24 Feb 2002 12:24:10 GMT
Hi,

This is a great tool for retrieving and scraping HTML pages (rendered or not):

http://www.research.compaq.com/SRC/WebL/

:-)
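
If pulling in a separate tool is too much, the "little crawler" Steven suggests in the quoted message below can be roughly sketched in plain Java. This is only a minimal sketch under some assumptions: the class name RenderedPageText and its methods are made up for illustration, the pages are assumed to be UTF-8, and the tag stripping is a crude regex pass (no entity decoding, no handling of malformed markup), not a real HTML parser. The resulting text is what you would then hand to your indexer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: fetches a page through the web server, so any JSP
// has already been rendered to HTML, then reduces the HTML to plain text.
public class RenderedPageText {

    // Request the page over HTTP and return the rendered HTML as a string.
    // Assumes the server responds in UTF-8.
    static String fetch(String pageUrl) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(pageUrl).openStream(),
                                      StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Crude text extraction: drop script/style blocks and comments,
    // replace the remaining tags with spaces, then collapse whitespace.
    static String extractText(String html) {
        return html
            .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
            .replaceAll("(?s)<!--.*?-->", " ")
            .replaceAll("(?s)<[^>]*>", " ")
            .replaceAll("\\s+", " ")
            .trim();
    }

    // Usage: java RenderedPageText <url> [<url> ...]
    public static void main(String[] args) throws IOException {
        for (String pageUrl : args) {
            System.out.println(pageUrl + " -> " + extractText(fetch(pageUrl)));
        }
    }
}
```

Since the pages are mostly static, a fixed list of URLs passed on the command line is probably enough; following links would be the next step for a real crawler.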

Chris Opler

w i l l i a m__b o y d wrote:

> >      If they're mostly static, why not just code a little crawler to
> > request the pages via the web-server and parse the rendered HTML?
> >
>
> right then. i've added that onto my list of things to do. immediately after
> "meet project deadline" and "...learning javacc and lucene inside and
> out..." ;¬) if anyone has such code they're willing to contribute i would
> put it to good use.
>
> ----- Original Message -----
> From: Steven J. Owens <puffmail@darksleep.com>
> To: Lucene Users List <lucene-user@jakarta.apache.org>;
> w i l l i a m__b o y d <will@javafreelancer.com>
> Sent: Sunday, February 24, 2002 1:25 AM
> Subject: Re: JSP Parser class wanted
>
> > w i l l i a m__b o y d <will@javafreelancer.com> writes:
> >
> > > i have had some success in solving my problem. mind you, it is a
> > > hack; a quick fix. it may or may not work for everyone. also the jsp
> > > pages i am indexing/searching have very little dynamically generated
> > > content. they are mostly static.
> >
> >      If they're mostly static, why not just code a little crawler to
> > request the pages via the web-server and parse the rendered HTML?
> >
> > Steven J. Owens
> > puff@darksleep.com
>
> --
> To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>

--
=======================
http://www.openwine.org




