lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Bresnik" <jbres...@auditintegrity.com>
Subject Re: JSP files
Date Thu, 03 Apr 2003 18:28:07 GMT

Additionally - you can use a crawler to crawl your site, then index the
resulting files. Lucene comes with a crawler called "LARM" but the current
make file doesnt build it properly. I ended using a different crawler called
Spinx :

http://www-2.cs.cmu.edu/~rcm/websphinx/





> Pinky,
>
> You don't want to index the jsp directly, as you would
> be missing the content inserted by the server when the
> pages are accessed. Typically indexing dynamic pages
> is problematic since the content will change
> freqently... That being said, the java.io library
> provides classes for retrieving the content of a URL
> as an input stream. You can write a class to traverse
> your site downloading the URLS and indexing them. It
> will be slower of course than reading HTML from disk
> files.
>
> -Tom
>
> --- Pinky Iyer <pinkyiyer@yahoo.com> wrote:
> >
> >  Hi all!
> >   Is there any seperate parser for jsp files. Any
> > other option other than modifying indexHTML.java
> > class is appreciated. I already tried modifying this
> > class, html parsing is fine, but jsp parsing yields
> > all the jsp tags as well in the summary...
> > Thanks!
> > Pinky
> >
> >
> >
> > ---------------------------------
> > Do you Yahoo!?
> > Yahoo! Tax Center - File online, calculators, forms,
> > and more
>
>
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Tax Center - File online, calculators, forms, and more
> http://tax.yahoo.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message