lucene-java-user mailing list archives

From Jon Wasson <jonrwas...@yahoo.com>
Subject Re: summary text for indexed jsp files -- modify the HTMLParser.jj
Date Wed, 14 Aug 2002 14:19:51 GMT
My bad... you would have to redo this line in the file
IndexHTML.java:

else if (file.getPath().endsWith(".html") || // index .html files
	       file.getPath().endsWith(".htm") || // index .htm files
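
To make that concrete, here is a minimal sketch of the same extension test pulled out into a helper, so that accepting another extension is a one-line change. The class name, the method name and the ".jsp" entry are assumptions for illustration; none of them come from the demo source.

```java
import java.io.File;

public class IndexableCheck {
    // Assumed list of accepted extensions; ".jsp" is the hypothetical addition.
    private static final String[] EXTENSIONS = {".html", ".htm", ".jsp"};

    // Returns true when the file's path ends with one of the accepted extensions.
    static boolean isIndexable(File file) {
        String path = file.getPath();
        for (String ext : EXTENSIONS) {
            if (path.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isIndexable(new File("docs/page.jsp")));
        System.out.println(isIndexable(new File("docs/notes.doc")));
    }
}
```

The condition in the else-if could then call isIndexable(file) instead of chaining endsWith() tests.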


--- Jon Wasson <jonrwasson@yahoo.com> wrote:
> Karen - 
> There is probably a simpler solution... why parse the local .jsp file
> at all?  Grab it with a webcrawler and write it locally to your disk
> in a temp directory.  Then parse the saved file as though it were a
> complete HTML file.  This way, instead of dealing with files by
> extension (.jsp, .html, .asp, etc.), you can deal with them by MIME
> type, thus lumping them all together.  This can take care of messy
> issues like how to get at Lotus Notes documents (without going through
> the client) or how to parse an .asp file vs a .jsp file.  And you can
> use the standard IndexHTML file as it currently is.  Just a
> suggestion.
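
A rough sketch of the "deal with them by MIME type" part: decide from the Content-Type header value whether a fetched page should be handled as HTML. The class and method names are hypothetical, and how the header is obtained is left out; one option is java.net.URLConnection.getContentType().

```java
import java.util.Locale;

public class MimeTypeCheck {
    // Returns true when the Content-Type value names text/html,
    // ignoring parameters such as "; charset=..." and letter case.
    static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false; // no header: do not guess
        }
        int semi = contentType.indexOf(';');
        String type = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return type.equals("text/html");
    }

    public static void main(String[] args) {
        System.out.println(isHtml("text/html; charset=ISO-8859-1"));
        System.out.println(isHtml("application/pdf"));
    }
}
```

With a check like this, the output of a .jsp, .asp or plain .html page is treated alike, whatever extension its URL carries.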
> 
> Jon
> 
> 
> 
> 
> 
> 	karen bran <kgblucene@yahoo.com>
> 	08/12/2002 04:28 PM
> 	Please respond to "Lucene Users List"
> 		 
> 		 To: lucene-user@jakarta.apache.org
> 		 cc: 
> 		 Subject: summary text for indexed jsp files -- modify the HTMLParser.jj
> 
> 
> 
> Hello,
> 
> I modified IndexHTML.java to let the .jsp files be indexed, but the
> source of JSP tags such as <%@page import....... shows up in the
> result summary.
> 
> I checked this mailing list's messages; someone suggested modifying
> the HTMLParser.jj file to treat the JSP tag text as a 3rd kind of
> comment.  Since I am not familiar with the JavaCC grammar, I don't
> know how to hack HTMLParser.jj and insert the 3rd comment tag for the
> JSP tag.
> 
> Here are the 2 existing comment tags in HTMLParser.jj.  Can someone
> help me figure out how to add the 3rd one?
> 
> Thanks a lot.
> 
>  
> 
> <WithinComment1> TOKEN :
> {
>   < CommentText1:  (~["-"])+ | "-" >
> | < CommentEnd1:   "-->" > : DEFAULT
> }
> 
> <WithinComment2> TOKEN :
> {
>   < CommentText2:  (~[">"])+ >
> | < CommentEnd2:   ">" > : DEFAULT
> }
> 
>  
> 
> <WithinComment3> TOKEN :
> {
>   < CommentText3:  ?????? >
> | < CommentEnd3:   ??????> : DEFAULT
> }
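
As an alternative to touching the grammar, much the same effect can be approximated by stripping JSP tags from the text before it reaches the parser. This is a different technique from the third comment state asked about above, and only a rough sketch: the class and method names are made up, and it assumes every JSP tag runs from "<%" to the next "%>".

```java
public class JspTagStripper {
    // Removes every "<% ... %>" region from the text, roughly what a
    // comment-style lexer state would skip over.
    static String stripJspTags(String text) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = text.indexOf("<%", pos);
            if (start < 0) {
                // No more JSP tags: keep the remainder as-is.
                out.append(text, pos, text.length());
                break;
            }
            // Keep the text before the tag.
            out.append(text, pos, start);
            int end = text.indexOf("%>", start + 2);
            if (end < 0) {
                break; // unterminated tag: drop the rest
            }
            pos = end + 2; // resume just after "%>"
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripJspTags("<%@page import=\"java.util.*\"%><html>Hi</html>"));
    }
}
```

The stripped text can then go through the unmodified HTML parser, so directives such as <%@page ... %> never reach the summary.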
> 
> 
> 
> ---------------------------------
> Do You Yahoo!?
> HotJobs, a Yahoo! service - Search Thousands of New
> Jobs
> 



--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>

