lucene-dev mailing list archives

From "Eric Isakson" <Eric.Isak...@sas.com>
Subject RE: HTMLDocument with META tags?
Date Mon, 09 Dec 2002 16:49:53 GMT
Have a look at the source for org.apache.lucene.demo.html.HTMLParser.jj

It stores the META tags in a Properties object that you can access via the getMetaTags() method.

The Document(File f) method of org.apache.lucene.demo.HTMLDocument is the one that builds the
Document objects stored in the index. It does not add the meta tags to the index. You will need
to either modify that method or create your own Document objects and index them, using the
HTMLParser class or some other tool to parse your HTML files.
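
As a rough sketch of the second option, the snippet below turns a Properties object, like the one
getMetaTags() returns, into field name/value pairs that you could then add to your own Document.
It uses only the JDK: the Properties is filled in by hand as a stand-in for the parser output, and
the MetaTagFields class, the toFields helper, the "meta." prefix, and the lowercasing of names are
all assumptions made for illustration, not part of the Lucene demo.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class MetaTagFields {

    // Hypothetical helper: maps each META name/content pair to a
    // prefixed field name and a trimmed value, skipping empty content.
    static Map<String, String> toFields(Properties metaTags, String prefix) {
        Map<String, String> fields = new TreeMap<>();
        for (String name : metaTags.stringPropertyNames()) {
            String value = metaTags.getProperty(name).trim();
            if (value.isEmpty()) {
                continue; // nothing worth indexing
            }
            // Lowercasing is an illustrative choice so field names are predictable.
            fields.put(prefix + name.toLowerCase(Locale.ROOT), value);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the Properties you would get from the parser's getMetaTags().
        Properties metaTags = new Properties();
        metaTags.setProperty("Keywords", "lucene, search");
        metaTags.setProperty("description", "  Demo page  ");
        metaTags.setProperty("robots", "");

        for (Map.Entry<String, String> e : toFields(metaTags, "meta.").entrySet()) {
            // In real indexing code, this is where each pair would be added to
            // your Document as a field (the exact Field call depends on your
            // Lucene version, so it is left out here).
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Prefixing the field names keeps the meta tags from colliding with fields the demo already uses,
but whether you want a prefix at all depends on how you plan to query them.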

Eric

-----Original Message-----
From: mchaput [mailto:mchaput@aw.sgi.com]
Sent: Monday, December 09, 2002 11:40 AM
To: Lucene Developers List
Subject: Re: HTMLDocument with META tags?


Otis Gospodnetic wrote:
> The HTMLParser.jj should already do that.
> 
> Otis

The version I have doesn't seem to (no "meta" in the source code at 
all), or is there a trick to getting them out? Or is it in a newer 
version of Lucene than I have?

Sorry to bother, but it would solve a lot of problems for me if it 
really is in there.

Cheers,

Matt



-- 
                       |
Matt Chaput           |   A l i a s | W a v e f r o n t
Information Designer  |   210 King St. E. Toronto, ON, Canada M5A 1J7
mchaput@aw.sgi.com    |   (416) 874-8268
                       |
"A goddamned ray of sunshine all the goddamned time" --Sparkle Hayter


--
To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>



