lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: how to parse XHTML
Date Wed, 06 Mar 2002 05:08:48 GMT
Terry,

These are not really Lucene questions.  Lucene will let you index text,
but you need to work out how to parse your XHTML files yourself.
Take a look at JTidy on sf.net; I think JTidy can help you with parsing
XHTML, or perhaps Xerces from xml.apache.org can.
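
To make the split between parsing and indexing concrete, here is a minimal
sketch of the parsing half. It assumes the XHTML is well-formed and uses the
JAXP DOM parser bundled with current JDKs (Xerces is one implementation of
that API) rather than JTidy. The class and method names are hypothetical,
chosen only for illustration, and the Lucene call that would receive the
resulting text is deliberately left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls the plain text out of an XHTML document's <body>.
class XhtmlText {

    // Parses well-formed XHTML and returns the body text with whitespace collapsed.
    static String extractBodyText(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Assumption: we do not want to download the XHTML DTD, so any external
        // entity resolves to an empty stream. Named entities such as &nbsp;
        // would then be undefined and need separate handling.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        NodeList bodies = doc.getElementsByTagName("body");
        if (bodies.getLength() == 0) {
            return "";
        }
        StringBuilder text = new StringBuilder();
        collectText(bodies.item(0), text);
        return text.toString().trim().replaceAll("\\s+", " ");
    }

    // Depth-first walk that appends every text node, separated by a space so
    // that words in adjacent elements do not run together.
    private static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE
                || node.getNodeType() == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue()).append(' ');
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    public static void main(String[] args) throws Exception {
        String xhtml =
                "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Demo</title></head>"
                + "<body><h1>Hello</h1>"
                + "<p>Indexing <em>XHTML</em> with Lucene.</p></body></html>";
        // Prints: Hello Indexing XHTML with Lucene.
        System.out.println(extractBodyText(xhtml));
    }
}
```

The returned string is what you would hand to Lucene as the text of a field.
Because the parser is given the content directly, the file extension (.xhtml,
.html or .xml) plays no role in this step.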

Otis

--- Terry McGregor <trmcgregor@hotmail.com> wrote:
> 
> Hi,
> 
> I'm new to Lucene, and I was wondering how I should parse XHTML
> files.
> Should I name them with the .HTML file extension and use
> org.apache.lucene.demo.IndexHTML, or name them with the .XML file
> extension and use an XML parser?
> 
> Also, I would like to keep my XHTML files with a .XHTML file
> extension, if possible, but that's not so important.
> 
> Thanks,
> Terry.
> 
> --
> To unsubscribe, e-mail:  
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> 



