lucene-java-user mailing list archives

From Peter Carlson <carl...@bookandhammer.com>
Subject Re: how to parse XHTML
Date Wed, 06 Mar 2002 23:42:11 GMT
Terry,

Check out the contributions section of the Lucene site. It has a few XML
document parsers.
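
For what it's worth, here is a minimal sketch of the XML-parser route, using
only the JAXP parser bundled with the JDK. The class name XhtmlTextExtractor
and its extract method are hypothetical, not part of Lucene or its demo
package. It assumes the XHTML is well-formed; pages that use named entities
such as &nbsp; need the real DTD, or a tidy-up pass with something like JTidy
first. The parser looks only at the content, never at the file name, so the
files can keep a .xhtml extension.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Lucene.
class XhtmlTextExtractor {

    /** Returns { title, body text } for a well-formed XHTML document. */
    static String[] extract(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Answer every external DTD lookup with an empty document so the
            // parser does not try to download the XHTML DTD from the network.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            return new String[] { textOf(doc, "title"), textOf(doc, "body") };
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
    }

    // Text of the first element with the given local name, in any namespace,
    // with runs of whitespace collapsed to single spaces.
    private static String textOf(Document doc, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS("*", localName);
        if (nodes.getLength() == 0) {
            return "";
        }
        return nodes.item(0).getTextContent().replaceAll("\\s+", " ").trim();
    }
}
```

The two strings that come back are plain text, so they can be handed to
whatever Lucene fields you use for the title and the contents.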

--Peter

On 3/5/02 9:08 PM, "Otis Gospodnetic" <otis_gospodnetic@yahoo.com> wrote:

> Terry,
> 
> These are really not Lucene questions.  Lucene will let you index text,
> but you need to figure out how to parse your XHTML files.
> Take a look at JTidy on sf.net; I think JTidy can help you with parsing
> XHTML, or perhaps Xerces from xml.apache.org can.
> 
> Otis
> 
> --- Terry McGregor <trmcgregor@hotmail.com> wrote:
>> 
>> Hi,
>> 
>> I'm new to Lucene, and I was wondering how I should parse XHTML
>> files. Should I name them with the .HTML file extension and use
>> org.apache.lucene.demo.IndexHTML, or name them with the .XML file
>> extension and use an XML parser?
>> 
>> Also, I would like to keep my XHTML files with a .XHTML file
>> extension, if possible, but that's not so important.
>> 
>> Thanks,
>> Terry.
>> 
>> --
>> To unsubscribe, e-mail:
>> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>> For additional commands, e-mail:
>> <mailto:lucene-user-help@jakarta.apache.org>
>> 
> 
> 



