Date: Wed, 06 Mar 2002 15:42:11 -0800
Subject: Re: how to parse XHTML
From: Peter Carlson
To: Lucene Users List

Terry,

Check out the contributions section of the Lucene site. It has a few XML document parsers.

--Peter

On 3/5/02 9:08 PM, "Otis Gospodnetic" wrote:

> Terry,
>
> These are really not Lucene questions. Lucene will let you index text,
> but you need to figure out how to parse your XHTML files.
> Take a look at JTidy on sf.net; I think JTidy can help you with parsing
> XHTML, or perhaps Xerces from xml.apache.org can.
>
> Otis
>
> --- Terry McGregor wrote:
>>
>> Hi,
>>
>> I'm new to Lucene, and I was wondering how I should parse XHTML
>> files. Should I name them with the .HTML file extension and use
>> org.apache.lucene.demo.IndexHTML, or name them with the .XML file
>> extension and use an XML parser?
>>
>> Also, I would like to keep my XHTML files with a .XHTML file
>> extension, if possible, but that's not so important.
>>
>> Thanks,
>> Terry.
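Otis's suggestion of using an XML parser works because XHTML is well-formed XML. A minimal sketch, assuming Java 8+ and only the JDK's built-in JAXP/DOM API (not Xerces or JTidy directly): it pulls out the title and the body text, the two strings one would typically hand to an indexer. The class and method names (XhtmlTextExtractor, extract, textOf, collectText) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts indexable text from well-formed XHTML using only the JDK's DOM parser.
class XhtmlTextExtractor {

    // The two strings a caller would typically hand to an indexer.
    static final class Extracted {
        final String title;
        final String body;

        Extracted(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    static Extracted extract(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Resolve external entities (such as the W3C DTD named in an XHTML DOCTYPE)
        // to empty input so parsing never touches the network. Caveat: entities that
        // only that DTD defines, such as &nbsp;, then become parse errors.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(source);
        return new Extracted(textOf(doc, "title"), textOf(doc, "body"));
    }

    // Whitespace-normalized text under the first element with this tag name, or "" if there is none.
    private static String textOf(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        if (nodes.getLength() == 0) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        collectText(nodes.item(0), out);
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    // Appends every text/CDATA node below `node`, separated by spaces so that
    // adjacent block elements (<h1>, <p>) do not run their words together.
    private static void collectText(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue()).append(' ');
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>Lucene notes</title></head>"
            + "<body><h1>Indexing</h1><p>Parse first, then index.</p></body></html>";
        Extracted e = extract(new InputSource(new StringReader(xhtml)));
        System.out.println("title: " + e.title);
        System.out.println("body: " + e.body);
    }
}
```

Because this sketch reads the file's content rather than its name, the files could keep a .xhtml extension. How the title and body then become Lucene fields depends on the Lucene version in use, so that step is left out. Joining text nodes with spaces also splits words that inline markup breaks up (e.g. Lu<b>cene</b>), a trade-off worth checking against real pages; documents that are not well-formed would need cleaning first, for example with JTidy.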