lucene-java-user mailing list archives

From Daniel Noll <dan...@nuix.com>
Subject Re: Ignoring XML tags when Indexing
Date Fri, 25 Jul 2008 06:11:24 GMT
Kalani Ruwanpathirana wrote:
> Hi Marcelo,
> 
> Thanks for the reply. Yes, I want to ignore all the tags and store the text
> in one field. Previously used tags are not known, and it seems the
> "XMLAnalyzer" is the solution. Anyway, I think Lucene itself does not
> support an XMLAnalyzer. Do I have to do it manually?

What makes more sense (at least the way I see it) is to implement a 
Reader which returns the text you need from the XML.  This sort of thing 
is relatively simple to do with the newer StAX API.  You can have your 
reader return even small chunks of text, and it should perform okay as 
long as you have a BufferedReader wrapped around the entire thing.
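
A rough sketch of that idea follows, not a tested production class. The name XmlTextReader is made up for illustration, and emitting a single space at every element boundary (so that text from adjacent elements does not run together into one token) is my own assumption about what you would want. DTD support is switched off as a precaution against untrusted input.

```java
import java.io.IOException;
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: a Reader that yields only the character data of an
// XML document, dropping all tags, attributes, comments and PIs.
class XmlTextReader extends Reader {
    private final Reader source;
    private final XMLStreamReader xml;
    private String pending = "";   // current chunk of text being handed out
    private int pos = 0;           // read position inside 'pending'

    XmlTextReader(Reader source) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: documents do not need DTD processing.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        this.source = source;
        this.xml = factory.createXMLStreamReader(source);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        // Refill from the XML stream until there is something to return.
        while (pos >= pending.length()) {
            if (!advance()) {
                return -1; // end of document
            }
        }
        int n = Math.min(len, pending.length() - pos);
        pending.getChars(pos, pos + n, cbuf, off);
        pos += n;
        return n;
    }

    // Moves to the next event that produces output. Returns false at the end.
    private boolean advance() throws IOException {
        try {
            while (xml.hasNext()) {
                switch (xml.next()) {
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        pending = xml.getText();
                        pos = 0;
                        return true;
                    case XMLStreamConstants.START_ELEMENT:
                    case XMLStreamConstants.END_ELEMENT:
                        // Assumed behaviour: separate elements with a space.
                        pending = " ";
                        pos = 0;
                        return true;
                    default:
                        break; // skip comments, PIs, ignorable whitespace, etc.
                }
            }
            return false;
        } catch (XMLStreamException e) {
            throw new IOException(e);
        }
    }

    @Override
    public void close() throws IOException {
        try {
            xml.close(); // does not close the underlying source
        } catch (XMLStreamException e) {
            throw new IOException(e);
        } finally {
            source.close();
        }
    }
}
```

You would wrap it as new BufferedReader(new XmlTextReader(input)) and hand that to whichever field constructor takes a Reader, so the analyzer only ever sees the text.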

Daniel

-- 
Daniel Noll
