lucene-java-user mailing list archives

From Erik Hatcher <>
Subject Re: Passing XML objects to the analyzer ?
Date Tue, 19 Apr 2005 20:50:34 GMT

On Apr 19, 2005, at 3:55 PM, Paul Libbrecht wrote:

> Hi,
> I am working on an index to search XML data in a fixed format that I 
> know well.
> The idea is that the XML content (which I have as a JDOM object) 
> actually carries the semantics, which would be best converted directly 
> into tokens by something like an analyzer. However, fields are added 
> not from the result of the analysis (or a stream of it) but from 
> readers or strings.
> I have two choices and would like to know which is best:
> - make the text passed to the analyzer a simple "instruction" which 
> will fetch the XML objects and do the analysis there
> - make a pre-analysis step which converts it into tokens of text which 
> then my analyzer catches again.
> I'd be more inclined toward the first solution, but I fear there's a catch.
> Is there one?

The only catch that I know of is that an Analyzer is invoked on a 
per-field basis.  I can't tell exactly what you have in mind, but a 
Lucene Analyzer cannot split data into separate fields itself; the data 
has to be split into fields before it reaches the analyzer.
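To illustrate the per-field point, here is a minimal, dependency-free sketch. The `FieldAnalyzer` interface below is a hypothetical stand-in that only mimics the shape of Lucene's `Analyzer.tokenStream(String fieldName, Reader reader)`: each call receives one field name and that field's text, so whatever decides which text goes into which field (including any "instruction" string scheme) has to run before the analyzer is called.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerFieldAnalysisSketch {

    // Hypothetical stand-in for a Lucene Analyzer (not the real API): like
    // Analyzer.tokenStream(fieldName, reader), it sees exactly one field at a time.
    interface FieldAnalyzer {
        List<String> tokenize(String fieldName, Reader reader) throws IOException;
    }

    // Simple analyzer: lowercases and splits on anything that is not a letter or digit.
    static final FieldAnalyzer SIMPLE = (fieldName, reader) -> {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase((char) c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    };

    // The caller, not the analyzer, has already split the data into fields;
    // the analyzer is then invoked once per field.
    static Map<String, List<String>> analyzeFields(Map<String, String> fields, FieldAnalyzer analyzer) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            try {
                result.put(e.getKey(), analyzer.tokenize(e.getKey(), new StringReader(e.getValue())));
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Passing XML objects");
        fields.put("body", "JDOM in the middle, XPath per field.");
        System.out.println(analyzeFields(fields, SIMPLE));
    }
}
```

Running `main` prints one token list per field; nothing inside `SIMPLE` can move text from `body` into `title`, which is the limitation described above.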

I'm indexing a lot of XML myself, with JDOM in the middle, and using 
XPath to extract data per field before building the Document.
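A rough sketch of the "XPath per field, then build the Document" approach follows. It assumes the JDK's built-in DOM parser and `javax.xml.xpath` in place of JDOM, so it runs without extra libraries, and the field names and XPath expressions are hypothetical examples for an imagined `<article>` format. The returned map stands in for the Lucene `Document`: in a real indexer each entry would be added as its own field (the exact `Field` construction depends on the Lucene version, so it is not shown).

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathFieldExtractor {

    // Hypothetical mapping from field names to XPath expressions for a fixed XML format.
    static final Map<String, String> FIELD_PATHS = new LinkedHashMap<>();
    static {
        FIELD_PATHS.put("title", "/article/title");
        FIELD_PATHS.put("author", "/article/author/name");
        FIELD_PATHS.put("body", "/article/body");
    }

    // Evaluates one XPath per field; each non-empty value would become a separate
    // field of the Lucene Document, later analyzed on its own.
    static Map<String, String> extractFields(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> fields = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : FIELD_PATHS.entrySet()) {
                // evaluate(String, Object) returns the string value of the result,
                // which is "" when no node matches.
                String value = xpath.evaluate(e.getValue(), dom).trim();
                if (!value.isEmpty()) {
                    fields.put(e.getKey(), value);
                }
            }
            return fields;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not extract fields from XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<article><title>Passing XML objects</title>"
                + "<author><name>Paul</name></author>"
                + "<body>Text to index</body></article>";
        System.out.println(extractFields(xml));
    }
}
```

Keeping the field-to-XPath mapping in one table makes the split into fields explicit and separate from analysis, which matches the division of labour described above.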

