lucene-java-user mailing list archives

From "Vanlerberghe, Luc" <>
Subject RE: Passing XML objects to the analyzer ?
Date Wed, 20 Apr 2005 13:32:50 GMT
The problem with this approach is that the Analyzer you use for indexing will be *very*
different from the one used for searching.

The way I see it, the Document objects passed to Lucene should contain fields that are as
text-based as possible, comparable to what a user would type while searching.  It is then
the Analyzer's task to break the text up into terms, lowercase them, and so on.
This should be kept as similar as possible for indexing and searching.

IMHO, only fields that are not Tokenized (like dates or keywords) or fields that are UnIndexed
should contain 'raw' data.
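
To illustrate why the analysis step should match on both sides, here is a minimal sketch in plain Java. It does not use Lucene: the class `SharedAnalysisSketch` and its `analyze`, `index`, and `search` methods are hypothetical stand-ins for an Analyzer and an index, and the tokenization rule (split on non-alphanumerics, lowercase) is an assumption chosen for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an Analyzer plus a tiny in-memory index.
// This is NOT Lucene code; it only illustrates sharing one analysis step.
class SharedAnalysisSketch {

    // Single analysis routine: split on anything that is not a letter or
    // digit, then lowercase each token.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // term -> ids of the documents containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Indexing side: plain text goes through analyze().
    void index(int docId, String fieldText) {
        for (String term : analyze(fieldText)) {
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    // Search side: the user's text goes through the SAME analyze(), so the
    // query terms have the same form as the indexed terms.
    // Returns the ids of documents that contain every query term.
    Set<Integer> search(String userQuery) {
        Set<Integer> result = null;
        for (String term : analyze(userQuery)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new HashSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new HashSet<>() : result;
    }
}
```

If the indexing side consumed something other than text (an XML object, say), the search side would have no equivalent input to run through the same routine, and the terms could stop lining up.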


-----Original Message-----
From: Paul Libbrecht [] 
Sent: Tuesday, April 19, 2005 11:44 PM
Subject: Re: Passing XML objects to the analyzer ?

On 19 Apr 05, at 22:50, Erik Hatcher wrote:
> The only catch that I know of is that an Analyzer is invoked on a 
> per-field basis.  I can't tell exactly what you have in mind, but a 
> Lucene Analyzer cannot split data into separate fields itself - it has 
> to have been split prior.

That's an easy one... ok, yes, I was clearly aware of this.

> I'm indexing a lot of XML myself, with JDOM in the middle, and using 
> XPath to extract data per field before building the Document.

So wouldn't Field.UnStored(Object) actually make sense?
That object, instead of being a Reader, would be passed along until the analyzer is called,
and the analyzer would then decide whether to accept, say, JDOM objects...
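
For comparison, the per-field extraction Erik describes in the quoted text might look roughly like the sketch below. It is an assumption-laden illustration: it uses the JDK's built-in DOM and XPath classes rather than JDOM, and the class `XPathFieldExtractionSketch`, the method `extractFields`, and the field names and XPath expressions in the usage note are all hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: turns one XML string into plain-text field values,
// one XPath expression per field. Uses JDK classes, not JDOM.
class XPathFieldExtractionSketch {

    static Map<String, String> extractFields(String xml, Map<String, String> fieldToXPath) {
        try {
            // Parse the XML once.
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Evaluate each expression to a string: the text content of the
            // first matching node, or "" when nothing matches.
            Map<String, String> fields = new LinkedHashMap<>();
            for (Map.Entry<String, String> entry : fieldToXPath.entrySet()) {
                String text = xpath.evaluate(entry.getValue(), dom);
                fields.put(entry.getKey(), text.trim());
            }
            return fields;
        } catch (Exception e) {
            // Parsing and XPath errors are checked exceptions; wrap them so
            // callers of this sketch do not have to declare them.
            throw new IllegalArgumentException("Could not extract fields", e);
        }
    }
}
```

Called with a mapping such as `title` to `/article/title` and `body` to `/article/body`, this returns one plain string per field. Those strings are what would then be added to the Document, so the analyzer only ever sees text.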

