lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Peter Kim" <>
Subject RE: Analysis
Date Tue, 01 Nov 2005 16:30:42 GMT
Ok... just got confused because you mentioned XML. Unless you're
actually indexing the raw XML in some of your fields, the fact that
you're indexing XML documents as your source content is irrelevant to
your choice of Analyzer.

Choice of indexer really depends on your specific project requirements
and what level of querying functionality your client needs. For example,
I started off using the StandardAnalyzer because it incorporates some
very useful and sophisticated functionality. But I found that it was
removing many stop words that my client requested the ability to query
with, so I will end up using my own custom analyzer class primarily
based on the StandardAnalyzer but modifying the stop word list.

> -----Original Message-----
> From: Malcolm [] 
> Sent: Tuesday, November 01, 2005 11:19 AM
> To:
> Subject: Re: Analysis
> Hi,
> I'm just asking for opinions on Analyzer's for the indexing. 
> For example Otis in his article uses the WhitespaceAnalyzer 
> and the Sandbox program uses the StandardAnalyzer.I am just 
> gauging opinions on the subject with regard to XML.
> I'm using a mix of the Sandbox XMLDocumentHandlerSAX and a 
> bit extra. I originally started using Digester but found that 
> I preferred the Sandbox implementation.
> Thanks,
> Malcolm Clark 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message