lucene-java-user mailing list archives

From Erik Hatcher
Subject Re: Disabling modifiers?
Date Tue, 16 Dec 2003 12:28:26 GMT
On Tuesday, December 16, 2003, at 05:46  AM, Iain Young wrote:
>> Treating them as two separate words when quoted is indicative of your
>> analyzer not being sufficient for your domain.  What Analyzer are you
>> using?  Do you have knowledge of what it is tokenizing text into?
> I have created a custom analyzer (CobolAnalyzer) which contains some
> custom stop words for the language, but it's using the
> StandardTokenizer and StandardFilters. I'll have a look and see if I
> can see what it's actually tokenizing the text into...

Look at my article at and try out the AnalyzerDemo code using 
some sample text and your custom analyzer:

One of the things I plan to do with an enhanced Lucene demo, to ship 
with Lucene's binary distributions, is to integrate this type of 
"analyzing the analyzer" feature.  Analysis is the root of a lot of questions 
about Lucene.  You can really only search for what you index, and you 
only index what the Analyzer creates, so understanding it is key to a 

And yes, if you are using StandardTokenizer, you are probably not 
tokenizing COBOL quite like you expect.  Is there a COBOL parser you 
could tap into that could give you the tokens you want?
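
To make the tokenization point concrete, here is a minimal sketch in plain Java with no Lucene dependency. It does not reproduce StandardTokenizer's actual rules. It only contrasts a generic split on non-alphanumeric characters with a split that keeps interior hyphens, which is how a COBOL data name would usually need to be indexed. The class name, the method names, and the sample COBOL line are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: neither method is Lucene code or mirrors StandardTokenizer.
class CobolTokenSketch {

    // Generic word splitting: any character that is not a letter or digit
    // ends the current token, so hyphenated names fall apart.
    static List<String> splitOnNonAlphanumeric(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // COBOL-flavoured splitting (an assumption about what the domain needs):
    // a hyphen stays in the token when it sits between two letters or digits.
    static List<String> splitKeepingHyphens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean interiorHyphen = c == '-'
                    && current.length() > 0
                    && i + 1 < text.length()
                    && Character.isLetterOrDigit(text.charAt(i + 1));
            if (Character.isLetterOrDigit(c) || interiorHyphen) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "MOVE WS-CUSTOMER-NAME TO OUT-REC.";
        // prints [move, ws, customer, name, to, out, rec]
        System.out.println(splitOnNonAlphanumeric(line));
        // prints [move, ws-customer-name, to, out-rec]
        System.out.println(splitKeepingHyphens(line));
    }
}
```

With the first method, a query for the data name as a single term has nothing to match, because only the fragments were indexed. Dumping the tokens of a real analyzer on the same sample line is the quickest way to see which of these two behaviours it is closer to.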

