lucene-java-user mailing list archives

From: Philip Brown
Subject: Phrase search using quotes -- special Tokenizer
Date: Fri, 01 Sep 2006 05:37:47 GMT


After running some tests with the StandardAnalyzer and getting 0 results
from the search, I believe I need a special Tokenizer/Analyzer.  Does
anybody have something that tokenizes as follows:

- doesn't split phrases that are in quotes
- doesn't split hyphenated or underscored words

plus the other normal stuff:

- splits on whitespace
- removes periods in acronyms
- lowercases everything (even in quotes? -- maybe)
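
To make the rules above concrete, here is a rough plain-Java sketch of the
splitting I have in mind.  It is only an illustration: it uses no Lucene
classes, the names PhraseTokenizerSketch, tokenize and normalize are made up,
and it assumes an "acronym" means two or more single letters each followed by
a period (e.g. U.S.A.) and that text inside quotes is lowercased too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not a Lucene Tokenizer/Analyzer.
public class PhraseTokenizerSketch {

    // Splits on whitespace, except inside double quotes, where the whole
    // phrase is kept as one token. Hyphens and underscores are never split on.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                // A quote ends the token in progress and toggles phrase mode.
                flush(current, tokens);
                inQuotes = !inQuotes;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                flush(current, tokens);
            } else {
                current.append(c);
            }
        }
        // An unterminated quote simply keeps the rest of the input as one phrase.
        flush(current, tokens);
        return tokens;
    }

    private static void flush(StringBuilder current, List<String> tokens) {
        String token = normalize(current.toString());
        current.setLength(0);
        if (!token.isEmpty()) {
            tokens.add(token);
        }
    }

    // Lowercases, collapses inner whitespace, and strips periods from words
    // that look like acronyms (assumption: two or more "letter + period" pairs).
    static String normalize(String raw) {
        String lowered = raw.trim().toLowerCase(Locale.ROOT);
        List<String> words = new ArrayList<>();
        for (String word : lowered.split("\\s+")) {
            if (word.matches("(?:[a-z]\\.){2,}")) {
                word = word.replace(".", "");
            }
            words.add(word);
        }
        return String.join(" ", words);
    }
}
```

With this sketch, the input "New York" U.S.A. well-known snake_case Foo
(with the first two words in double quotes) would come out as five tokens,
with the quoted phrase kept whole.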

I basically have a set of terms, some of which are multi-word phrases, but
none should ever be broken apart -- not when adding the documents, not when
querying, etc.  I'm creating the field in the documents as UN_TOKENIZED and
using a StandardAnalyzer and a basic Query object to get the results.  Any
suggestions and/or existing code that I could re-use to fit this purpose?
