lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Philip Brown <...@us.ibm.com>
Subject Phrase search using quotes -- special Tokenizer
Date Fri, 01 Sep 2006 05:37:47 GMT

Hi,

After running some tests using the StandardAnalyzer, and getting 0 results
from the search, I believe I need a special Tokenizer/Analyzer.  Does
anybody have something that parses like the following:

- doesn't parse apart phrases (in quotes)
- doesn't parse/separate hyphentated or underscored words
other normal stuff like
- parses on whitespace
- removes periods in acronyms
- lowercases everything (even in quotes? -- maybe)

I basically have a set of terms, some of which are multi-worded phrases, but
none should ever be broken apart -- not when adding the documents, not when
querying the search results, etc.  I'm creating the field in the documents
as UN_TOKENIZED and using a StandardAnalyzer and basic Query object to get
the results.  Any suggestions and/or existing code that I could re-use to
fit this purpose?

Thanks.
-- 
View this message in context: http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6093138
Sent from the Lucene - Java Users forum at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message