lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject RE: Wildcard query with untokenized punctuation
Date Mon, 12 Mar 2007 20:52:57 GMT

: You're entirely correct about the analyzer (I'm using one that breaks on
: non-alphanumeric characters, so all punctuation is ignored).  To be
: honest, I hadn't thought about altering this, but I guess I could; just
: reticent that there might be unforeseen consequences.

this is where the PerFieldAnalyzerWrapper and the KeywordAnalyzer come into
play ... any field you index as UN_TOKENIZED should be registered to use
the KeywordAnalyzer in the PerFieldAnalyzerWrapper you pass to QueryParser,
so that filename:pagefile.sys and filename:pagetype.* both work.

(the first because analysis returns one token, the second because prefix
query doesn't do any analysis)
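
a toy model of that per-field setup, in plain Java so it runs without any
Lucene jar. Nothing here is Lucene's API: the class PerFieldAnalysisSketch
and the methods splitOnPunctuation, keyword and analyze are hypothetical
stand-ins for a punctuation-splitting analyzer, the KeywordAnalyzer, and the
wrapper's per-field lookup. The "contents" field name is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, NOT Lucene code: models per-field analyzer selection.
class PerFieldAnalysisSketch {

    // Stand-in for an analyzer that lowercases and breaks on
    // non-alphanumeric characters, so punctuation is thrown away.
    static List<String> splitOnPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Stand-in for a keyword-style analyzer: the whole input is one token.
    static List<String> keyword(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    // Fields registered here get their own analyzer; all others use the default.
    static final Map<String, Function<String, List<String>>> PER_FIELD = new HashMap<>();
    static {
        PER_FIELD.put("filename", PerFieldAnalysisSketch::keyword);
    }

    static List<String> analyze(String field, String text) {
        return PER_FIELD
                .getOrDefault(field, PerFieldAnalysisSketch::splitOnPunctuation)
                .apply(text);
    }

    public static void main(String[] args) {
        System.out.println(analyze("filename", "pagefile.sys")); // [pagefile.sys]
        System.out.println(analyze("contents", "pagefile.sys")); // [pagefile, sys]
    }
}
```

with the keyword-style analyzer registered for "filename", the query text
stays a single token that can equal the untokenized indexed value; with the
default it is split in two and can never match that value.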

: But I'm still curious about the original solution.  Shouldn't it be
: possible to take "pagefile.*" and tokenize it (essentially throwing away
: the wildcard at the end)?

nope ... an Analyzer can be a one way function (consider stemming and
lowercasing) so there's no way to ever analyze "foo" in a query for
"foo*" and assume that the Analyzer will produce the correct term to build
a prefix query out of that will match "food", "fools", etc...

the only safe thing to do is skip Analysis altogether, and have options
for some of the really common things people want (ie:
QueryParser.setLowercaseExpandedTerms)
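
a small sketch of why analyzing the prefix is unsafe while lowercasing it is
not, again in plain Java rather than Lucene. Everything in it is an
assumption made for illustration: the class OneWayAnalysisSketch, its
methods, the sample words, and the toy analyzer, which lowercases and then
rewrites a trailing "y" to "i" as a crude stand-in for stemming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, NOT Lucene code: a lossy analyzer versus prefix matching.
class OneWayAnalysisSketch {

    // Toy analyzer: lowercase, then turn a trailing "y" into "i".
    static String analyze(String word) {
        String term = word.toLowerCase(Locale.ROOT);
        if (term.length() > 2 && term.endsWith("y")) {
            term = term.substring(0, term.length() - 1) + "i";
        }
        return term;
    }

    // Analyze each word the way it would be analyzed at index time.
    static List<String> index(String... words) {
        List<String> terms = new ArrayList<>();
        for (String word : words) {
            terms.add(analyze(word));
        }
        return terms;
    }

    // A prefix query modeled as a plain startsWith test on indexed terms.
    static List<String> prefixMatches(List<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = index("Pay", "Payment", "Payroll");
        System.out.println(terms);                                // [pai, payment, payroll]
        // Raw prefix: the capital "P" matches nothing in a lowercased index.
        System.out.println(prefixMatches(terms, "Pay"));          // []
        // Lowercased only: finds the longer words.
        System.out.println(prefixMatches(terms, "pay"));          // [payment, payroll]
        // Fully analyzed prefix: "pai" no longer matches payment or payroll.
        System.out.println(prefixMatches(terms, analyze("Pay"))); // [pai]
    }
}
```

in this toy case lowercasing the prefix recovers "payment" and "payroll",
while running the full analyzer on it loses both, because the stemming rule
treats the fragment "pay" as a complete word.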

if you think PrefixQueries are bad, try wrapping your head around the
issues of Analyzing wildcard queries.


-Hoss



