lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: Problem using wildcardsearch in phrase search
Date Sun, 13 May 2007 11:36:03 GMT

> I think the KeywordAnlyser bit is maybe a red herring, the problem 
> seems to be that you cant use * within double quotes, I made some 
> changes to my data and index to remove the space character
>
You can't use a wildcard within double quotes. The Lucene syntax grammar 
does not look for such things. The KeywordAnalyzer is not a red herring 
though...it will not work as expected. Like I said, the QueryParser 
breaks up a query where it sees spaces BEFORE sending to an analyzer. 
This is why Erik thinks that KeywordAnazlyer is the same as 
WhiteSpaceAnalyzer...it is not...KeywordAnazlyer does not break up words 
on spaces...it just looks like it when you use QueryParser because 
QueryParser breaks up words by spaces before even using an Analyzer. 
When you use KeywordAnalyzer for indexing it will not break on 
spaces...when you use it with the QueryParser it WILL break on spaces.
> If I fed 54:puid* to my code it generates a Prefix Query and works as 
> required
> Search Query Is54:puid*
> Parsed Search Query Is54:puid*of type:class 
> org.apache.lucene.search.PrefixQuery
>
> but with the quotes (which I would need if my value contained spaces)  
> I only get a Term Query (which doesnt handle wildcards)
> Search Query Is54:"puid*"
> Parsed Search Query Is54:puid*of type:class 
> org.apache.lucene.search.TermQuery
I explained this, though apparently not well. QueryParser feeds chunks 
of text to the analyzer. If there are spaces, each term is fed to the 
analyzer one at a time and a TermQuery is generated for each chunk. If 
you have something in quotes, the whole piece is fed to the 
analyzer...if the analyzer generates more than one token, a phrasequery 
is generated...if only ONE token is generated (as is the case when you 
feed anything within quotes to a KeywordAnalyzer), then a TermQuery is 
generated. The code does not look for any wildcards. Look at the 
QueryParser code, getField in particular. One option you have is to 
override getField, and before generating a TermQuery, scan for wildcards 
and if you find one, return a wildcardquery instead of a term query.

I realize I have a weak grasp of my own language and can be a lot less 
than clear...but the information you need has certainly been given in my 
emails. Perhaps someone else could be a little more clearer than I am able.

- Mark
>
> so why is this ?
>
> thanks Paul
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message