lucene-java-user mailing list archives

From Dan Quaroni <dquar...@OPENRATINGS.com>
Subject Confusion over wildcard search logic
Date Tue, 23 Sep 2003 00:26:40 GMT
Hi there.  I've got an index of company names, and it's split up into
separate indexes by state.

I have a simple command line interface for testing.  I'm getting some odd
results, though, with the logic of certain wildcard searches.  It seems that
the order in which I put the fields of the query drastically alters the
results when I AND them together.

Here are some examples:

***************************
This one makes sense:

Query> name:amb*
State> california
name:amb*
org.apache.lucene.search.PrefixQuery@2d086a
amb*
2819 total matching documents 

***************************
This is the REALLY confusing one.  We know there's a company named AMB
Property Corporation.  Why do I get NO hits?

Query> name:"amb prop*"
State> california
name:"amb prop*"
org.apache.lucene.search.PhraseQuery@20dda
"amb prop"
0 total matching documents
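One plausible reading of the printed `"amb prop"` line is that the analyzer strips the `*` from a quoted string before the phrase is built, because the printed query no longer contains it. The Java sketch below is a hypothetical stand-in for a letter-tokenizing, lowercasing, stop-word-removing analyzer (similar in spirit to Lucene's `StopAnalyzer`, but not Lucene code; the stop list is an assumption). Under that assumption, the phrase can only match documents that contain the exact tokens `amb` followed by `prop`, so a name indexed as `amb property` would not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical approximation of a letter tokenizer + lowercase + stop-word filter.
// NOT Lucene's analyzer; it only illustrates what happens to "*" inside quotes.
public class PhraseAnalysisSketch {

    // Small illustrative stop list (an assumption, not Lucene's actual list).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "or", "the", "of");

    // Splits on any non-letter character, so "prop*" becomes the token "prop".
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else {
                flush(current, tokens);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // Emits the buffered token unless it is a stop word, then clears the buffer.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            String token = current.toString();
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(analyze("amb prop*"));           // [amb, prop]
        System.out.println(analyze("sou* or san or fra*")); // [sou, san, fra]
        System.out.println(analyze("ambprp*"));             // [ambprp]
    }
}
```

The same stripping would also be consistent with the later `metaph_name:"ambprp*"` example printing as the single exact term `ambprp`, and with `"sou* or san or fra*"` printing as `"sou san fra"`.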

***************************
Ok, so I get some results with this (I know the * isn't necessary at the
end of property, but bear with me for the next example, where it goes all
screwy):

Query> name:amb property*
State> california
name:amb property*
org.apache.lucene.search.BooleanQuery@10b053
amb name:amb property*:property*
56 total matching documents

***************************
"south san francisco" is an exact match for the city.  Why does this find 0
results??!

Query> name:amb property* AND city:south san francisco
State> california
name:amb property* AND city:south san francisco
org.apache.lucene.search.BooleanQuery@283b8a
amb +name:amb property* AND city:south san francisco:property* +city:south name:amb property* AND city:south san francisco:san name:amb property* AND city:south san francisco:francisco
0 total matching documents
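In the printed query for this example, only `south` is attached to `city`; `san` and `francisco` appear under a field named after the entire query string. That suggests a field prefix applies only to the single term (or quoted phrase) it touches, and it also suggests the test harness may be passing the query text itself as the default field (an inference from the output only). The Java sketch below is a hypothetical, heavily simplified model of that scoping rule, not Lucene's `QueryParser`: it ignores `+`/`-` prefixes, parentheses, escapes and boosts, and the default field `name` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of field-prefix scoping: "field:" binds only to the token
// (or quoted phrase) it is attached to; everything else gets the default field.
public class FieldScopeSketch {

    // Returns one "field=text" entry per term or quoted phrase, skipping operators.
    public static List<String> assignFields(String query, String defaultField) {
        // 1. Split on whitespace, but keep quoted phrases together.
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : query.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (Character.isWhitespace(c) && !inQuotes) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }

        // 2. Attach each token to its own prefix field, or to the default field.
        List<String> clauses = new ArrayList<>();
        for (String token : tokens) {
            // Only the uppercase keywords are treated as operators in this model.
            if (token.equals("AND") || token.equals("OR") || token.equals("NOT")) {
                continue;
            }
            int colon = token.indexOf(':');
            int quote = token.indexOf('"');
            boolean hasPrefix = colon > 0 && (quote < 0 || colon < quote);
            if (hasPrefix) {
                clauses.add(token.substring(0, colon) + "=" + token.substring(colon + 1));
            } else {
                clauses.add(defaultField + "=" + token);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Only "amb" and "south" pick up an explicit field; the rest fall back.
        System.out.println(assignFields("name:amb property* AND city:south san francisco", "name"));
        // [name=amb, name=property*, city=south, name=san, name=francisco]

        // Quoting keeps the whole multi-word value under "city".
        System.out.println(assignFields("city:\"south san francisco\" AND name:amb", "name"));
        // [city="south san francisco", name=amb]
    }
}
```

Under this model, quoting a multi-word value is what keeps all of its words in the intended field; any wildcard inside the quotes would still be subject to analysis as a phrase.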

****************************
Do this and suddenly I get matches:

Query> name:amb propert* and city:"south san fran*"
State> california
name:amb propert* and city:"south san fran*"
org.apache.lucene.search.BooleanQuery@3ee284
amb name:amb propert* and city:"south san fran*":propert* city:"south san fran"
56 total matching documents 

*****************************
And look, this gets matches too:

Query> name:"amb propert*" and city:"south san*"
State> california
name:"amb propert*" and city:"south san*"
org.apache.lucene.search.BooleanQuery@a32b
"amb propert" city:"south san"
10732 total matching documents 

*****************************
Yet do this and we're back to 0 results:

Query> name:"amb propert*" and city:"south san fran*"
State> california
name:"amb propert*" and city:"south san fran*"
org.apache.lucene.search.BooleanQuery@58957f
"amb propert" city:"south san fran"
0 total matching documents
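Two rules of Lucene's query syntax may account for these counts. First, only the uppercase keywords `AND`, `OR` and `NOT` are operators, so a lowercase `and` is an ordinary term that a stop-word analyzer is likely to drop. Second, with the default OR operator, clauses without an operator are optional: a document matches if any clause matches, unless the query also has required clauses, in which case every required clause must match and the optional ones only affect scoring. The Java sketch below models just that matching rule over toy sets of document IDs; the IDs and the class name are hypothetical and are not taken from the index in this post.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of boolean-query matching (not Lucene code): if any clause is
// required, all required clauses must match; otherwise any optional clause suffices.
public class BooleanMatchSketch {

    public static Set<Integer> matches(List<Set<Integer>> required, List<Set<Integer>> optional) {
        Set<Integer> result = new TreeSet<>();
        if (required.isEmpty()) {
            // Pure OR: union of all optional clauses.
            for (Set<Integer> clause : optional) {
                result.addAll(clause);
            }
            return result;
        }
        // Intersect the required clauses; optional clauses would only change scores.
        result.addAll(required.get(0));
        for (Set<Integer> clause : required.subList(1, required.size())) {
            result.retainAll(clause);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> nameAmb = Set.of(1, 2, 3);  // hypothetical docs matching name:amb
        Set<Integer> cityPhrase = Set.of();      // phrase "south san fran": no token "fran"

        // Lowercase "and" is not an operator, so both clauses stay optional.
        System.out.println(matches(List.of(), List.of(nameAmb, cityPhrase))); // [1, 2, 3]

        // With a real AND, both clauses become required and the empty phrase wins.
        System.out.println(matches(List.of(nameAmb, cityPhrase), List.of())); // []
    }
}
```

This is only a model of the matching rule; it does not reproduce the actual hit counts shown above.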

******************************
Now flip the query around and it works:

Query> city:"south san fran*" and name:amb propert*
State> california
city:"south san fran*" and name:amb propert*
org.apache.lucene.search.BooleanQuery@965fb
city:"south san fran" amb city:"south san fran*" and name:amb propert*:propert*
56 total matching documents

*******************************
Finally, using the prefix of the metaphone name with quotes around it
produces no results:

Query> metaph_name:"ambprp*"
State> california
metaph_name:"ambprp*"
org.apache.lucene.search.TermQuery@67b241
metaph_name:ambprp
0 total matching documents 

*******************************
But take away the quotes and it works:

Query> metaph_name:ambprp*
State> california
metaph_name:ambprp*
org.apache.lucene.search.PrefixQuery@21c887
metaph_name:ambprp*
6 total matching documents 


********************************
But quotes don't seem to matter in this complex wildcard:

Query> metaph_name:ambprp* and city:"sou* or san or fra*"
State> california
metaph_name:ambprp* and city:"sou* or san or fra*"
org.apache.lucene.search.BooleanQuery@7ffe01
metaph_name:ambprp* city:"sou san fra"
6 total matching documents 


So...  Can someone help me nail down the logic for these things so we can
construct some good queries?

Thanks!
