lucene-java-user mailing list archives

From Dan Quaroni <dquar...@OPENRATINGS.com>
Subject RE: Confusion over wildcard search logic
Date Tue, 23 Sep 2003 14:55:14 GMT
> Perhaps give the latest codebase a try too, just to see if any fixes 
> (particularly in that WildcardQuery.toString) are there.

It's our intention to put this into a production environment soon, so we
were waiting on 1.3 to go final before attempting to use it.

> I wouldn't worry about 
> memory too much until you've seen it to be a problem.  I think you'll 
> be fine (but don't currently have the understanding or data to back 
> that up).

The reason I split up the indexes by state was that I was running out of
memory (and searches were very slow) with every company in the world in
one index and lots of boolean joining.  With the indexes split by state, it
seems to do pretty well.
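Splitting by state amounts to routing each search to one smaller index chosen by the state. A minimal sketch of that routing step, assuming one index directory per state under a common root. The `/indexes/<state>` layout and the `StateIndexRouter` class are hypothetical, not the actual setup described here. The resolved path would then be opened with Lucene, for example with an `IndexSearcher`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/** Hypothetical helper: maps a state name to that state's own index directory. */
public class StateIndexRouter {
    private final Path root;

    public StateIndexRouter(Path root) {
        this.root = root;
    }

    /** Normalizes the name ("South Dakota" becomes "south_dakota") and resolves it under root. */
    public Path indexPathFor(String state) {
        String key = state.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", "_");
        // Reject anything but letters/underscores so a name cannot escape the root directory.
        if (!key.matches("[a-z_]+")) {
            throw new IllegalArgumentException("Unrecognized state: " + state);
        }
        return root.resolve(key);
    }

    public static void main(String[] args) {
        StateIndexRouter router = new StateIndexRouter(Paths.get("/indexes"));
        // The value typed at the "State>" prompt picks which per-state index gets searched.
        System.out.println(router.indexPathFor("california"));
        System.out.println(router.indexPathFor("South Dakota"));
    }
}
```

Each search then touches only one state's index, which keeps the term dictionary and the number of matching documents per query much smaller than in a single worldwide index.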

Well, after some extremely brief experimentation (maybe I should've done it
before writing the email, huh?), I discovered this:

**********************
This worked pretty well and got me some good results: the company I was
looking for came back second (which is pretty good, given how general I
made the query).

Query> name:(amb proper*)
State> california
name:(amb proper*)
org.apache.lucene.search.BooleanQuery@a992f
amb proper*
31988 total matching documents
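The two lines printed after the query are two renderings of the same object. `org.apache.lucene.search.BooleanQuery@a992f` has the shape of Java's default `Object.toString()` (class name, `@`, hash code in hex), so the program presumably printed the query object directly. `amb proper*` is the field-aware form: Lucene's `Query.toString(String field)` leaves out the `field:` prefix for terms in the field it is given (here `name`). The toy classes below (`ToyQuery`, `ToyTerm`, `ToyBoolean`) are hypothetical and are not Lucene's API; they only sketch that rendering rule in plain Java.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Lucene's classes) of field-aware query rendering. */
public class QueryRenderingDemo {

    /** A query that can render itself relative to a default field. */
    interface ToyQuery {
        String render(String defaultField);
    }

    /** A single term; the field prefix is dropped when it equals defaultField. */
    static final class ToyTerm implements ToyQuery {
        final String field;
        final String text;

        ToyTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        public String render(String defaultField) {
            return field.equals(defaultField) ? text : field + ":" + text;
        }
    }

    /** A group of clauses. toString() is deliberately NOT overridden. */
    static final class ToyBoolean implements ToyQuery {
        final List<ToyQuery> clauses = new ArrayList<>();

        ToyBoolean add(ToyQuery q) {
            clauses.add(q);
            return this;
        }

        public String render(String defaultField) {
            List<String> parts = new ArrayList<>();
            for (ToyQuery q : clauses) {
                String s = q.render(defaultField);
                // Nested groups are parenthesized.
                parts.add(q instanceof ToyBoolean ? "(" + s + ")" : s);
            }
            return String.join(" ", parts);
        }
    }

    public static void main(String[] args) {
        ToyBoolean query = new ToyBoolean()
                .add(new ToyTerm("name", "amb"))
                .add(new ToyTerm("name", "proper*"));

        System.out.println(query);                // default Object.toString(): QueryRenderingDemo$ToyBoolean@<hash>
        System.out.println(query.render("name")); // amb proper*
        System.out.println(query.render("city")); // name:amb name:proper*
    }
}
```

Rendering against a field other than `name` brings the `name:` prefixes back, which is a quick way to check which field each clause actually landed in.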

*************
This one matched a ton of documents; however, the company I was looking for
came up first in the list, though with a pretty abysmal score of 0.23769014.

Query> name:(amb prop*) and city:(south san fran*)
State> california
name:(amb prop*) and city:(south san fran*)
org.apache.lucene.search.BooleanQuery@34bb5
(amb prop*) (city:south city:san city:fran*)
721977 total matching documents
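The parsed form `(amb prop*) (city:south city:san city:fran*)` hints at why this query matched so much. The lowercase `and` is gone: Lucene's query syntax only treats uppercase `AND` as an operator, so `and` was presumably parsed as an ordinary term and then dropped by the analyzer as a stop word, leaving every clause optional. On top of that, a trailing-`*` term matches every indexed term that starts with its prefix, so `prop*` covers everything `proper*` does and more. A minimal sketch of that prefix expansion over a sorted term dictionary, in plain Java; the dictionary contents are made up for illustration, and this is not Lucene's implementation.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model of prefix-term expansion over a sorted term dictionary (not Lucene code). */
public class PrefixExpansionDemo {

    /** Returns every term in the sorted dictionary that starts with prefix. */
    static SortedSet<String> expand(NavigableSet<String> dictionary, String prefix) {
        SortedSet<String> matches = new TreeSet<>();
        // Terms are sorted, so all prefix matches form one contiguous run starting at the prefix.
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // first term past the run: nothing later can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> dictionary = new TreeSet<>(Arrays.asList(
                "amb", "prop", "propane", "proper", "properties", "property",
                "prophet", "proposal", "prospect", "public"));
        System.out.println(expand(dictionary, "proper")); // [proper, properties, property]
        System.out.println(expand(dictionary, "prop"));   // [prop, propane, proper, properties, property, prophet, proposal]
    }
}
```

A shorter prefix makes the matching run longer, so more terms (and every document containing any of them) feed into the query, which is consistent with `prop*` pulling in far more documents than `proper*`.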

****************
The previous query took 1552 ms.  I was able to reduce that to 285 ms just
by adding the +'s you suggested:

Query> name:(amb prop*) and city:(+south +san +fran*)
State> california
name:(amb prop*) and city:(+south +san +fran*)
org.apache.lucene.search.BooleanQuery@6d4c1
(amb prop*) (+city:south +city:san +city:fran*)
45011 total matching documents
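Prefixing each city term with `+` turns `(city:south city:san city:fran*)` into `(+city:south +city:san +city:fran*)`, so a document must now contain all three to match that sub-query instead of any one of them, which plausibly accounts for the drop from 721977 to 45011 matches. The name group is still optional at the top level, though, so documents matching only `amb` or `prop*` still count. A toy model of the matching rule: a boolean group with at least one required clause matches only documents satisfying every required clause; with no required clauses, at least one optional clause must match. Documents are modeled as plain sets of `field:term` strings, and the data and the `BooleanMatchDemo` names are hypothetical, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of Lucene-style required (+) vs optional clauses; not Lucene code. */
public class BooleanMatchDemo {

    /** A query node that either matches a document's term set or does not. */
    interface Clause {
        boolean matches(Set<String> doc);
    }

    /** Single "field:term" clause; a trailing '*' means prefix match. */
    static Clause term(String fieldTerm) {
        return doc -> {
            if (fieldTerm.endsWith("*")) {
                String prefix = fieldTerm.substring(0, fieldTerm.length() - 1);
                for (String t : doc) {
                    if (t.startsWith(prefix)) return true;
                }
                return false;
            }
            return doc.contains(fieldTerm);
        };
    }

    /** All required clauses must match; if there are none, at least one optional clause must. */
    static final class Bool implements Clause {
        final List<Clause> required = new ArrayList<>();
        final List<Clause> optional = new ArrayList<>();

        Bool must(Clause c) { required.add(c); return this; }
        Bool should(Clause c) { optional.add(c); return this; }

        public boolean matches(Set<String> doc) {
            for (Clause c : required) {
                if (!c.matches(doc)) return false;
            }
            if (!required.isEmpty()) return true; // optional clauses then only affect scoring
            for (Clause c : optional) {
                if (c.matches(doc)) return true;
            }
            return false;
        }
    }

    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    static long count(Clause q, List<Set<String>> docs) {
        return docs.stream().filter(q::matches).count();
    }

    public static void main(String[] args) {
        List<Set<String>> docs = Arrays.asList(
                doc("name:amb", "name:property", "city:south", "city:san", "city:francisco"),
                doc("name:acme", "city:san", "city:diego"),
                doc("name:bay", "city:south", "city:lake", "city:tahoe"),
                doc("name:prospect", "city:fresno"),
                doc("name:zeta", "city:south", "city:san", "city:francisco"),
                doc("name:prophet", "city:oakland"));

        Clause name = new Bool().should(term("name:amb")).should(term("name:prop*"));
        Clause anyCity = new Bool().should(term("city:south")).should(term("city:san")).should(term("city:fran*"));
        Clause allCity = new Bool().must(term("city:south")).must(term("city:san")).must(term("city:fran*"));

        // (amb prop*) (city:south city:san city:fran*)
        System.out.println(count(new Bool().should(name).should(anyCity), docs)); // 5
        // (amb prop*) (+city:south +city:san +city:fran*)
        System.out.println(count(new Bool().should(name).should(allCity), docs)); // 3
        // +name:(amb prop*) +city:(+south +san +fran*)
        System.out.println(count(new Bool().must(name).must(allCity), docs));     // 1
    }
}
```

The last line models a variant that was not tried above: marking the name group as required too, e.g. `+name:(amb prop*) +city:(+south +san +fran*)`, should keep only companies matching both the name and the city. That is a hedged suggestion based on the toy model, not a measured result.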


Incidentally, everything I say comes with great awe at the power of Lucene
and respect for those who have made it possible.  Please don't take anything
I say as a gripe; I'm just learning how things work, and that's a necessary
step with any new software package of this type.  You have to learn its ins
and outs and little quirks to be able to take full advantage of it.

Thanks!

