lucene-general mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: Need help on a Lucene problem
Date Fri, 02 Jan 2009 08:38:27 GMT
janis wrote:
> 
> Is there any way I can optimize this logic? Or for that matter my whole
> approach/algorithm towards finding all jobs within 100 miles using Lucene?
>  
Hi.
I don't know how Lucene works per se.
But think about what you are really doing with your logic.
You are telling the search engine to:

- look (in the whole database) for all items which have city = city-1, 
and keep a list of these item numbers
- look for all items which have city = city-2, and keep a list of these 
item numbers
...
- look for all items which have city = city-864, and keep a list of 
these item numbers

- now combine all the item numbers above, and return a list of the 
unique item numbers among them

- look for all the items that have state = state-1, and keep a list..
- look for all ... state-2, and keep a list...
...
- now combine all these items and return a list of the unique item 
numbers among them

- now combine the list from the cities, with the list from the states, 
and return a list of all unique item numbers among them

- look for all items which have skill = skill-1, and keep a list
...
... etc..
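
To make the cost of the steps above concrete, here is a rough Java sketch.
It only models the work as I described it, not how Lucene really executes
a query (which I have not checked). The "index" is a hypothetical
in-memory map from a field value to item numbers, and the names
UnionSketch and unionOfHits are made up for the illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class UnionSketch {
    // Hypothetical stand-in for an index: field value -> item numbers with that value.
    static Set<Integer> unionOfHits(Map<String, List<Integer>> index, List<String> values) {
        Set<Integer> unique = new TreeSet<>();
        for (String value : values) {
            // One lookup and one list per value, e.g. 864 times for the cities.
            unique.addAll(index.getOrDefault(value, Collections.emptyList()));
        }
        // The set has already dropped duplicate item numbers.
        return unique;
    }
}
```

The same loop would run again for the states, again for the skills, and
so on, before the partial lists are combined.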

If your database contains 1,000,000 job items, no wonder it is taking 29 
seconds.

You would be much better off running a first query with the criteria
that are the most restrictive (the ones that will probably give the
fewest hits), then applying another query to that result set to get a
smaller set, then applying another query to that set to restrict it
even further, etc.
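
A minimal sketch of that narrowing idea in plain Java, assuming the items
are already in memory as a list. NarrowingSketch and narrow are
hypothetical names, and the caller is assumed to pass the criteria
ordered from most to least restrictive. This is not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class NarrowingSketch {
    // Apply each criterion only to the survivors of the previous one.
    // Assumption: criteria are ordered from most to least restrictive.
    static <T> List<T> narrow(List<T> items, List<Predicate<T>> criteria) {
        List<T> current = items;
        for (Predicate<T> criterion : criteria) {
            List<T> next = new ArrayList<>();
            for (T item : current) {
                if (criterion.test(item)) {
                    next.add(item);
                }
            }
            current = next;
            if (current.isEmpty()) {
                break; // nothing left to restrict further
            }
        }
        return current;
    }
}
```

Each pass only looks at what the previous pass kept, so the later and
looser criteria are tested against a small set instead of the whole
database.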

Another aspect is that search engines like Lucene are the right tool to 
use when you are searching for words which occur in a text, in relative 
position to each other, and/or after stemming etc.
But they are not necessarily the best tool to use when you are looking 
for a strict (aka "stupid") string comparison, such as ' city == "New 
York" ', where the city name is in a field of its own and is in a fixed 
(predictable) form. (I mean that to search "New York" you can just 
compare the string "New York" and you do not have to do a query like 
"the word New next to the word York").
For example, since you already have your 864 city names in a table, in a 
known form, and since your items all have a field "city" in a known 
form, you could use Lucene to do the query excluding the city, get the 
list of results in an array, and then do a simple scan of your array in 
Java, keeping only the items that match one of your cities of choice 
(string comparison).  The same for the State.
With 10,000 results and 864 cities, using Perl this would probably take 
less than a second. Your mileage with Java may vary.
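
Here is a sketch of that post-filter in Java, assuming the Lucene results
have already been copied into a list of simple objects. The Job class and
the names CityFilterSketch and keepWantedCities are hypothetical. Note
that I put the city names in a HashSet for exact lookup rather than
comparing each result against all 864 names in turn.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CityFilterSketch {
    // Hypothetical result row holding only the fields needed here.
    static final class Job {
        final int id;
        final String city;

        Job(int id, String city) {
            this.id = id;
            this.city = city;
        }
    }

    // Keep only the results whose city is one of the wanted cities.
    static List<Job> keepWantedCities(List<Job> results, Collection<String> wantedCities) {
        Set<String> wanted = new HashSet<>(wantedCities); // exact string match
        List<Job> kept = new ArrayList<>();
        for (Job job : results) {
            if (wanted.contains(job.city)) {
                kept.add(job);
            }
        }
        return kept;
    }
}
```

The same kind of method would do for the state, with a set of state
names instead.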

