lucene-java-user mailing list archives

From Sami Dalouche <sko...@free.fr>
Subject Re: Lucene search optimization
Date Tue, 30 May 2006 19:16:47 GMT
Hi,

I didn't want to bother you with the exact details of my document, but
since you're asking.. :-)

So, I have a list of all the world's cities, and would like to let users
search for their city while tolerating small mistakes.
Additionally, cities sometimes have several names or spellings (for
example, a small city near mine called Le Perray en Yvelines is
sometimes spelled Le-Perray-en-Yvelines, Le Perray en Ynes, Le Perray,
etc.), and the search should match those variants too.
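
A minimal sketch of how the purely typographic variants (hyphens, case,
accents) could be folded together before indexing and before querying.
The class and method names are hypothetical, and this is plain JDK code,
not a Lucene analyzer. It does not handle truncated or misspelled forms
such as "Le Perray" or "Le Perray en Ynes".

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper (not part of Lucene): folds spelling variants that
// differ only in accents, case, hyphens, or punctuation into one key.
class CityNameNormalizer {
    static String normalize(String name) {
        // Decompose accented letters, then drop the combining marks.
        String s = Normalizer.normalize(name, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
        // Lowercase, and turn every run of non-letters/non-digits
        // (hyphens, apostrophes, repeated spaces) into a single space.
        return s.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}]+", " ")
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Le-Perray-en-Yvelines"));
        System.out.println(normalize("Le Perray en Yvelines"));
    }
}
```

Applying the same function to the stored names and to the user's input
means the hyphenated and unhyphenated spellings compare as equal strings.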

The way to limit the number of returned documents that I was thinking of
was to specify the country, which would then divide the search space,
but if you think of something better, I am open to any suggestion.
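
The partitioning idea can be sketched without Lucene as follows: keep one
list of names per country, and run the fuzzy comparison only against the
list for the requested country. The class name, the country codes, and
the use of plain Levenshtein distance are assumptions made for this
illustration; Lucene's own FuzzyQuery and filters work differently
internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of "partition by country, then fuzzy-match".
class CountryPartitionedSearch {
    // country code -> city names in that country
    private final Map<String, List<String>> byCountry = new HashMap<>();

    void add(String countryCode, String cityName) {
        byCountry.computeIfAbsent(countryCode, k -> new ArrayList<>()).add(cityName);
    }

    // Returns the cities of one country within maxEdits of the query,
    // ignoring case. Only that country's list is scanned.
    List<String> search(String countryCode, String query, int maxEdits) {
        List<String> hits = new ArrayList<>();
        String q = query.toLowerCase(Locale.ROOT);
        List<String> candidates =
                byCountry.getOrDefault(countryCode, Collections.<String>emptyList());
        for (String city : candidates) {
            if (editDistance(q, city.toLowerCase(Locale.ROOT)) <= maxEdits) {
                hits.add(city);
            }
        }
        return hits;
    }

    // Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, after `add("FR", "Le Perray en Yvelines")`, a call such as
`search("FR", "Le Peray en Yvelines", 2)` compares the query only against
French entries, so the cost grows with the size of one country rather
than with the whole data set.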

Soundex and Metaphone are language-specific, right? Would they work
for city names?
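
To make the question concrete, here is a minimal sketch of the classic
(American) Soundex encoding, which was designed around English
pronunciation. The class name is hypothetical and this is a plain JDK
illustration, not the code of any particular library. Non-ASCII letters
are simply skipped here, which is one reason results on non-English names
can be poor.

```java
import java.util.Locale;

// Hypothetical, simplified American Soundex: first letter + three digits.
class SimpleSoundex {
    // Digit for each letter A..Z; '0' means "not encoded" (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String name) {
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (char c : name.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') continue; // skip spaces, hyphens, accented letters
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);                 // keep the first letter as-is
            } else if (code != '0' && code != lastCode) {
                out.append(code);              // new consonant group
            }
            if (c != 'H' && c != 'W') lastCode = code; // H and W do not break a run
            if (out.length() == 4) break;
        }
        if (out.length() == 0) return "";
        while (out.length() < 4) out.append('0');      // pad short codes
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Le Perray en Yvelines"));
        System.out.println(encode("Le-Perray-en-Yvelines"));
        System.out.println(encode("Le Perray"));
    }
}
```

With this sketch the hyphenated and unhyphenated spellings get the same
code (L165), but the shortened form "Le Perray" gets L160. A phonetic key
therefore absorbs some misspellings without covering alternate or
truncated names.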

The cities are available as XML from http://www.sirika.com/data/xmlgz/

If you need more information, just ask.
Regards,
Sami Dalouche


On Tuesday, 30 May 2006 at 11:22 -0400, Erik Hatcher wrote:
> Sami,
> 
> You're on to the right approach seeking something other than  
> FuzzyQuery.  FuzzyQuery is rarely generally useful and there are  
> other ways to achieve the same sort of thing (soundex, metaphone) in  
> an efficient manner.
> 
> If you could share some details about these properties and how you  
> need to query them I'm sure the community could offer suggestions on  
> an efficient and clean implementation.  Without details, it's not
> possible to (easily) know how to recommend a specific technique.
> 
> 	Erik
> 
> 
> On May 30, 2006, at 11:12 AM, Sami Dalouche wrote:
> 
> > Hi,
> >
> > I have 2 million documents, with a name property. (~15 to 20
> > characters).
> > Fuzzy searching against this property takes around 3 seconds, which is
> > way too much for what I plan to do, so I am considering the possible
> > optimizations. I can add a property to each of the documents that could
> > partition the document space into 400 spaces. Each space would then be
> > limited to 5000 documents, which should be small enough to make the
> > fuzzy search faster.
> >
> > However, my question is: how do I take advantage of this additional
> > property? Using a traditional RDBMS, I would add an index on the field,
> > but with Lucene, I'm not sure how to proceed. Would filters be the way
> > to go?
> > (http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Filter.html)
> > Could a CachingWrapperFilter help even more?
> > (http://lucene.apache.org/java/docs/api/org/apache/lucene/search/CachingWrapperFilter.html)
> >
> > Additionally, this extra property is an id, so can I store it as a
> > number so that comparisons are faster (I guess) than with strings?
> >
> > Thanks a lot,
> > Sami Dalouche
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 

