lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Speed of fuzzy searches
Date Fri, 03 Apr 2009 14:58:56 GMT
In a really weird "what is old is new again" sort of thing, I am
researching spellchecking, and came across:
http://www.lucidimagination.com/search/document/cc46ac41bd4ee661/ngramspeller_contribution_re_combining_open_office_spellchecker_with_lucene#4f731c4209e3d7d0
which suggests speeding up FuzzyQuery using JaroWinkler, but I don't
think it was ever done.

Now, we have an implementation of JaroWinkler in the spell checker (in
fact, we have pluggable distance measures there).  Perhaps it makes
sense to think about how FuzzyQuery could leverage this pluggability?
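
To make the pluggability idea concrete, here is a minimal, self-contained
sketch of a fuzzy term matcher that accepts its distance measure as a
parameter.  It is an illustration only, not Lucene code: the TermSimilarity
interface, the class name, and the fuzzyMatch helper are hypothetical, and
the Jaro-Winkler variant assumes the common defaults (prefix scale 0.1,
prefix capped at 4 characters, no boost threshold).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class PluggableDistanceSketch {

    /** Hypothetical plug-in point: similarity in [0, 1], where 1 means identical. */
    interface TermSimilarity {
        double score(String a, String b);
    }

    /** Edit-distance similarity: 1 - (edits / length of the longer string). */
    static double levenshteinSimilarity(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) prev[b.length()] / longest;
    }

    /** Plain Jaro similarity: matches within a sliding window, minus half-transpositions. */
    static double jaro(String a, String b) {
        if (a.equals(b)) return 1.0;
        int window = Math.max(0, Math.max(a.length(), b.length()) / 2 - 1);
        boolean[] aMatched = new boolean[a.length()];
        boolean[] bMatched = new boolean[b.length()];
        int matches = 0;
        for (int i = 0; i < a.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(b.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!bMatched[j] && a.charAt(i) == b.charAt(j)) {
                    aMatched[i] = true;
                    bMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < a.length(); i++) {
            if (!aMatched[i]) continue;
            while (!bMatched[k]) k++;
            if (a.charAt(i) != b.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / a.length() + m / b.length() + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    /** Jaro-Winkler: Jaro plus a bonus for a shared prefix of up to 4 characters. */
    static double jaroWinkler(String a, String b) {
        double j = jaro(a, b);
        int limit = Math.min(4, Math.min(a.length(), b.length()));
        int prefix = 0;
        while (prefix < limit && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    /** Keeps every term whose similarity to the query reaches minSimilarity. */
    static List<String> fuzzyMatch(Collection<String> terms, String query,
                                   double minSimilarity, TermSimilarity measure) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (measure.score(term, query) >= minSimilarity) hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("martha", "marhta", "mother");
        // Same matcher, two different measures plugged in.
        System.out.println(fuzzyMatch(terms, "martha", 0.9, PluggableDistanceSketch::jaroWinkler));
        System.out.println(fuzzyMatch(terms, "martha", 0.9, PluggableDistanceSketch::levenshteinSimilarity));
    }
}
```

Note that swapping the measure changes which terms pass a given threshold,
so a cutoff tuned for edit distance would likely need retuning for
Jaro-Winkler.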

Also, Matt, perhaps as an alternative to Fuzzy (which still can be  
slow even w/ your upgrade), you may look at doing spell checking  
instead.

-Grant

On Apr 3, 2009, at 11:26 AM, Matt Schraeder wrote:

> After doing some research I broke down and just updated my Zend
> Framework.  I just installed it not long ago so I didn't think much of
> it, but then I realized I'm running version 1.6.1 and that Zend is
> currently on 1.7.8.  Upon upgrading the complex fuzzy search that was
> taking 30 seconds now takes 0.067 seconds.  I have no idea what changed
> in the past few months, and see no mention of performance issues on
> their issue tracker, but all seems well now.  Figured I'd post here to
> give others a heads up if they run into similar issues like I did.
>
>>>> MSchraeder@btsb.com 4/2/2009 3:32:01 PM >>>
>
>>>> erickerickson@gmail.com 4/2/2009 10:24:42 AM >>>
>> This seems really odd, especially with an index that size. The
>> first question is usually "Do you open an IndexReader for
>> each query?"
>
> I'm using the Zend_Search_Lucene implementation so I'm really not sure
> how it handles the IndexReader.  At the top of the script I open the
> index and do searches on it.  Unless Zend is doing something special in
> the background, I'm assuming I'm using the IndexReader on a per-page
> basis.  I haven't been able to find any information on this yet, but
> from all the examples I've been reading none of them say to keep the
> index in a session to improve speed.  I'll have to get on the zend
> mailing lists to find out more about best practices.
>
>>>> markrmiller@gmail.com 4/2/2009 10:40:59 AM >>>
>> You might try setting a longer prefix. Fuzzy queries don't scale, by the
>> way. By default they enumerate every unique term. How many unique terms
>> do you have in the index?
>
> I'll look at a longer prefix setting, as the reply above mentioned the
> improving search speed article.  Currently my index has 104076 terms in
> it.
>
>
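
As a rough illustration of the prefix advice quoted above, the sketch below
shows why a longer required prefix cuts the work: with a prefix length of 0
every unique term is compared against the query, while a non-zero prefix
limits the comparison to the slice of the sorted term dictionary that shares
those leading characters.  This is a simplified model under assumed names
(PrefixFuzzySketch, fuzzy, candidates are all hypothetical), not how Lucene
or Zend_Search_Lucene actually implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixFuzzySketch {

    /** Classic dynamic-programming Levenshtein edit distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    /** Terms that share the first prefixLen characters of the query (all terms if 0). */
    static SortedSet<String> candidates(TreeSet<String> terms, String query, int prefixLen) {
        String prefix = query.substring(0, Math.min(prefixLen, query.length()));
        if (prefix.isEmpty()) return terms;
        // Range scan over the sorted dictionary: [prefix, prefix + U+FFFF).
        return terms.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    /** Runs the expensive edit-distance check only on the prefix-filtered candidates. */
    static List<String> fuzzy(TreeSet<String> terms, String query, int prefixLen, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : candidates(terms, query, prefixLen)) {
            if (editDistance(term, query) <= maxEdits) hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
                "lucene", "lucent", "lucid", "mucene", "nutch", "solr", "tika"));
        for (int prefixLen : new int[] {0, 2}) {
            System.out.println("prefix=" + prefixLen
                    + " compared=" + candidates(terms, "lucene", prefixLen).size()
                    + " hits=" + fuzzy(terms, "lucene", prefixLen, 1));
        }
    }
}
```

The trade-off is recall: any term that differs from the query within the
required prefix is never compared, so a typo in the first characters goes
unmatched.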

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search



