lucene-dev mailing list archives

From Konstantin Shaposhnikov <...@tut.by>
Subject Performance of fuzzy query
Date Wed, 25 Feb 2004 16:20:14 GMT
Hi all.

I am using lucene-1.3-final and have performance problems with fuzzy 
queries.

If I understand correctly, to perform a fuzzy query Lucene enumerates
all terms in the index and constructs a BooleanQuery that consists of
simple TermQueries.
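
To make the cost concrete, here is a minimal sketch of that kind of expansion. It is not Lucene code: the class FuzzyExpansionSketch, the in-memory term list, and the maxEdits cut-off are assumptions chosen for illustration, and the real FuzzyQuery may score similarity differently. The point is only that every indexed term is visited once per expansion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Lucene's implementation): a fuzzy term is
// expanded by comparing it against every term in the index.
final class FuzzyExpansionSketch {

    // Levenshtein edit distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Scans ALL terms; each returned term would become one TermQuery clause
    // of the resulting BooleanQuery. maxEdits is an assumed threshold.
    static List<String> expand(List<String> allTerms, String target, int maxEdits) {
        List<String> matches = new ArrayList<String>();
        for (String term : allTerms) {
            if (editDistance(term, target) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

With this sketch the work grows with the number of distinct terms in the index, not with the number of matching documents, which is why repeating the expansion is expensive.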

The main problem is that this process is performed several times during
a search, which significantly decreases performance.

Let me explain.

For example, suppose my search returns 1000 documents. In my application
I need to get all of these documents from the index for later
processing, but Lucene rereads all terms in the index every 100
documents. By default only 100 documents are fetched from the index (see
the Hits class), so when I access the 101st document the search is
performed again and the absolutely unnecessary operation of creating a
FilteredTermEnum is repeated.
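
The effect described above can be modelled with a small simulation. It is a sketch under stated assumptions, not the Hits source: LazyHitsSketch is a hypothetical class, and the fixed growth step of 100 follows the description in this message (the real Hits class may grow its cache differently). Each call to fetch stands in for one full search, which for a fuzzy query includes one full term enumeration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a lazily filled result cache (not the Hits source).
final class LazyHitsSketch {
    private final int totalMatches;  // documents matched by the query
    private final int batchSize;     // assumed fixed growth step of the cache
    private final List<Integer> cached = new ArrayList<Integer>();
    int searchesRun = 0;             // how many times the query was re-executed

    LazyHitsSketch(int totalMatches, int batchSize) {
        this.totalMatches = totalMatches;
        this.batchSize = batchSize;
        fetch(batchSize);            // initial search fills the first batch
    }

    // Stands in for one complete search that returns the top `wanted` docs.
    private void fetch(int wanted) {
        searchesRun++;
        int n = Math.min(wanted, totalMatches);
        for (int i = cached.size(); i < n; i++) {
            cached.add(Integer.valueOf(i));
        }
    }

    // Returns the i-th result, re-running the search if it is not cached yet.
    int doc(int i) {
        if (i < 0 || i >= totalMatches) {
            throw new IndexOutOfBoundsException("no such hit: " + i);
        }
        if (i >= cached.size()) {
            fetch(Math.max(i + 1, cached.size() + batchSize));
        }
        return cached.get(i).intValue();
    }

    // Reads every result in order and reports how many searches that cost.
    static int searchesToReadAll(int totalMatches, int batchSize) {
        LazyHitsSketch hits = new LazyHitsSketch(totalMatches, batchSize);
        for (int i = 0; i < totalMatches; i++) {
            hits.doc(i);
        }
        return hits.searchesRun;
    }
}
```

Under these assumptions, reading 1000 results with a step of 100 costs 10 searches, while a step of 1000 costs a single search.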

Unfortunately I can't tell the Hits class to fetch all 1000 of my
documents up front, because the value 100 (actually 50 in the code) is
hard-coded. So I think that this value should be configurable in
Searcher.

Actually, I ran some experiments and found that if I fetch all 1000
documents in the first pass, my fuzzy query runs 4 times faster!

Also, could you please give me some advice on improving the performance
of fuzzy search?

One approach that I already use in our application is a custom fuzzy
query that first compares a prefix of each term (e.g. the first 3
characters) and, only if the prefixes are equal, compares the rest of
the term using the same algorithm that is used in FuzzyQuery.
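
A minimal sketch of this prefix idea follows. It is not the actual custom query: PrefixFuzzySketch, the sorted in-memory term set, and the maxEdits threshold are assumptions for illustration, and plain Levenshtein distance stands in for whatever comparison FuzzyQuery uses. Because the terms are held in sorted order, the scan can start at the first term with the required prefix and stop as soon as the prefix no longer matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical illustration of prefix-restricted fuzzy matching.
final class PrefixFuzzySketch {

    // Levenshtein edit distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Visits only the terms that share the first prefixLen characters with
    // the target, then compares the remainders by edit distance.
    static List<String> candidates(SortedSet<String> sortedTerms, String target,
                                   int prefixLen, int maxEdits) {
        List<String> result = new ArrayList<String>();
        if (target.length() < prefixLen) {
            // Assumption: a target shorter than the prefix matches only exactly.
            if (sortedTerms.contains(target)) {
                result.add(target);
            }
            return result;
        }
        String prefix = target.substring(0, prefixLen);
        String targetRest = target.substring(prefixLen);
        for (String term : sortedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;  // sorted order: no later term can share the prefix
            }
            if (editDistance(term.substring(prefixLen), targetRest) <= maxEdits) {
                result.add(term);
            }
        }
        return result;
    }
}
```

The trade-off is that a term differing from the target inside the prefix (for example "pucene" for "lucene") is never considered, even though it is only one edit away.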

Best regards,
Konstantin
