lucene-java-user mailing list archives

From mark harwood <>
Subject Re: Fuzzy vs Prefix query Performance
Date Mon, 15 Jun 2009 15:24:10 GMT

FuzzyQuery performance is related to the number of unique terms in the index, not the number of
documents; e.g. a single "telephone directory" document could contain millions of terms.
Each term considered is compared using an "edit distance" algorithm, which is CPU intensive.
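To make the per-term cost concrete, here is a plain dynamic-programming Levenshtein distance in Java. It is a minimal sketch, not Lucene's own FuzzyTermEnum code (which, as far as I know, turns a similar edit distance into a similarity score), and the class name EditDistance is made up for illustration. Each comparison walks an (m+1) x (n+1) table (kept here as two rows), and a fuzzy scan pays that cost once per candidate term.

```java
/**
 * Classic dynamic-programming Levenshtein distance: the minimum number of
 * single-character insertions, deletions and substitutions that turn one
 * string into another. Cost is O(a.length() * b.length()) per pair.
 */
final class EditDistance {
    private EditDistance() {}

    static int levenshtein(String a, String b) {
        // prev[j] = distance between the first i-1 chars of a and the first j chars of b
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev; // the row just filled becomes "previous"
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("lucene", "lucine"));
        System.out.println(levenshtein("kitten", "sitting"));
    }
}
```

With prefix length 0 and a dictionary of a few million unique terms, that inner loop runs for every one of those terms on every query, however many documents the index holds.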

The FuzzyQuery prefix length setting dictates whether the fuzzy edit distance comparisons are done
from A to Z (prefix length=0) or only on those terms sharing the first n characters of the input
term. Obviously this can make a huge difference in the number of terms compared (a prefix length
of 1 would reduce the search space to 1/26th of that for prefix length=0, assuming an even
distribution of words across the alphabet).

Your PrefixQuery does a much simpler operation, the equivalent of String.startsWith(..) on each
term, and will typically operate on fewer terms.
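For comparison, a sketch of prefix matching over a sorted term dictionary. It uses java.util.TreeSet as a stand-in for the index's term dictionary (an assumption for illustration), and PrefixScan is a hypothetical name. Because the terms are sorted, every term starting with the prefix sits in one contiguous run, so the scan seeks to the first candidate and stops at the first non-match, with no edit-distance work at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Toy model of prefix matching over a sorted term dictionary (not Lucene's code).
 * All terms starting with a prefix form one contiguous run in sorted order.
 */
final class PrefixScan {
    private PrefixScan() {}

    static List<String> match(TreeSet<String> sortedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        // tailSet(prefix, true) begins at the first term >= prefix: the "seek"
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can start with the prefix
            }
            hits.add(term); // a plain startsWith test, nothing CPU intensive
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                List.of("apple", "application", "apply", "banana", "band", "bandana"));
        System.out.println(match(terms, "app"));
        System.out.println(match(terms, "band"));
    }
}
```

Here match(terms, "app") returns [apple, application, apply] after looking at just one extra term ("banana") to know it can stop.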


----- Original Message ----
From: Erick Erickson <>
Sent: Monday, 15 June, 2009 15:34:18
Subject: Re: Fuzzy vs Prefix query Performance

Well, if you're seeing it, it's possible <G>....

But the first question is always "what were you measuring?" Be aware
that when you open a searcher, the first few queries fill caches, etc., and
may take an anomalously long time, especially if you're sorting. So could
you give more details of your test setup?
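Along those lines, a minimal timing harness in plain Java that runs the workload a few times untimed before measuring. WarmTimer and averageMillis are hypothetical names; in a Lucene test the Supplier would wrap a search against the same already-open searcher, so the warm-up runs absorb the cache-filling cost instead of the measured ones.

```java
import java.util.function.Supplier;

/**
 * Minimal timing harness: run the task a few times untimed to warm up caches
 * and the JIT, then report the average wall-clock time of several timed runs.
 */
final class WarmTimer {
    private WarmTimer() {}

    /** Returns average wall-clock time per timed run, in milliseconds. */
    static <T> double averageMillis(Supplier<T> task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be > 0");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.get(); // result discarded; these runs absorb one-time start-up costs
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.get();
        }
        long elapsed = System.nanoTime() - start;
        return elapsed / 1_000_000.0 / timedRuns;
    }

    public static void main(String[] args) {
        // Stand-in workload; in practice this would run the query being measured.
        Supplier<Long> work = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            return sum;
        };
        System.out.printf("avg %.3f ms%n", averageMillis(work, 5, 20));
    }
}
```

Timing FuzzyQuery and PrefixQuery through the same harness, after the same warm-up, keeps the comparison from being skewed by whichever query happened to run first.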


On Mon, Jun 15, 2009 at 3:19 PM, Zsolt Koppany <> wrote:

> Hi,
> On 99470 documents (I mean Lucene documents), a FuzzyQuery takes approx. 30
> seconds but a PrefixQuery less than one.
> All the Lucene index files together take 65MB.
> I'm a bit surprised by that. Is that possible?
> Zsolt
> Zsolt Koppany
> Phone: +49-711-67400-679
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
