lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <cutt...@apache.org>
Subject Re: Search speed
Date Tue, 02 Nov 2004 18:30:58 GMT
Jeff Munson wrote:
> Single word searches return pretty fast, but when I try phrases,
> searching seems to slow considerably. [ ... ]
> 
> However, if I use this query, contents:"all parts including picture tube
> guaranteed", it returns hits in 2890 millseconds.  Other phrases take
> longer as well.  

You could use an analyzer that inserts bigrams for common terms.  Nutch 
does this.  So, if you declare that "all" and "including" are common 
terms, then this could be tokenized as the following tokens:

0 - all all.parts
1 - parts parts.including
2 - including including.picture
3 - picture
4 - tube
5 - guaranteed

Two tokens at a position indicate where the second has position 
increment of zero.

Then your phrase search could be converted to:

   "all.parts parts.including including.picture picture tube guaranteed"

which should be much faster, since it has replaced common terms with 
rare terms.

This approach does make the index larger, and hence makes indexing 
somewhat slower.  So you don't want to declare too many words as common, 
but a handful can make a big difference if they're used frequently in 
queries.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message