lucene-java-user mailing list archives

From Arun Kumar K <arunk...@gmail.com>
Subject Re: Wild Card Query Performance
Date Fri, 29 Mar 2013 10:13:27 GMT
Hi Uwe,

Thanks for the info.
You mentioned the term dictionary and other index components, but I
didn't follow that part.
- What other factors improve the speed of such queries? Could you explain
or give some pointers?
- Can we do anything to improve the speed of such queries?

One other observation: indexing time has increased by around 6% in 4.0.

Arun

On Fri, Mar 29, 2013 at 3:25 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
> It depends on the type of wildcard query. If you only have a prefix (ab*),
> it rewrites to a simple PrefixQuery, which is implemented exactly as in
> 3.x, so the only speed improvements you see from Lucene 4.0 come from the
> term dictionary and other index components, not from the query itself.
>
> If you have wildcards like ab?xy, then the query will be several times
> faster than in 3.x, because the "?" wildcard can only expand to a limited
> set of terms, whereas Lucene 3.x still scans all terms with the prefix
> "ab". The same applies to other wildcard constructs, if they restrict the
> terms by more than just a prefix.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Arun Kumar K [mailto:arunk786@gmail.com]
> > Sent: Friday, March 29, 2013 10:38 AM
> > To: java-user
> > Subject: Wild Card Query Performance
> >
> > Hi Guys,
> >
> > I have been testing the search time improvement from Lucene 3.0.2 to
> > Lucene 4.0 for wildcard queries (with at least, say, 2 chars, e.g. ar*).
> >
> > For a 2GB size index with 4000000 docs, the following observations were
> > made:
> >
> > Around 3X improvement with and without STRING sort on a sortable field.
> >
> > I guess this improvement is because of the AutomatonQuery by Robert,
> > which is used in wildcard queries.
> >
> > As per Mike's blog, FuzzyQueries are 100X faster in 4.0, but these
> > wildcard queries are not nearly that much faster.
> >
> > I have used default codecs and postings format.
> >
> > Did I miss something, or is this the maximum improvement that we can
> > expect currently for wildcard queries?
> >
> >
> > Arun
>
>
>
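
The difference Uwe describes between ab* and ab?xy can be illustrated with a toy model. This is only a sketch and not Lucene code: a sorted TreeSet stands in for the term dictionary, the class and method names are made up for illustration, and the pattern is limited to the shape prefix + "?" + suffix, with "?" matching a single Java char. The first method walks every term under the prefix, which is the 3.x behaviour Uwe describes; the second seeks directly to the next term that could still match, which is roughly the idea behind the automaton-based enumeration in 4.0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model only: a sorted set plays the role of the term dictionary.
class WildcardSeekSketch {

    // 3.x-style: visit every term that starts with the prefix and filter it.
    static List<String> prefixScan(NavigableSet<String> terms, String prefix,
                                   String suffix, int[] visited) {
        List<String> matches = new ArrayList<>();
        int wantedLength = prefix.length() + 1 + suffix.length();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // left the prefix range
            }
            visited[0]++;
            if (term.length() == wantedLength && term.endsWith(suffix)) {
                matches.add(term);
            }
        }
        return matches;
    }

    // Seek-style: after each term, jump to the smallest term that could match.
    static List<String> seekScan(NavigableSet<String> terms, String prefix,
                                 String suffix, int[] visited) {
        List<String> matches = new ArrayList<>();
        int p = prefix.length();
        String cur = terms.ceiling(prefix);
        while (cur != null && cur.startsWith(prefix)) {
            visited[0]++;
            String target;
            if (cur.length() == p) {
                // cur is exactly the prefix; the smallest possible match follows it
                target = prefix + '\u0000' + suffix;
            } else {
                char c = cur.charAt(p);
                String candidate = prefix + c + suffix;
                int cmp = cur.compareTo(candidate);
                if (cmp == 0) {
                    matches.add(cur);
                }
                if (cmp < 0) {
                    target = candidate; // the match for this char is still ahead
                } else if (c == Character.MAX_VALUE) {
                    break; // no larger char left to try
                } else {
                    target = prefix + (char) (c + 1) + suffix; // try the next char
                }
            }
            cur = terms.ceiling(target);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(Arrays.asList(
                "aa", "ab", "abaxy", "abazz", "abb", "abbxy",
                "abc", "abcd", "abcxy", "abcxyz", "abd", "ac"));
        int[] scanned = new int[1];
        int[] sought = new int[1];
        System.out.println(prefixScan(terms, "ab", "xy", scanned)
                + " terms visited: " + scanned[0]);
        System.out.println(seekScan(terms, "ab", "xy", sought)
                + " terms visited: " + sought[0]);
    }
}
```

Both methods return the same matches; the seek version simply touches fewer terms, and the gap grows with the number of terms sharing the prefix. For a pure prefix query such as ab* every term under the prefix is a match, so there is nothing to skip, which fits Uwe's point that the query itself is no faster in that case.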
