lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Google finance-like suggestible search field
Date Thu, 15 Jan 2009 02:57:38 GMT
First, it's a legitimate question whether matching on single-letter
prefixes is useful for the user. If you're running into TooManyClauses,
that means (if you haven't changed the defaults) that there are more
than 1024 possibilities. Which is far too many for the user to scan through.

You could look at the n-gram tokenizers (I confess I haven't used them
so I'm not all *that* familiar with them). Or you could make a rule like
"no autocomplete until the user types 3 characters" if that would work.

Instead of forming a query, you might try using TermEnum, or
WildCardTermEnum
or even RegexTermEnum to quickly get the list of terms for your
autocomplete. The
nice part about this approach is that you could quit after a suitable number
of
terms were found rather than get them all. As I remember, WildCardTermEnum
is
faster than RegexTermEnum, but don't hold me to that. So I'd try
WildCardTermEnum
first, I think you'll find it much more suitable than forming

Best
Erick

On Wed, Jan 14, 2009 at 9:38 PM, Hayes, Peter <Peter.Hayes@fmr.com> wrote:

> Yes Jack that is what we found.
>
> One approach we kicked around is using a standard TermQuery but breaking
> up each word into its prefixes.  For example, the word 'IBM' would be
> added to a document broken into 'I', 'IB', 'IBM'.  The downsides seem to
> be a lot of waste in the index.
>
> Any thoughts on that approach?
>
> -----Original Message-----
> From: Jack Stahl [mailto:jack@yelp.com]
> Sent: Wednesday, January 14, 2009 9:24 PM
> To: java-user@lucene.apache.org
> Subject: Re: Google finance-like suggestible search field
>
> Eric,
>
> I don't think that will work.  The PrefixQuery generates a giant
> BooleanQuery that ORs one TermQuery for each matching term in the index
> for
> that prefix.  So the problem isn't the number of fields, but that
> PrefixQueries dont scale to large indices.
>
> Jack
>
> On Wed, Jan 14, 2009 at 6:18 PM, Angel, Eric <eangel@business.com>
> wrote:
>
> > Peter,
> >
> > Why don't you put all your "autocompletable" values into a single
> > document field and just query a single field?  Google seems to only
> use
> > two fields for autocomplete - symbol and company name.
> >
> > Eric
> >
> > -----Original Message-----
> > From: Hayes, Peter [mailto:Peter.Hayes@fmr.com]
> > Sent: Wednesday, January 14, 2009 6:00 PM
> > To: java-user@lucene.apache.org
> > Subject: Google finance-like suggestible search field
> >
> > Hi all,
> >
> > We are trying to implement a Google finance-like suggest as you type
> > search field.  The index is quite large and comprised of multiple
> fields
> > to search across so our initial implementation was to use a
> BooleanQuery
> > with multiple PrefixQuery across each field.  We quickly ran into the
> > TooManyClauses exception and are looking for alternatives.
> >
> > Is there an implementation pattern for this use case using lucene?
> This
> > seems like a common feature on various sites and I'm wondering if
> lucene
> > can be used to support this.
> >
> > Thanks in advance.
> >
> > Peter Hayes
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message