lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Google finance-like suggestible search field
Date Thu, 15 Jan 2009 02:58:12 GMT
Sorry, hit the send too quickly. That last should read:

"much more suitable than forming a query".

Best
Erick

On Wed, Jan 14, 2009 at 9:57 PM, Erick Erickson <erickerickson@gmail.com>wrote:

> First, it's a legitimate question whether matching on single-letter
> prefixes is useful for the user. If you're running into TooManyClauses,
> that means (if you haven't changed the defaults) that there are more
> than 1024 possibilities. Which is far too many for the user to scan
> through.
>
> You could look at the n-gram tokenizers (I confess I haven't used them
> so I'm not all *that* familiar with them). Or you could make a rule like
> "no autocomplete until the user types 3 characters" if that would work.
>
> Instead of forming a query, you might try using TermEnum, or
> WildCardTermEnum
> or even RegexTermEnum to quickly get the list of terms for your
> autocomplete. The
> nice part about this approach is that you could quit after a suitable
> number of
> terms were found rather than get them all. As I remember, WildCardTermEnum
> is
> faster than RegexTermEnum, but don't hold me to that. So I'd try
> WildCardTermEnum
> first, I think you'll find it much more suitable than forming
>
> Best
> Erick
>
>
> On Wed, Jan 14, 2009 at 9:38 PM, Hayes, Peter <Peter.Hayes@fmr.com> wrote:
>
>> Yes Jack that is what we found.
>>
>> One approach we kicked around is using a standard TermQuery but breaking
>> up each word into its prefixes.  For example, the word 'IBM' would be
>> added to a document broken into 'I', 'IB', 'IBM'.  The downsides seem to
>> be a lot of waste in the index.
>>
>> Any thoughts on that approach?
>>
>> -----Original Message-----
>> From: Jack Stahl [mailto:jack@yelp.com]
>> Sent: Wednesday, January 14, 2009 9:24 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Google finance-like suggestible search field
>>
>> Eric,
>>
>> I don't think that will work.  The PrefixQuery generates a giant
>> BooleanQuery that ORs one TermQuery for each matching term in the index
>> for
>> that prefix.  So the problem isn't the number of fields, but that
>> PrefixQueries dont scale to large indices.
>>
>> Jack
>>
>> On Wed, Jan 14, 2009 at 6:18 PM, Angel, Eric <eangel@business.com>
>> wrote:
>>
>> > Peter,
>> >
>> > Why don't you put all your "autocompletable" values into a single
>> > document field and just query a single field?  Google seems to only
>> use
>> > two fields for autocomplete - symbol and company name.
>> >
>> > Eric
>> >
>> > -----Original Message-----
>> > From: Hayes, Peter [mailto:Peter.Hayes@fmr.com]
>> > Sent: Wednesday, January 14, 2009 6:00 PM
>> > To: java-user@lucene.apache.org
>> > Subject: Google finance-like suggestible search field
>> >
>> > Hi all,
>> >
>> > We are trying to implement a Google finance-like suggest as you type
>> > search field.  The index is quite large and comprised of multiple
>> fields
>> > to search across so our initial implementation was to use a
>> BooleanQuery
>> > with multiple PrefixQuery across each field.  We quickly ran into the
>> > TooManyClauses exception and are looking for alternatives.
>> >
>> > Is there an implementation pattern for this use case using lucene?
>> This
>> > seems like a common feature on various sites and I'm wondering if
>> lucene
>> > can be used to support this.
>> >
>> > Thanks in advance.
>> >
>> > Peter Hayes
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message