lucene-java-user mailing list archives

From Anjana Sarkar <anjana...@gmail.com>
Subject Re: Prefix Query for autocomplete - TooManyClauses
Date Fri, 13 Nov 2009 14:51:23 GMT
Hi Simon,

Thank you very much for your reply.

Maybe an example will help clarify my use case.

Say I have the following two indexed fields with this data:

*data*          *boostfield*
african ant     10
alligator       50
anthem          20
antelope        30
another          5

The query is "an*", and I am interested in the top 3 results.

I would like "antelope", "anthem" and "african ant" to be returned, in that
order (highest boost first).

In this case, I am trying to do something like this in Lucene:

select * from data where data like "an*" and boost >= 10

I would like the boost field filtering to happen before looking for data
like "an*", so that I am left with far fewer terms to iterate over.
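The intended result can be pinned down with a small in-memory sketch. It is plain Java, not Lucene code: the `Doc` class, the `topByBoost` method and the lower-case whitespace split (a rough stand-in for StandardAnalyzer tokenization) are all hypothetical. It applies the boost threshold first, then the per-word prefix test, then sorts by boost, highest first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical in-memory sketch (plain Java, not Lucene): filter by boost first,
// then prefix-match individual words, then return the top n by boost.
public class BoostFirstPrefixSketch {

    // One row of the example table: the indexed text and its boost value.
    static final class Doc {
        private final String data;
        private final int boost;

        Doc(String data, int boost) {
            this.data = data;
            this.boost = boost;
        }

        String data() { return data; }
        int boost() { return boost; }
    }

    static List<String> topByBoost(List<Doc> docs, String prefix, int minBoost, int n) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (d.boost() < minBoost) {
                continue; // boost filter runs before any prefix matching
            }
            // Lower-case whitespace split: a rough stand-in for StandardAnalyzer.
            for (String word : d.data().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (word.startsWith(p)) {
                    hits.add(d);
                    break; // one matching word is enough for this document
                }
            }
        }
        hits.sort(Comparator.comparingInt(Doc::boost).reversed()); // highest boost first
        List<String> top = new ArrayList<>();
        for (Doc d : hits.subList(0, Math.min(n, hits.size()))) {
            top.add(d.data());
        }
        return top;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("african ant", 10),
                new Doc("alligator", 50),
                new Doc("anthem", 20),
                new Doc("antelope", 30),
                new Doc("another", 5));
        System.out.println(topByBoost(docs, "an", 10, 3)); // [antelope, anthem, african ant]
    }
}
```

With the example table this prints [antelope, anthem, african ant]: "another" is dropped by the boost threshold before any prefix matching, and "african ant" qualifies through its second word, "ant". The sketch only illustrates the desired result; it does not show how to make Lucene apply the filter before expanding the prefix.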


--Anjana

On Fri, Nov 13, 2009 at 9:12 AM, Simon Willnauer <simon.willnauer@googlemail.com> wrote:

> Anjana, maybe I don't understand your question correctly, but what you
> want to do is a spell-suggestion kind of thing on terms in the index,
> right? You are trying to use a prefix query to display those terms as an
> auto-completion?! So I assume that what you do is run a query and
> then get the possible terms from the stored values?!
>
> If I understand you correctly, wouldn't it be easier to just iterate over
> the first n terms starting with your prefix? That should be quite fast
> and easy to implement, if it fits your requirements.
>
> simon
>
> On Fri, Nov 13, 2009 at 2:50 PM, Anjana Sarkar <anjanadeb@gmail.com> wrote:
> > We are using Lucene for one of our projects here, and it has been
> > working very well for the last 2 years.
> > The new requirement is to use it for autocomplete. Here, queries like a*
> > or ab* pose a problem.
> > I have set BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE) to get
> > around the BooleanQuery.TooManyClauses exception.
> > The issue now is that performance is not acceptable. It takes about
> > 3 seconds for an a* query to return results.
> > I have 250,000 documents, each with 5-15 words in the indexed field, and
> > am using StandardAnalyzer. I have tried using a filter, since in this
> > case I am only interested in documents with a boost higher than a
> > certain number. I had the boost value as a separate Lucene indexed field
> > so I could filter on it.
> > I realized that the filter is only applied after the boolean query is
> > prepared and scored, so there is no performance benefit with that
> > approach.
> > I cannot use a ConstantScoreQuery, as I need the top n matches for the
> > query.
> > Any suggestions on how I can get around this issue will be highly
> > appreciated.
> >
>
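Simon's suggestion of iterating the first n terms that start with the prefix can be sketched in the same spirit. Here a `TreeSet` stands in for the index's sorted term dictionary, and `PrefixTermSketch` and `firstTermsWithPrefix` are hypothetical names. The idea is to seek to the first term at or after the prefix and stop at the first term that no longer matches, or after n terms. In Lucene 2.9 the corresponding seek would roughly be `IndexReader.terms(Term)`, which returns a `TermEnum` positioned at the first term at or after the given one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: a TreeSet stands in for the index's sorted term dictionary.
public class PrefixTermSketch {

    static List<String> firstTermsWithPrefix(NavigableSet<String> terms, String prefix, int n) {
        List<String> out = new ArrayList<>();
        // tailSet(prefix, true) starts at the first term >= prefix, like seeking
        // a sorted term enumeration to the prefix.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || out.size() == n) {
                break; // left the prefix range, or already collected n terms
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of(
                "african", "alligator", "another", "ant", "antelope", "anthem"));
        System.out.println(firstTermsWithPrefix(terms, "an", 3)); // [another, ant, antelope]
    }
}
```

The terms come back in lexicographic order, not by boost, so this gives fast completions but does not by itself produce the boost-ordered top n from the example table.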


-- 
Anjana Sarkar
Address - 9 Sally Court, Bridgewater, NJ-08807
732-979-5219(mobile)
