lucene-java-user mailing list archives

From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: Suggest with FST
Date Wed, 16 Nov 2011 18:32:22 GMT
I am currently working on a refactoring of FSTLookup so that either one or
both of your objectives will be possible.

I would still argue that storing exact scores does not make much sense
(think: if you collect query logs, you probably won't differentiate
between two suggestions that differ by two or three hits when their counts
are in the millions). The order of magnitude matters, not the exact numbers.
Bucketing is not only a way to speed up collection (although it is a very
good way to speed it up!), it is also a way to abstract "classes" of
suggestions -- think of buckets as classes corresponding to "frequent",
"less frequent", "even less frequent", etc.
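
A minimal sketch of the bucketing idea, assuming a logarithmic scale so that
buckets track order of magnitude. The class name ScoreBuckets, the method
bucketOf and the choice of 10 buckets are illustrative assumptions, not part
of FSTLookup or any Lucene API:

```java
final class ScoreBuckets {
    private ScoreBuckets() {}

    /**
     * Maps a raw hit count to a bucket index in [0, numBuckets - 1].
     * The scale is logarithmic, so counts of similar magnitude share a bucket.
     * Hypothetical helper for illustration only.
     */
    static int bucketOf(long count, long maxCount, int numBuckets) {
        if (numBuckets < 1 || maxCount < 1) {
            throw new IllegalArgumentException("numBuckets and maxCount must be >= 1");
        }
        // Clamp into [1, maxCount] so the logarithm is defined and bounded.
        long clamped = Math.max(1L, Math.min(count, maxCount));
        if (maxCount == 1) {
            return numBuckets - 1; // only one possible count, avoid dividing by log(1)
        }
        // Position of the count on a log scale, between 0.0 and 1.0.
        double fraction = Math.log((double) clamped) / Math.log((double) maxCount);
        // The top of the scale (fraction == 1.0) belongs to the last bucket.
        return Math.min((int) (fraction * numBuckets), numBuckets - 1);
    }

    public static void main(String[] args) {
        long max = 10_000_000L;
        long[] counts = {1L, 1_000L, 5_000_000L, 5_000_003L, max};
        for (long c : counts) {
            System.out.println(c + " -> bucket " + bucketOf(c, max, 10));
        }
    }
}
```

With this scheme, two suggestions with 5,000,000 and 5,000,003 hits land in
the same bucket, while one with 1,000 hits lands in a lower one.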

As for suggesting something other than the input suggestion, this can be done
even now: when you're building FSTLookup, pass a string that is a
concatenation of what you expect as a prefix and a full completion, for
example:

bush|george bush
flower|plant

If you ask for suggestions for "bu", the results will contain the full
stored string ("bush|george bush"); you only need to post-process it to
extract the part after the separator.
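
The concatenation trick can be sketched without any Lucene code. The example
below uses a sorted set as a stand-in for the automaton, which is an
assumption made to keep it self-contained; the class and method names are
hypothetical, and results come back in lexicographic order rather than by
weight as a real suggester would return them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class PrefixCompletionSketch {
    static final char SEPARATOR = '|';

    // Stand-in for the FST: entries are stored as "matchKey|display".
    private final TreeSet<String> entries = new TreeSet<>();

    /** Stores the key used for prefix matching together with its display string. */
    void add(String matchKey, String display) {
        entries.add(matchKey + SEPARATOR + display);
    }

    /** Returns up to max display strings whose stored entry starts with prefix. */
    List<String> suggest(String prefix, int max) {
        List<String> out = new ArrayList<>();
        // All entries sharing the prefix are contiguous in sorted order.
        for (String entry : entries.tailSet(prefix, true)) {
            if (!entry.startsWith(prefix) || out.size() >= max) {
                break;
            }
            // Post-processing step: keep only the part after the first separator.
            out.add(entry.substring(entry.indexOf(SEPARATOR) + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        PrefixCompletionSketch lookup = new PrefixCompletionSketch();
        lookup.add("bush", "george bush");
        lookup.add("flower", "plant");
        System.out.println(lookup.suggest("bu", 5)); // prints [george bush]
        System.out.println(lookup.suggest("fl", 5)); // prints [plant]
    }
}
```

The separator must be a character that never occurs in the match keys,
otherwise the post-processing step splits at the wrong position.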

The mechanism of using the automaton is identical, details change.

Dawid

On Wed, Nov 16, 2011 at 7:00 PM, Sudarshan Gaikaiwari <sudarshan@acm.org> wrote:

> Hi
>
> I am trying to implement an auto complete suggest system using FST.
> For my use case I cannot use FSTLookup for the following reasons.
>
> 1. I cannot construct the display string using the arc labels like
> FSTLookup as the display strings for autocompletion are different from the
> strings used as prefixes.
> 2. I am computing the scores for the suggestions by analyzing logs and do
> not want to put scores into a few buckets.
>
>
> Is there a way to get all the outputs from an FST for a particular prefix?
> I have been looking at the code for FST and FSTEnum but have not found a
> method that provides this functionality.
>
> Thanks
> Sudarshan
>
>
> --
> Sudarshan Gaikaiwari
> www.sudarshan.org
> sudarshan@acm.org
>
