lucy-user mailing list archives

From Peter Karman <pe...@peknet.com>
Subject Re: [lucy-user] C library:Suggester
Date Wed, 03 May 2017 21:30:07 GMT
You might find this Perl implementation a helpful reference.

https://metacpan.org/pod/LucyX::Suggester

On Wed, May 3, 2017 at 3:06 PM, Serkan Mulayim <serkanmulayim@gmail.com>
wrote:

> Thank you very much Marvin,
>
> When I type hell, I would like to get tokens starting with hell, e.g.
> {"hell","hello","hellx"}. I do not want to get documents which contain the
> token hell in the title. So it seems like it should be working on the tokens.
>
> What I need is basically to be able to iterate over all tokens in
> lexicographic order. I would also need to sort them by frequency
> when returning the results. I guess the Lexicon class,
> https://lucy.apache.org/docs/c/Lucy/Index/Lexicon.html, is designed for
> this. Can you please confirm? I hope the results returned by
> lucy_Lex_seek contain the frequency of the terms as well.
>
> Thanks again,
> Serkan
>
> On Tue, May 2, 2017 at 4:22 PM, Marvin Humphrey <marvin@rectangular.com>
> wrote:
>
> > On Mon, May 1, 2017 at 3:55 PM, Serkan Mulayim <serkanmulayim@gmail.com>
> > wrote:
> >
> > > I am using the C library. I would like to get the suggester or
> > > autocomplete functionality in my library. It needs to return
> > > {"hello", "hell", "hellx"} when your query is "hell". I feel like I
> > > need to be able to read all the tokens in the whole index, and return
> > > the results based on it. I looked at the indexReader for this, but I
> > > could not find any useful information. Do you think this is possible?
> >
> > Autosuggestion functionality will need tuning, just like search results.
> > In fact, autosuggestion is really a specialized form of search
> > application. It could be implemented with a separate index or separate
> > fields.
> >
> > Say that we only wanted to offer suggestions derived from the `title`
> > field. Split each title into an array of words.  Then for each word,
> > index starting at some letter, say the third.  For the title
> > `hello world`, you'd get the following tokens:
> >
> >     hello -> hel hell hello
> >     world -> wor worl world
> >
> > Then at search time, perform a search query with every keystroke.
> >
> >     h -> (no result)
> >     he -> (no result)
> >     hel -> "hello world"
> >
> > Once you've got basic functionality running, experiment with minimum
> > token length, adding Soundex/Metaphone, performing character
> > normalization, etc.
> >
> > Marvin Humphrey
> >
>
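
For anyone following along in C, the prefix-token scheme Marvin describes
in the quoted text above (index every prefix of a word starting at the
third letter) might be sketched roughly as follows. This is plain C and
does not call the Lucy API; `edge_ngrams` and `MAX_PREFIX_LEN` are
hypothetical names invented for illustration, and the code assumes
single-byte characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit for this sketch; not a Lucy constant. */
#define MAX_PREFIX_LEN 64

/* Hypothetical helper, not part of the Lucy C API.
 * Copies every prefix of `word` that is at least `min_len` bytes long
 * into `out`, shortest first, and returns how many were written.
 * Assumes single-byte characters; real text needs UTF-8 aware splitting. */
static size_t
edge_ngrams(const char *word, size_t min_len,
            char out[][MAX_PREFIX_LEN], size_t max_out) {
    assert(word != NULL && out != NULL);
    size_t len   = strlen(word);
    size_t count = 0;
    for (size_t n = min_len; n <= len && count < max_out; n++) {
        if (n >= MAX_PREFIX_LEN) {
            break;  /* prefix would not fit in one output row */
        }
        memcpy(out[count], word, n);
        out[count][n] = '\0';
        count++;
    }
    return count;
}
```

With `min_len` set to 3, the word `hello` yields `hel`, `hell`, `hello`,
and a word shorter than three bytes yields nothing. Each prefix would then
be indexed as its own token in the field used for suggestions.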


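The other approach raised in the quoted thread, walking a lexicographically
sorted term list from the first term matching the prefix and then ranking
the matches by frequency, can be illustrated without Lucy at all. In the
sketch below `TermEntry`, `lower_bound`, `by_freq_desc`, and `suggest` are
hypothetical names, and the sorted array stands in for whatever the real
lexicon provides; it is not a description of how Lucy's Lexicon behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one lexicon entry; not a Lucy type. */
typedef struct {
    const char *text;
    int         doc_freq;
} TermEntry;

/* Index of the first term that is >= `prefix` in strcmp order.
 * `terms` must already be sorted by `text`. */
static size_t
lower_bound(const TermEntry *terms, size_t n, const char *prefix) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(terms[mid].text, prefix) < 0) { lo = mid + 1; }
        else                                     { hi = mid; }
    }
    return lo;
}

/* qsort comparator: higher doc_freq first, ties broken alphabetically. */
static int
by_freq_desc(const void *a, const void *b) {
    const TermEntry *x = a;
    const TermEntry *y = b;
    if (x->doc_freq != y->doc_freq) {
        return (y->doc_freq > x->doc_freq) - (y->doc_freq < x->doc_freq);
    }
    return strcmp(x->text, y->text);
}

/* Collects terms starting with `prefix` into `out`, then sorts them by
 * frequency. Collection stops at `max_out`, so `out` should be large
 * enough to hold every match if a true top-by-frequency list is wanted. */
static size_t
suggest(const TermEntry *terms, size_t n, const char *prefix,
        TermEntry *out, size_t max_out) {
    assert(terms != NULL && prefix != NULL && out != NULL);
    size_t plen  = strlen(prefix);
    size_t count = 0;
    for (size_t i = lower_bound(terms, n, prefix);
         i < n && count < max_out; i++) {
        if (strncmp(terms[i].text, prefix, plen) != 0) {
            break;  /* sorted input: first non-match ends the run */
        }
        out[count++] = terms[i];
    }
    qsort(out, count, sizeof(TermEntry), by_freq_desc);
    return count;
}
```

Because the input is sorted, all matches form one contiguous run, so the
scan touches only the matching terms plus the one that ends the run.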

-- 
Peter Karman . https://peknet.com/ . https://keybase.io/peterkarman
