lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4491) Make analyzing suggester more flexible
Date Fri, 19 Oct 2012 18:45:11 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480239#comment-13480239 ]

Simon Willnauer commented on LUCENE-4491:
-----------------------------------------

OK, so what this really boils down to is that I am working around the current API and its
limitations. Maybe we should tackle this from the other direction and improve our API so that it
distinguishes between the different suggester impls. Here are a couple of things I think we should tackle:
  * To begin with, I think it is really hard to have a common interface for all our different
impls. We should flatten out the hierarchy and make dedicated, impl-specific interfaces before
we abstract (a common abstraction might not be possible).
  * Lookup is a really bad name; let's get rid of it.
  * All the methods that apply only to mutable impls should go away.
  * We should separate building the suggester from the "suggest" impl. Most impls are immutable
(the FST ones) and should not need to be pushed into a mutable interface.
  * Building should be impl-specific, i.e. an impl might even require you to pass keys in order,
and we can provide utils for that?
  * Building should be much simpler. The TermFreqIterator is bogus here. FST suggesters should
provide builders with methods like FSTSuggestBuilder.put(BytesRef, long weight) that
we can overload, like AnalyzingSuggestBuilder.put(BytesRef input, BytesRef output, long weight)
<-- this would solve this issue, btw.
  * All FST suggester impls should require the FST or an InputStream as ctor args to enforce
immutability.
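
To make the builder / suggester split above concrete, here is a rough sketch. Everything in it
is hypothetical: SuggestSketch, Entry, SuggestBuilder and ImmutableSuggester are made-up names,
not Lucene classes; byte[] stands in for BytesRef; and a linear prefix scan stands in for the
FST (and for analysis). It only illustrates the shape of the API: overloaded put() methods on a
builder, and a suggester that gets all of its data through the ctor and has no mutators.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these types are Lucene classes.
final class SuggestSketch {

    /** One entry: the key that is matched, the surface form that is returned, and a weight. */
    static final class Entry {
        final byte[] input;
        final byte[] output;
        final long weight;

        Entry(byte[] input, byte[] output, long weight) {
            this.input = input.clone();
            this.output = output.clone();
            this.weight = weight;
        }
    }

    /** Read-only suggester: all data arrives through the ctor and there are no mutators. */
    static final class ImmutableSuggester {
        private final List<Entry> entries;

        ImmutableSuggester(List<Entry> entries) {
            this.entries = Collections.unmodifiableList(new ArrayList<>(entries));
        }

        /** Returns up to num surface forms whose key starts with prefix, highest weight first. */
        List<byte[]> suggest(byte[] prefix, int num) {
            List<Entry> matches = new ArrayList<>();
            for (Entry e : entries) {
                if (startsWith(e.input, prefix)) {
                    matches.add(e);
                }
            }
            matches.sort(Comparator.comparingLong((Entry e) -> e.weight).reversed());
            List<byte[]> result = new ArrayList<>();
            for (Entry e : matches.subList(0, Math.min(num, matches.size()))) {
                result.add(e.output.clone());
            }
            return result;
        }

        private static boolean startsWith(byte[] key, byte[] prefix) {
            if (prefix.length > key.length) {
                return false;
            }
            for (int i = 0; i < prefix.length; i++) {
                if (key[i] != prefix[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Mutable builder; the only place where entries can be added. */
    static final class SuggestBuilder {
        private final List<Entry> pending = new ArrayList<>();

        /** Key and surface form are the same bytes. */
        SuggestBuilder put(byte[] input, long weight) {
            return put(input, input, weight);
        }

        /** Overload with a separate surface form; the output bytes are opaque to the builder. */
        SuggestBuilder put(byte[] input, byte[] output, long weight) {
            pending.add(new Entry(input, output, weight));
            return this;
        }

        ImmutableSuggester build() {
            return new ImmutableSuggester(pending);
        }
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ImmutableSuggester suggester = new SuggestBuilder()
            .put(utf8("streisand"), utf8("Barbar Streisand|15"), 10)
            .put(utf8("street"), 5)
            .build();
        // Only the key "streisand" starts with "strei", so this prints: Barbar Streisand|15
        for (byte[] surface : suggester.suggest(utf8("strei"), 5)) {
            System.out.println(new String(surface, StandardCharsets.UTF_8));
        }
    }
}
```

Since the output is just bytes, nothing in this sketch requires the surface form to be valid
UTF-8; a real impl would of course build an FST instead of keeping a list around.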

Let's do this in a separate issue and rip this all apart.
                
> Make analyzing suggester more flexible
> --------------------------------------
>
>                 Key: LUCENE-4491
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4491
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 4.1
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.1, 5.0
>
>         Attachments: LUCENE-4491.patch, LUCENE-4491.patch
>
>
> Today we have an analyzing suggester that is bound to a single key. Yet, if you want to
> have a totally different surface form compared to the key used to find the suggestion, you
> either have to copy the code or play some super ugly analyzer tricks. For example, I want to
> suggest "Barbar Streisand" if somebody types "strei"; in that case the surface form is totally
> different from the analyzed form.
> Going one step further, I want to embed some metadata in the suggested key, like a user
> id or some type; my surface form could look like "Barbar Streisand|15". Ideally I want to encode
> this as binary, and that might not be a valid UTF-8 byte sequence.
> I'm actually doing this in production, and my only option was to copy the analyzing suggester
> and some of its related classes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

