lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3206) FST package API refactoring
Date Fri, 17 Jun 2011 16:26:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051170#comment-13051170 ]

Michael McCandless commented on LUCENE-3206:
--------------------------------------------

bq. Thanks Mike. I agree it'd be nice to have a flexible label type as well, but I have no
idea how to make it efficient (and code-clean) yet. You could do a similar thing as with the
outputs (using either a boxed type if you don't care about performance that much or a mutable
wrapper if you do care about GC), but how this would affect the API I have no idea right now.
There is also the lexicographic order that one would need to consider (a comparator would
need to be passed as part of the construction process and then for traversals). It'll get
complicated.

Yeah this was my fear :)

bq. I was also thinking of just dropping support for BYTE1/2 and leaving fixed int labels...
This would bloat byte-labeled automata a little bit (if they're ASCII they'd v-code into a
single byte anyway), but would strip down the ugliness of BYTE1/2/4... All methods accepting
BytesRef and CharSequence would still be there, translated on the fly, but the representation
of labels would always be an int.

Hmm, that makes me nervous -- I think this could be a non-negligible
increase in FST size for the non-ASCII case?
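
To make the size concern concrete, here is a minimal sketch that counts how many bytes a label would need if it were written as a variable-length int with 7 payload bits per byte. It assumes int labels would be v-coded that way, as the quoted proposal suggests; the class and method names are hypothetical and are not part of the FST package.

```java
public class VIntLabelSize {
    /**
     * Bytes needed to write an int as a variable-length int
     * (7 payload bits per byte, high bit marks continuation).
     * Hypothetical helper for illustration only.
     */
    static int vIntBytes(int value) {
        int bytes = 1;
        // Keep going while there are bits above the low 7.
        while ((value & ~0x7F) != 0) {
            value >>>= 7; // unsigned shift, so the loop always terminates
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) {
        // ASCII bytes, high (non-ASCII) bytes, and two larger code points.
        int[] labels = {0x41, 0x7F, 0x80, 0xC3, 0xFF, 0x4E2D, 0x1F600};
        for (int label : labels) {
            System.out.printf("label 0x%X -> %d byte(s) as vInt%n",
                    label, vIntBytes(label));
        }
    }
}
```

Under that assumption, a byte label in the range 0x80-0xFF (every lead and continuation byte of multi-byte UTF-8) would take two bytes instead of the single byte a fixed one-byte label needs, which is where the growth for non-ASCII input would come from.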

bq. One more question: can you give me traversal use cases you're using FSTs for now? I'll
try to implement them and see how the new API works out in practice. I looked at the FSTEnum
and it has next(), seekCeil() and seekFloor().

I think SimpleText codec is a good example?  Also
VariableGapTermsIndexReader, and MemoryCodec?  Each of these uses the
BytesRefFSTEnum, I believe.
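
For anyone prototyping against these use cases, the seek semantics can be sketched with java.util.TreeMap as a stand-in for a sorted term dictionary. This is only an analogy, not the FST or BytesRefFSTEnum API: the class and helper names below are hypothetical, and String keys compare in UTF-16 order rather than the byte order an FST over BytesRef would use.

```java
import java.util.TreeMap;

public class SeekSemanticsSketch {
    /** Small sorted "term dictionary" mapping terms to outputs. */
    static TreeMap<String, Long> sample() {
        TreeMap<String, Long> terms = new TreeMap<>();
        terms.put("apple", 1L);
        terms.put("banana", 2L);
        terms.put("cherry", 3L);
        return terms;
    }

    /** Smallest term >= target, or null if there is none. */
    static String seekCeil(TreeMap<String, Long> terms, String target) {
        return terms.ceilingKey(target);
    }

    /** Largest term <= target, or null if there is none. */
    static String seekFloor(TreeMap<String, Long> terms, String target) {
        return terms.floorKey(target);
    }

    public static void main(String[] args) {
        TreeMap<String, Long> terms = sample();
        // next(): plain in-order iteration over all terms.
        for (String term : terms.keySet()) {
            System.out.println("next -> " + term);
        }
        System.out.println("seekCeil(b)  -> " + seekCeil(terms, "b"));
        System.out.println("seekFloor(b) -> " + seekFloor(terms, "b"));
    }
}
```

A refactored traversal API would need to cover the same three behaviours: in-order iteration, seeking to the smallest entry at or after a target, and seeking to the largest entry at or before it.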

bq. I'm also a bit terrified by the amount of changes this would introduce if we decided to
switch the APIs (tests, scattered use cases...). Don't know if I'll have the time to update
this all.

I think it's still fairly contained at this point?  (I.e., the number of
tests that directly use the FST APIs.)


> FST package API refactoring
> ---------------------------
>
>                 Key: LUCENE-3206
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3206
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.2
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3206.patch
>
>
> The current API is still marked @experimental, so I think there's still time to fiddle
> with it. I've been using the current API for some time and I do have some ideas for improvement.
> This is a placeholder for these -- I'll post a patch once I have a working proof of concept.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

