lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Toke Eskildsen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
Date Wed, 02 Feb 2011 15:06:28 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989662#comment-12989662
] 

Toke Eskildsen commented on LUCENE-2843:
----------------------------------------

I see that the VariableGapTermsIndexReader/Writer is now the default (or at least an experimental
default) in trunk. This means that ord() and consequently seek() are not available. Are you,
Michael, planning on adding these later on or are they gone for good?

If they are gone for good, it does represent a bit of a problem for me as I use ord() and
seek() for a memory-efficient hierarchical faceting system. Not having those in the default
reader/writer means that most indexes "out there" will not support accessing terms by ordinals
and that my code won't work on them unless they are re-build. Boo hoo for me, but not implementing
the interface fully in the default implementation seems wrong. Or maybe the interface should
be changed?

> Add variable-gap terms index impl.
> ----------------------------------
>
>                 Key: LUCENE-2843
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2843
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2843.patch, LUCENE-2843.patch
>
>
> PrefixCodedTermsReader/Writer (used by all "real" core codecs) already
> supports pluggable terms index impls.
> The only impl we have now is FixedGapTermsIndexReader/Writer, which
> picks every Nth (default 32) term and holds it in efficient packed
> int/byte arrays in RAM.  This is already an enormous improvement (RAM
> reduction, init time) over 3.x.
> This patch adds another impl, VariableGapTermsIndexReader/Writer,
> which lets you specify an arbitrary IndexTermSelector to pick which
> terms are indexed, and then uses an FST to hold the indexed terms.
> This is typically even more memory efficient than packed int/byte
> arrays, though, it does not support ord() so it's not quite a fair
> comparison.
> I had to relax the terms index plugin api for
> PrefixCodedTermsReader/Writer to not assume that the terms index impl
> supports ord.
> I also did some cleanup of the FST/FSTEnum APIs and impls, and broke
> out separate seekCeil and seekFloor in FSTEnum.  Eg we need seekFloor
> when the FST is used as a terms index but seekCeil when it's holding
> all terms in the index (ie which SimpleText uses FSTs for).

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message