lucene-dev mailing list archives

From: Gregor Heinrich
Subject: Re: Numerical ids for terms?
Date: Wed, 13 Apr 2011 15:46:10 GMT
Thanks Toke and Kirill -- I guess that's the way to go (at least until v4.0).

Best regards


On 4/13/11 3:42 PM, Toke Eskildsen wrote:
> On Tue, 2011-04-12 at 11:41 +0200, Gregor Heinrich wrote:
>> Hi -- has there been any effort to create a numerical representation of Lucene
>> indices? That is, to use the Lucene Directory backend as a large term-document
>> matrix at index level. As this would require a bijective mapping between terms
>> (per-field, as customary in Lucene) and a numerical index (integer, monotonic
>> from 0 to numTerms()-1), I guess this requires some special modifications
>> to the Lucene core.
> Maybe you're thinking about something like TermsEnum?
> It provides ordinal access to terms, with ordinals represented as longs.
> To make the access index-level rather than segment-level, you will have
> to merge the ordinals from the different segments.
> Unfortunately, support for ordinal-based term access is optional for
> codecs, and the default codec does not provide it, so you will have to
> explicitly select a codec when you build your index.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
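
The ordinal merge Toke describes can be sketched roughly as follows. This is a minimal illustration in plain Java, not Lucene code: the class GlobalTermOrdinals and both method names are hypothetical, each segment's terms are assumed to be available as an already-sorted list of strings, and Java's natural String order stands in for whatever term order the index actually uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: merges per-segment term lists
// into a single index-level term <-> ordinal mapping.
public class GlobalTermOrdinals {

    // Builds a sorted term -> global ordinal map with ordinals 0..numTerms-1.
    public static Map<String, Long> buildGlobalOrdinals(List<List<String>> segmentTerms) {
        // TreeMap keeps terms sorted and collapses terms shared by several segments.
        TreeMap<String, Long> global = new TreeMap<>();
        for (List<String> segment : segmentTerms) {
            for (String term : segment) {
                global.put(term, 0L); // placeholder, real ordinal assigned below
            }
        }
        long ord = 0;
        for (Map.Entry<String, Long> entry : global.entrySet()) {
            entry.setValue(ord++);
        }
        return global;
    }

    // For each segment, maps the segment-local ordinal (position in the
    // segment's sorted term list) to the global ordinal.
    public static long[][] segmentToGlobal(List<List<String>> segmentTerms,
                                           Map<String, Long> global) {
        long[][] result = new long[segmentTerms.size()][];
        for (int s = 0; s < segmentTerms.size(); s++) {
            List<String> terms = segmentTerms.get(s);
            result[s] = new long[terms.size()];
            for (int i = 0; i < terms.size(); i++) {
                result[s][i] = global.get(terms.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two made-up segments that share the term "cherry".
        List<List<String>> segments = Arrays.asList(
                Arrays.asList("apple", "cherry"),
                Arrays.asList("banana", "cherry", "date"));
        Map<String, Long> global = buildGlobalOrdinals(segments);
        System.out.println(global);
        System.out.println(Arrays.deepToString(segmentToGlobal(segments, global)));
    }
}
```

With the two made-up segments in main, the global map is {apple=0, banana=1, cherry=2, date=3} and the per-segment tables are [[0, 2], [1, 2, 3]]. In a real index the per-segment lists would come from a codec that supports ordinals, and the mapping would need rebuilding whenever the set of segments changes.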
