lucene-dev mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject Re: Numerical ids for terms?
Date Wed, 13 Apr 2011 13:42:31 GMT
On Tue, 2011-04-12 at 11:41 +0200, Gregor Heinrich wrote:
> Hi -- has there been any effort to create a numerical representation of Lucene 
> indices? That is, to use the Lucene Directory backend as a large term-document 
> matrix at index level. As this would require a bijective mapping between terms 
> (per-field, as customary in Lucene) and a numerical index (integer, monotonic 
> from 0 to numTerms()-1), I guess this requires some special modifications 
> to the Lucene core.
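The bijective mapping asked about here is essentially a sorted term dictionary per field. Below is a minimal sketch, assuming the terms of one field fit in memory as Java Strings. The class name `TermOrdinals` and its methods are hypothetical and are not Lucene API. Ordinals are positions in the sorted, deduplicated term array, so ord-to-term is an array lookup and term-to-ord is a binary search. `String.compareTo` orders by UTF-16 code units, which may differ from the byte order Lucene uses for terms; the two agree for plain ASCII terms.

```java
import java.util.Arrays;

/** Hypothetical sketch (not Lucene API): a bijective term <-> ordinal mapping for one field. */
class TermOrdinals {
    private final String[] sortedTerms; // index = ordinal, value = term

    TermOrdinals(String[] terms) {
        // Deduplicate and sort so ordinals run 0..numTerms()-1 in term order.
        sortedTerms = Arrays.stream(terms).distinct().sorted().toArray(String[]::new);
    }

    int numTerms() {
        return sortedTerms.length;
    }

    /** ord -> term: a direct array lookup. */
    String term(int ord) {
        return sortedTerms[ord];
    }

    /** term -> ord: binary search over the sorted terms; -1 if the term is absent. */
    int ord(String term) {
        int i = Arrays.binarySearch(sortedTerms, term);
        return i >= 0 ? i : -1;
    }

    public static void main(String[] args) {
        TermOrdinals dict = new TermOrdinals(new String[] {"lucene", "index", "term", "index"});
        System.out.println(dict.numTerms());    // 3
        System.out.println(dict.ord("lucene")); // 1
        System.out.println(dict.term(2));       // term
    }
}
```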

Maybe you're thinking about something like TermsEnum?
https://hudson.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/index/TermsEnum.html
It provides ordinal access to terms, with the ordinals represented as
longs. To get index-level rather than segment-level access, you will
have to merge the ordinals from the different segments.
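The merge of per-segment ordinals can be sketched as a k-way merge over the segments' sorted term lists. This is a hypothetical, in-memory illustration, not Lucene code: `GlobalOrdinals` and `segToGlobal` are made-up names, and each segment is assumed to be a sorted, duplicate-free `String[]` whose array positions are its segment-level ordinals. Terms are compared with `String.compareTo`, which matches Lucene's term order for plain ASCII terms. A term present in several segments receives a single global ordinal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch (not Lucene code): assigns index-level (global) ordinals by
 * k-way merging the sorted, duplicate-free term lists of several segments.
 */
class GlobalOrdinals {
    final List<String> globalTerms = new ArrayList<>(); // global ord -> term
    final int[][] segToGlobal;                           // [segment][segment ord] -> global ord

    GlobalOrdinals(String[][] segments) {
        segToGlobal = new int[segments.length][];
        for (int s = 0; s < segments.length; s++) {
            segToGlobal[s] = new int[segments[s].length];
        }
        // Queue entries are {segment, segment ord}, ordered by the term they point at.
        PriorityQueue<int[]> queue = new PriorityQueue<>(
            (a, b) -> segments[a[0]][a[1]].compareTo(segments[b[0]][b[1]]));
        for (int s = 0; s < segments.length; s++) {
            if (segments[s].length > 0) queue.add(new int[] {s, 0});
        }
        while (!queue.isEmpty()) {
            int[] top = queue.poll();
            String term = segments[top[0]][top[1]];
            // Equal terms leave the queue consecutively, so a new global ord is
            // allocated only when the term differs from the last one emitted.
            if (globalTerms.isEmpty() || !globalTerms.get(globalTerms.size() - 1).equals(term)) {
                globalTerms.add(term);
            }
            segToGlobal[top[0]][top[1]] = globalTerms.size() - 1;
            // Advance this segment to its next term, if any.
            if (top[1] + 1 < segments[top[0]].length) {
                queue.add(new int[] {top[0], top[1] + 1});
            }
        }
    }

    int numTerms() {
        return globalTerms.size();
    }

    public static void main(String[] args) {
        String[][] segments = {
            {"apple", "lucene", "term"}, // segment 0, sorted
            {"index", "lucene"},         // segment 1, sorted
        };
        GlobalOrdinals g = new GlobalOrdinals(segments);
        System.out.println(g.globalTerms);                     // [apple, index, lucene, term]
        System.out.println(Arrays.toString(g.segToGlobal[1])); // [1, 2]
    }
}
```

Keeping a per-segment `int[]` from segment ordinal to global ordinal lets segment-level lookups be translated to index-level ones without comparing term bytes again; the cost is memory proportional to the total number of per-segment terms.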

Unfortunately, ordinal-based terms access is optional for a codec, and
the default codec does not support it, so you will have to select a
codec explicitly when you build your index.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

