lucene-dev mailing list archives

From "Grant Ingersoll"
Subject Questions
Date Thu, 29 Jan 2004 21:26:18 GMT

I am about neck deep in updating the TermVector code from Dmitry.  I believe I have most of
it in, with the exception of the SegmentMerge code.  I was wondering if anyone could write a
little bit about the concepts behind this code?

Also, in the File Formats section (under limitations), it says the TermCount (the number of
terms that can be indexed) is currently a 32-bit value, but the code is moving towards 64 bits.
What part, if any, has been moved?  I was looking in SegmentTermEnum and the position value
in there is currently a long, but the only place it is assigned (other than where it
is incremented in next()) is in the seek() method, which assigns it an int.
In TermInfosReader, some things refer to position as a long, while others refer to it
as an int.
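
To make the mismatch concrete, here is a minimal sketch of the pattern I am describing. It
is not the actual SegmentTermEnum source; the class name PositionSketch and the method
bodies are made up for illustration, and only the long field, the int argument to seek(),
and the increment in next() mirror what I see in the code.

```java
// Hypothetical sketch, not Lucene source: a long field that is only ever
// seeded from an int.
public class PositionSketch {
    // Declared as long, so the field itself can hold a 64-bit value.
    private long position = -1;

    // The only assignment takes an int, so the starting value is
    // limited to the 32-bit range (implicit int-to-long widening).
    public void seek(int startPosition) {
        position = startPosition;
    }

    // The only other change is the increment, which is done in long arithmetic.
    public void next() {
        position++;
    }

    public long position() {
        return position;
    }

    public static void main(String[] args) {
        PositionSketch termEnum = new PositionSketch();
        termEnum.seek(Integer.MAX_VALUE);
        termEnum.next();
        // The field can pass the int limit, but only by incrementing.
        System.out.println(termEnum.position()); // prints 2147483648
    }
}
```

In this sketch the field can exceed the int range only by stepping through next(), never
by seeking, which is the half-converted state I am asking about.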

In Dmitry's code, he maps Terms to Term Numbers by using the position of the term, but this
really won't work when moving to 64-bit fields (since the term numbers are stored in an array,
and Java arrays can only be indexed by a 32-bit int).
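
A small sketch of the array problem, under the assumption that term numbers are used
directly as array indexes. The class TermNumberSketch and the method termAt are
hypothetical names for this example and do not come from Dmitry's code.

```java
// Hypothetical sketch, not Lucene source: looking up a term by a long position
// when the terms live in a Java array.
public class TermNumberSketch {
    private final String[] termsByNumber;

    public TermNumberSketch(String[] terms) {
        this.termsByNumber = terms;
    }

    // Java arrays are indexed by int, so a long position has to be narrowed.
    // A bare (int) cast would silently wrap: (int) (1L << 32) is 0.
    // Checking the range first turns that silent wrap into an explicit error.
    public String termAt(long position) {
        if (position < 0 || position >= termsByNumber.length) {
            throw new IllegalArgumentException(
                "position " + position + " cannot be used as an array index here");
        }
        return termsByNumber[(int) position];
    }

    public static void main(String[] args) {
        TermNumberSketch sketch = new TermNumberSketch(new String[] {"apple", "banana"});
        System.out.println(sketch.termAt(1L)); // prints banana
        try {
            sketch.termAt(1L << 32);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Without the range check, position 1L << 32 would be cast to index 0 and return the wrong
term with no error, which is why I do not think the position-as-term-number mapping can
survive a move to 64 bits as it stands.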

Would it be acceptable to put the position value back to being an int until we are ready to
address the complete issue of 64-bit storage as a whole?  Or am I missing something about
the usage of position?  With it changed back, I have a compilable version for 1.3, and in a few
days I should have a tested version (I am also writing many new unit tests) that I can submit
for review.

Any insight is appreciated.

Grant Ingersoll
