lucene-java-user mailing list archives

From Alexander vom Berg <m...@avomberg.de>
Subject Structure of .tii-file
Date Wed, 21 Jul 2010 08:52:16 GMT
Hello everybody,

I am reading the file format documentation and checking it against an index I created.
The documentation says:
TermInfoIndex (.tii)--> TIVersion, IndexTermCount, IndexInterval,
SkipInterval, MaxSkipLevels, TermIndices

Looking at the .tii file, I see the following:
TIVersion = FF FF FF FC = -4 (4 bytes)
IndexTermCount = 00 00 00 00 00 00 00 0C = 12 (8 bytes)
IndexInterval = 00 00 00 80 = 128 (4 bytes)
SkipInterval = 00 00 00 10 = 16 (4 bytes)
MaxSkipLevels = 00 00 00 0A = 10 (4 bytes)
TermIndices = ????? (? bytes)
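A minimal Java sketch for checking the header values above. It reads the five fixed fields with `java.io.DataInputStream`, which, like Lucene's `IndexOutput.writeInt` and `writeLong`, uses big-endian byte order. The class name `TiiHeader` is made up for illustration, and IndexTermCount (a UInt64 in the documentation) is read into a signed `long`, which only matters for counts above 2^63.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical reader for the fixed-size .tii header (all fields big-endian). */
public class TiiHeader {
    int tiVersion;        // Int32; 0xFFFFFFFC is -4 in two's complement
    long indexTermCount;  // UInt64, read as a signed long
    int indexInterval;    // UInt32
    int skipInterval;     // UInt32
    int maxSkipLevels;    // UInt32

    static TiiHeader read(DataInputStream in) throws IOException {
        TiiHeader h = new TiiHeader();
        h.tiVersion = in.readInt();
        h.indexTermCount = in.readLong();
        h.indexInterval = in.readInt();
        h.skipInterval = in.readInt();
        h.maxSkipLevels = in.readInt();
        return h;
    }

    public static void main(String[] args) throws IOException {
        // The 24 header bytes quoted above.
        byte[] bytes = {
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFC,
            0, 0, 0, 0, 0, 0, 0, 0x0C,
            0, 0, 0, (byte) 0x80,
            0, 0, 0, 0x10,
            0, 0, 0, 0x0A
        };
        TiiHeader h = read(new DataInputStream(new ByteArrayInputStream(bytes)));
        System.out.println(h.tiVersion + " " + h.indexTermCount + " " + h.indexInterval
            + " " + h.skipInterval + " " + h.maxSkipLevels);
    }
}
```

Fed the quoted header bytes, it prints `-4 12 128 16 10`; the five fields take 4 + 8 + 4 + 4 + 4 = 24 bytes, so TermIndices would begin at byte offset 24.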

I looked at two indexes, and in both the following byte sequence (marked bold) is identical:
*00 00 FF FF FF FF 0F 00 00 00 18 00* (followed by 0B 61 or 0D 30 .....)

Maybe I don't understand the mapping *<TermInfo, IndexDelta>^IndexTermCount*.
How should I calculate the correct byte length?
I assume that IndexDelta, being a VLong, has 8 bytes if the leading bit is 0
(similar to VInt; or is VLong described somewhere?). TermInfo is explained in
the .tis file section.
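On the VLong question: as far as I can tell from Lucene's `DataInput.readVInt` and `readVLong`, a VLong is not a fixed 8-byte field but uses the same variable-length scheme as VInt (the high bit of each byte says whether another byte follows, and the low seven bits are appended as increasingly significant bits), just allowed to run up to 9 bytes. A minimal Java sketch of that decoding, with a made-up class name `VarInt`:

```java
/** Hypothetical decoder for Lucene-style VInt/VLong values. */
public class VarInt {
    private final byte[] buf;
    private int pos;

    VarInt(byte[] buf, int pos) {
        this.buf = buf;
        this.pos = pos;
    }

    int position() {
        return pos;
    }

    /** 7 data bits per byte, lowest group first; a set high bit means "another byte follows". */
    long readVLong() {
        long value = 0;
        int shift = 0;
        byte b;
        do {
            b = buf[pos++];
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    /** Same encoding; a negative int such as -1 needs all 5 bytes. */
    int readVInt() {
        return (int) readVLong();
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, 0x0F, 0x18, (byte) 0x80, 0x01 };
        VarInt in = new VarInt(bytes, 0);
        System.out.println(in.readVInt());   // 5 bytes: FF FF FF FF 0F -> -1
        System.out.println(in.readVLong());  // 1 byte:  18 -> 24
        System.out.println(in.readVLong());  // 2 bytes: 80 01 -> 128
    }
}
```

So the length of a VInt or VLong is only known after reading it byte by byte; small values such as 24 (0x18) fit in one byte.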

TermIndices = <TermInfo, IndexDelta>
            = <(Term, DocFreq, FreqDelta, ProxDelta, SkipDelta), IndexDelta>
            = <([PrefixLength, Suffix, FieldNum], DocFreq, FreqDelta, ProxDelta, SkipDelta), IndexDelta>
            = <([00, 00, FF], FF, FF, FF, 0F), 00 00 00 18 00 0B 61 6E>

IndexDelta is too large for my small index! DocFreq is also too large, because
I only have 16 documents in total. :(
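Reading the bold bytes with the VInt/VLong rules, rather than one field per byte, changes the picture. The following Java sketch decodes a single .tii entry in the field order from the documentation. That SkipDelta is only stored when DocFreq >= SkipInterval, and that the Suffix is a VInt byte count followed by UTF-8 bytes, are my assumptions from the Lucene 3.x `TermInfosWriter` source; `TiiEntry` and `decode` are made-up names.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical decoder for one .tii entry <TermInfo, IndexDelta> (Lucene 3.x field order assumed). */
public class TiiEntry {
    int prefixLength, fieldNum, docFreq;
    int skipDelta = -1;   // -1 means "not stored" (DocFreq < SkipInterval)
    String suffix;
    long freqDelta, proxDelta, indexDelta;
    int length;           // number of bytes this entry occupied

    /** VInt/VLong: 7 bits per byte, lowest group first, high bit = more bytes follow. */
    static long readVLong(byte[] buf, int[] pos) {
        long value = 0;
        int shift = 0;
        byte b;
        do {
            b = buf[pos[0]++];
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    static TiiEntry decode(byte[] buf, int start, int skipInterval) {
        int[] pos = { start };
        TiiEntry e = new TiiEntry();
        e.prefixLength = (int) readVLong(buf, pos);     // bytes shared with the previous term
        int suffixLength = (int) readVLong(buf, pos);   // String = VInt byte count + bytes
        e.suffix = new String(buf, pos[0], suffixLength, StandardCharsets.UTF_8);
        pos[0] += suffixLength;
        e.fieldNum = (int) readVLong(buf, pos);         // VInt; -1 takes 5 bytes
        e.docFreq = (int) readVLong(buf, pos);
        e.freqDelta = readVLong(buf, pos);
        e.proxDelta = readVLong(buf, pos);
        if (e.docFreq >= skipInterval) {
            e.skipDelta = (int) readVLong(buf, pos);    // only written for frequent terms
        }
        e.indexDelta = readVLong(buf, pos);             // .tii only: delta of the .tis pointer
        e.length = pos[0] - start;
        return e;
    }

    public static void main(String[] args) {
        // The bold bytes from both indexes; SkipInterval 16 from the header.
        byte[] bytes = { 0, 0, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, 0x0F, 0, 0, 0, 0x18, 0 };
        TiiEntry e = decode(bytes, 0, 16);
        System.out.println("prefix=" + e.prefixLength + " suffix=\"" + e.suffix + "\" field=" + e.fieldNum
            + " docFreq=" + e.docFreq + " freqDelta=" + e.freqDelta + " proxDelta=" + e.proxDelta
            + " indexDelta=" + e.indexDelta + " bytes=" + e.length);
    }
}
```

On the bold bytes it reports an empty term with FieldNum -1, DocFreq 0, FreqDelta 0, ProxDelta 0, no SkipDelta and IndexDelta 24, using 11 bytes, so the trailing 00 would be the PrefixLength of the next entry. 24 also equals the size of the five header fields (4 + 8 + 4 + 4 + 4), which would be consistent with the first entry pointing just past the .tis header, assuming the .tis header has the same layout.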

Can somebody tell me how to read the bytes from the file correctly? I would
like to find the correct position in the .tis file from the .tii data.
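For the final goal, finding the .tis position from the .tii data: if, as I read Lucene 3.x's `TermInfosReader`, each IndexDelta is relative to the previous entry's pointer, a running sum gives absolute .tis offsets, and a binary search over the decoded index terms picks the last index term not greater than the target. Reading would then start at that offset and scan at most IndexInterval terms forward. This Java sketch assumes the index terms are already decoded and belong to a single field, and uses `String.compareTo` as a stand-in for Lucene's term ordering; `TisSeek`, `absolutePointers` and `floorIndex` are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical helper: turns .tii IndexDelta values into .tis offsets and picks where to seek. */
public class TisSeek {
    /** IndexDelta is relative to the previous entry, so a running sum gives absolute pointers. */
    static long[] absolutePointers(long[] indexDeltas) {
        long[] pointers = new long[indexDeltas.length];
        long sum = 0;
        for (int i = 0; i < indexDeltas.length; i++) {
            sum += indexDeltas[i];
            pointers[i] = sum;
        }
        return pointers;
    }

    /** Binary search for the last index term <= target; indexTerms must be sorted. */
    static int floorIndex(String[] indexTerms, String target) {
        int lo = 0, hi = indexTerms.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = indexTerms[mid].compareTo(target);
            if (cmp == 0) return mid;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return hi;  // -1 only if target sorts before every index term
    }

    public static void main(String[] args) {
        // Made-up example values; only the first delta (24) comes from the bytes quoted above.
        String[] indexTerms = { "", "apple", "melon" };
        long[] deltas = { 24, 900, 1100 };
        long[] pointers = absolutePointers(deltas);
        int i = floorIndex(indexTerms, "banana");
        System.out.println(Arrays.toString(pointers) + " -> seek .tis to " + pointers[i]);
        // prints [24, 924, 2024] -> seek .tis to 924
    }
}
```

When resuming in the .tis at such an offset, the first PrefixLength and the pointer deltas are presumably relative to the index term and its TermInfo, so those values would have to be carried over from the .tii entry.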

Best regards
Alex
