lucene-general mailing list archives

From Wolfgang Täger <wtae...@epo.org>
Subject Index File structure, in particular TermInfo
Date Wed, 08 Feb 2006 07:55:12 GMT
Hi,

I'm using the Java version of Lucene 1.4.3.

To solve some particular problems, I'm trying to read the .cfs file
directly, from outside the Java framework.
However, reading the .tis file is turning out to be difficult:

I tried to follow
http://lucene.apache.org/java/docs/fileformats.html 

and successfully read the first entries, but then ran into a problem. I
then found in the source code (TermInfosWriter) that SkipDelta
is sometimes omitted. After accounting for this, there still appears to be
another problem after several hundred entries.
It looks like ProxDelta is missing too in these cases.

However, I didn't find this in the source.

Therefore, my question is whether there are exceptions to the scheme
given on the fileformats page:

TermInfos --> <TermInfo>^TermCount
TermInfo --> <Term, DocFreq, FreqDelta, ProxDelta, SkipDelta> 
Term --> <PrefixLength, Suffix, FieldNum> 
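
A minimal Python sketch of reading one TermInfo record under this scheme. It is an illustration only, not Lucene code: the function names are hypothetical, and it rests on two assumptions that should be checked against the 1.4.3 source. First, SkipDelta is assumed to be stored only when DocFreq >= SkipInterval (the case where TermInfosWriter omits it). Second, the Suffix length is assumed to count characters, not bytes, with each character stored in 1 to 3 bytes.

```python
def read_vint(buf, pos):
    """Decode one VInt: 7 data bits per byte, low-order group first."""
    value, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b & 0x80 == 0:          # high bit clear: last byte of this VInt
            return value, pos
        shift += 7


def read_chars(buf, pos, count):
    """Read `count` characters (not bytes), each encoded in 1 to 3 bytes."""
    chars = []
    for _ in range(count):
        b = buf[pos]
        if b & 0x80 == 0:          # 0xxxxxxx
            code, size = b, 1
        elif b & 0xE0 == 0xC0:     # 110xxxxx 10xxxxxx
            code, size = ((b & 0x1F) << 6) | (buf[pos + 1] & 0x3F), 2
        else:                      # 1110xxxx 10xxxxxx 10xxxxxx
            code = ((b & 0x0F) << 12) | ((buf[pos + 1] & 0x3F) << 6) \
                | (buf[pos + 2] & 0x3F)
            size = 3
        chars.append(chr(code))
        pos += size
    return "".join(chars), pos


def read_term_info(buf, pos, skip_interval):
    """Read one TermInfo record; return (record, position after it)."""
    prefix_length, pos = read_vint(buf, pos)
    suffix_length, pos = read_vint(buf, pos)   # assumed: length in characters
    suffix, pos = read_chars(buf, pos, suffix_length)
    field_num, pos = read_vint(buf, pos)
    doc_freq, pos = read_vint(buf, pos)
    freq_delta, pos = read_vint(buf, pos)
    prox_delta, pos = read_vint(buf, pos)
    skip_delta = None
    if doc_freq >= skip_interval:              # assumed condition for SkipDelta
        skip_delta, pos = read_vint(buf, pos)
    record = {
        "prefix_length": prefix_length,
        "suffix": suffix,
        "field_num": field_num,
        "doc_freq": doc_freq,
        "freq_delta": freq_delta,
        "prox_delta": prox_delta,
        "skip_delta": skip_delta,
    }
    return record, pos
```

If the character-count assumption holds, a reader that treats the Suffix length as a byte count would stay in sync until the first term containing a non-ASCII character and then misread every later field. That would be one possible explanation for a failure that only shows up after several hundred entries.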



Note: I'm reading the .tis file, not the .tii file, at the moment, but maybe this is related.

Thanks,

Wolfgang
 

 
 
 