lucene-java-commits mailing list archives

From gsing...@apache.org
Subject svn commit: r807653 [3/3] - in /lucene/java/trunk: docs/fileformats.html docs/fileformats.pdf src/site/src/documentation/content/xdocs/fileformats.xml
Date Tue, 25 Aug 2009 14:36:48 GMT
Modified: lucene/java/trunk/src/site/src/documentation/content/xdocs/fileformats.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/fileformats.xml?rev=807653&r1=807652&r2=807653&view=diff
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/fileformats.xml (original)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/fileformats.xml Tue Aug 25 14:36:47 2009
@@ -12,7 +12,7 @@
 
             <p>
                 This document defines the index file formats used
-                in Lucene version 2.1. If you are using a different
+                in Lucene version 2.9. If you are using a different
                 version of Lucene, please consult the copy of
                 <code>docs/fileformats.html</code>
                 that was distributed
@@ -27,7 +27,7 @@
                 languages</a>.  If these versions are to remain compatible with Apache
                 Lucene, then a language-independent definition of the Lucene index
                 format is required.  This document thus attempts to provide a
-                complete and independent definition of the Apache Lucene 2.1 file
+                complete and independent definition of the Apache Lucene 2.9 file
                 formats.
             </p>
 
@@ -367,7 +367,7 @@
             </tr>
             <tr>
               <td><a href="#Normalization Factors">Norms</a></td>
-              <td>.nrm (pre 2.1: .f[0-9]*)</td>
+              <td>.nrm</td>
               <td>Encodes length and boost factors for docs and fields</td>
             </tr>
             <tr>
@@ -903,32 +903,8 @@
                     -2), followed by the generation recorded as Int64,
                     written twice.
                 </p>
-
-                <p>
-                    <b>Pre-2.1:</b>
-                    Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegName, SegSize&gt;
-                    <sup>SegCount</sup>
-                </p>
-                <p>
-                    <b>2.1 and above:</b>
-                    Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegName, SegSize, DelGen, HasSingleNormFile, NumField,
-                    NormGen<sup>NumField</sup>,
-                    IsCompoundFile&gt;<sup>SegCount</sup>
-                </p>
                 <p>
-                    <b>2.3:</b>
-                    Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
-                    NormGen<sup>NumField</sup>,
-                    IsCompoundFile&gt;<sup>SegCount</sup>
-                </p>
-                <p>
-                    <b>2.4 and above:</b>
-                    Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
-                    NormGen<sup>NumField</sup>,
-                    IsCompoundFile, DeletionCount, HasProx&gt;<sup>SegCount</sup>, Checksum
-                </p>
-                <p>
-                    <b>2.9 and above:</b>
+                    <b>2.9</b>
                     Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
                     NormGen<sup>NumField</sup>,
                     IsCompoundFile, DeletionCount, HasProx, Diagnostics&gt;<sup>SegCount</sup>, CommitUserData, Checksum
@@ -961,7 +937,7 @@
                 </p>
 
                 <p>
-                    Format is -1 as of Lucene 1.4, -3 (SegmentInfos.FORMAT_SINGLE_NORM_FILE) as of Lucene 2.1 and 2.2, -4 (SegmentInfos.FORMAT_SHARED_DOC_STORE) as of Lucene 2.3, -7 (SegmentInfos.FORMAT_HAS_PROX) as of Lucene 2.4, and -9 (SegmentInfos.FORMAT_DIAGNOSTICS) as of Lucene 2.9.
+                    Format is -9 (SegmentInfos.FORMAT_DIAGNOSTICS).
                 </p>
 
                 <p>
@@ -1092,20 +1068,12 @@
                     documents).  This lock file ensures that only one
                     writer is modifying the index at a time.
                 </p>
-
-                <p>
-                    Note that prior to version 2.1, Lucene also used a
-                    commit lock. This was removed in 2.1.
-                </p>
-
             </section>
 
             <section id="Deletable File"><title>Deletable File</title>
 
                 <p>
-                    Prior to Lucene 2.1 there was a file "deletable"
-                    that contained details about files that need to be
-                    deleted. As of 2.1, a writer dynamically computes
+                    A writer dynamically computes
                     the files that are deletable, instead, so no file
                     is written.
                 </p>
@@ -1193,9 +1161,6 @@
                             bit is one for fields that have term vectors stored, and zero for fields
                             without term vectors.
                         </li>
-                        <p>
-                            <b>Lucene &gt;= 1.9:</b>
-                        </p>
                         <li>If the third lowest-order bit is set (0x04), term positions are stored with the term vectors.</li>
                         <li>If the fourth lowest-order bit is set (0x08), term offsets are stored with the term vectors.</li>
                         <li>If the fifth lowest-order bit is set (0x10), norms are omitted for the indexed field.</li>
@@ -1286,22 +1251,6 @@
                         <p>FieldNum --&gt;
                             VInt
                         </p>
-
-                        <p>
-                            <b>Lucene &lt;= 1.4:</b>
-                        </p>
-                        <p>Bits --&gt;
-                            Byte
-                        </p>
-                        <p>Value --&gt;
-                            String
-                        </p>
-                        <p>Only the low-order bit of Bits is used. It is one for
-                            tokenized fields, and zero for non-tokenized fields.
-                        </p>
-                        <p>
-                            <b>Lucene &gt;= 1.9:</b>
-                        </p>
                         <p>Bits --&gt;
                             Byte
                         </p>
@@ -1383,7 +1332,7 @@
                             UTF16 character code) by the term's text.
                         </p>
                         <p>TIVersion names the version of the format
-                            of this file and is -2 in Lucene 1.4.
+                            of this file and is equal to TermInfosWriter.FORMAT_CURRENT.
                         </p>
                         <p>Term
                             text prefixes are shared. The PrefixLength is the number of initial
@@ -1592,7 +1541,7 @@
                     <sup>nd</sup>
                     starts.
                 </p>
-                <p>Lucene 2.2 introduces the notion of skip levels. Each term can have multiple skip levels.
+                <p>Each term can have multiple skip levels.
                   The amount of skip levels for a term is NumSkipLevels = Min(MaxSkipLevels, floor(log(DocFreq/log(SkipInterval)))).
                   The number of SkipData entries for a skip level is DocFreq/(SkipInterval^(Level + 1)), whereas the lowest skip
                    level is Level=0. <br></br>
@@ -1674,20 +1623,8 @@
                 </p>
             </section>
             <section id="Normalization Factors"><title>Normalization Factors</title>
-				<p>
-                    <b>Pre-2.1:</b>
-                    There's a norm file for each indexed field with a byte for
-                    each document. The .f[0-9]* file contains,
-                    for each document, a byte that encodes a value that is multiplied
-                    into the score for hits on that field:
-                </p>
-                <p>Norms
-                    (.f[0-9]*) --&gt; &lt;Byte&gt;
-                    <sup>SegSize</sup>
-                </p>
-				<p>
-                    <b>2.1 and above:</b>
-                    There's a single .nrm file containing all norms:
+
+				        <p>There's a single .nrm file containing all norms:
                 </p>
                 <p>AllNorms
                     (.nrm) --&gt; NormsHeader,&lt;Norms&gt;
@@ -1745,13 +1682,7 @@
 					When field <em>N</em> is modified, a separate norm file <em>.sN</em>

 					is created, to maintain the norm values for that field.
                 </p>
-				<p>
-                    <b>Pre-2.1:</b>
-                    Separate norm files are created only for compound segments.
-                </p>
-				<p>
-                    <b>2.1 and above:</b>
-                    Separate norm files are created (when adequate) for both compound and non compound segments.
+				<p>Separate norm files are created (when adequate) for both compound and non compound segments.
                 </p>
 
             </section>
@@ -1770,7 +1701,7 @@
                         <p>DocumentIndex (.tvx) --&gt; TVXVersion&lt;DocumentPosition,FieldPosition&gt;
                             <sup>NumDocs</sup>
                         </p>
-                        <p>TVXVersion --&gt; Int (3 (TermVectorsReader.FORMAT_VERSION2) for Lucene 2.4)</p>
+                        <p>TVXVersion --&gt; Int (TermVectorsReader.CURRENT)</p>
                         <p>DocumentPosition --&gt; UInt64 (offset in
                         the .tvd file)</p>
                         <p>FieldPosition --&gt; UInt64 (offset in the
@@ -1785,7 +1716,7 @@
                             Document (.tvd) --&gt; TVDVersion&lt;NumFields, FieldNums, FieldPositions&gt;
                             <sup>NumDocs</sup>
                         </p>
-                        <p>TVDVersion --&gt; Int (3 (TermVectorsReader.FORMAT_VERSION2) for Lucene 2.4)</p>
+                        <p>TVDVersion --&gt; Int (TermVectorsReader.FORMAT_CURRENT)</p>
                         <p>NumFields --&gt; VInt</p>
                         <p>FieldNums --&gt; &lt;FieldNumDelta&gt;
                             <sup>NumFields</sup>
@@ -1805,7 +1736,7 @@
                         <p>Field (.tvf) --&gt; TVFVersion&lt;NumTerms, Position/Offset, TermFreqs&gt;
                             <sup>NumFields</sup>
                         </p>
-                        <p>TVFVersion --&gt; Int (3 (TermVectorsReader.FORMAT_VERSION2) for Lucene 2.4)</p>
+                        <p>TVFVersion --&gt; Int (TermVectorsReader.FORMAT_CURRENT)</p>
                         <p>NumTerms --&gt; VInt</p>
                         <p>Position/Offset --&gt; Byte</p>
                         <p>TermFreqs --&gt; &lt;TermText, TermFreq, Positions?, Offsets?&gt;
@@ -1845,15 +1776,7 @@
 
                 <p>Although per-segment, this file is maintained exterior to compound segment files.
                 </p>
-				
-                <p>
-                <b>Pre-2.1:</b>
-                Deletions
-                    (.del) --&gt; ByteCount,BitCount,Bits
-                </p>
-
                 <p>
-				<b>2.1 and above:</b>
                 Deletions
                     (.del) --&gt; [Format],ByteCount,BitCount, Bits | DGaps (depending on Format)
                 </p>
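
Illustration (not part of the commit): the skip-level paragraph touched by the hunk at -1592 gives NumSkipLevels and the number of SkipData entries per level as formulas. The Java sketch below works through that arithmetic. It assumes the intended reading of the first formula is floor(log(DocFreq) / log(SkipInterval)), i.e. the logarithm of DocFreq to base SkipInterval, and it uses integer division rather than floating-point logarithms. The class and method names are hypothetical and do not exist in Lucene.

```java
// Illustrative sketch only. SkipLevelSketch and its methods are hypothetical
// names chosen for this example; they are not Lucene APIs.
final class SkipLevelSketch {

    /**
     * Returns min(maxSkipLevels, floor(log_skipInterval(docFreq))).
     * Assumption: the documented formula means log(DocFreq) / log(SkipInterval).
     */
    static int numSkipLevels(long docFreq, int skipInterval, int maxSkipLevels) {
        if (skipInterval < 2) {
            throw new IllegalArgumentException("skipInterval must be >= 2");
        }
        int levels = 0;
        // Each integer division by skipInterval accounts for one more level.
        for (long n = docFreq; n >= skipInterval && levels < maxSkipLevels; n /= skipInterval) {
            levels++;
        }
        return levels;
    }

    /** Returns DocFreq / SkipInterval^(level + 1); level 0 is the lowest level. */
    static long skipEntries(long docFreq, int skipInterval, int level) {
        long entries = docFreq;
        for (int i = 0; i <= level; i++) {
            entries /= skipInterval;
        }
        return entries;
    }

    public static void main(String[] args) {
        int levels = numSkipLevels(1000, 16, 10);
        System.out.println("levels = " + levels);
        for (int level = 0; level < levels; level++) {
            System.out.println("level " + level + ": "
                    + skipEntries(1000, 16, level) + " entries");
        }
    }
}
```

With DocFreq = 1000, SkipInterval = 16 and MaxSkipLevels = 10, the sketch yields 2 skip levels, with 62 entries on level 0 and 3 entries on level 1. Integer division is used here so that exact powers of SkipInterval are not affected by floating-point rounding.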
