lucene-java-commits mailing list archives

From gsing...@apache.org
Subject svn commit: r479465 [2/4] - in /lucene/java/trunk: docs/ docs/images/ docs/lucene-sandbox/ docs/styles/ src/site/ src/site/src/ src/site/src/documentation/ src/site/src/documentation/classes/ src/site/src/documentation/conf/ src/site/src/documentation/...
Date Mon, 27 Nov 2006 00:00:49 GMT
Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/demo4.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/demo4.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/demo4.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/demo4.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,160 @@
+<?xml version="1.0"?>
+<document>
+	<header>
+        <title>
+	Apache Lucene - Basic Demo Sources Walkthrough
+		</title>
+	</header>
+<properties>
+<author email="acoliver@apache.org">Andrew C. Oliver</author>
+</properties>
+<body>
+
+<section id="About the Code"><title>About the Code</title>
+<p>
+In this section we walk through the sources behind the basic Lucene Web Application demo: where to
+find them, their parts and their function.  This section is intended for Java developers wishing to
+understand how to use Lucene in their applications or for those involved in deploying web
+applications based on Lucene.
+</p>
+</section>
+
+
+<section id="Location of the source (developers/deployers)"><title>Location of the source (developers/deployers)</title>
+<p>
+Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
+should see a directory called <code>src</code> which in turn contains a directory called
+<code>jsp</code>.  This is the root for all of the Lucene web demo.
+</p>
+<p>
+Within this directory you should see <code>index.jsp</code>.  Bring this up in vi or your editor of
+choice.
+</p>
+</section>
+
+<section id="index.jsp (developers/deployers)"><title>index.jsp (developers/deployers)</title>
+<p>
+This JSP is pretty boring by itself.  All it does is include a header, display a form and
+include a footer.  If you look at the form, it has two fields: <code>query</code> (where you enter
+your search criteria) and <code>maxresults</code> (where you specify the number of results per page).
+Because of how this JSP is structured, it should be easy to customize without even editing this
+particular file: you can simply change the header and footer.  Let's look at <code>header.jsp</code>
+(located in the same directory) next.
+</p>
+</section>
+
+<section id="header.jsp (developers/deployers)"><title>header.jsp (developers/deployers)</title>
+<p>
+The header is also very simple by itself.  The only thing it does is include the
+<code>configuration.jsp</code> (which you looked at in the last section of this guide) and set the
+title and a brief header.  This would be a good place to put your own custom HTML to "pretty" things
+up a bit.  We won't cover the footer because all it does is display the footer and close your tags.
+Let's look at the <code>results.jsp</code>, the meat of this application, next.
+</p>
+</section>
+
+<section id="results.jsp (developers)"><title>results.jsp (developers)</title>
+<p>
+Most of the functionality lies in <code>results.jsp</code>.  Much of it is for paging the search
+results, which we won't cover here as it's commented well enough.  The first things in this page are
+the imports for the Lucene classes and Lucene demo classes.  These classes are loaded from
+the jars included in the <code>WEB-INF/lib</code> directory in the <code>luceneweb.war</code> file.
+</p>
+<p>
+You'll notice that this file includes the same header and footer as <code>index.jsp</code>.  From
+there it constructs an <code><a
+href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> with the
+<code>indexLocation</code> that was specified in <code>configuration.jsp</code>.  If there is an
+error of any kind in opening the index, it is displayed to the user and the boolean flag
+<code>error</code> is set to tell the rest of the sections of the jsp not to continue.
+</p>
+<p>
+From there, this jsp attempts to get the search criteria, the start index (used for paging) and the
+maximum number of results per page.  If the maximum number of results per page is not set or not
+valid, then it and the start index are set to default values.  If only the start index is invalid,
+it is set to a default value.  If no criteria are provided, then a servlet error is thrown (it is
+assumed that this is the result of URL tampering or some form of browser malfunction).
+</p>
+<p>
+The jsp moves on to construct a <code><a
+href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code> to
+analyze the search text.  This matches the analyzer used during indexing (<code><a
+href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code>), which is generally
+recommended.  This is passed to the <code><a
+href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> along with the
+criteria to construct a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code>
+object.  You'll also notice the string literal <code>"contents"</code> included.  This specifies
+that the search should cover the <code>contents</code> field and not the <code>title</code>,
+<code>url</code> or some other field in the indexed documents.  If there is any error in
+constructing a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object an
+error is displayed to the user.
+</p>
+<p>
+In the next section of the jsp the <code><a
+href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> is asked to search
+given the query object.  The results are returned in a collection called <code>hits</code>.  If the
+length of the <code>hits</code> collection is 0 (meaning there were no results) then an
+error is displayed to the user and the error flag is set.
+</p>
+<p>
+Finally the jsp iterates through the <code>hits</code> collection, taking the current page into
+account, and displays properties of the <code><a
+href="api/org/apache/lucene/document/Document.html">Document</a></code> objects we talked about in
+the first walkthrough.  These objects contain "known" fields specific to their indexer (in this case
+<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> constructs a document
+with "url", "title" and "contents").
+</p>
+<p>
+Please note that in a real deployment of Lucene, it's best to instantiate <code><a
+href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> and <code><a
+href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> once, and then
+share them across search requests, instead of re-instantiating per search request.
+</p>
+</section>
+
+<section id="More sources (developers)"><title>More sources (developers)</title>
+<p>
+There are additional sources used by the web app that were not specifically covered by either
+walkthrough.  For example the HTML parser, the <code><a
+href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class and <code><a
+href="api/org/apache/lucene/demo/HTMLDocument.html">HTMLDocument</a></code> class.  These are very
+similar to the classes covered in the first example, with properties specific to parsing and
+indexing HTML.  This is beyond our scope; however, by now you should feel like you're "getting
+started" with Lucene.
+</p>
+</section>
+
+<section id="Where to go from here? (everyone!)"><title>Where to go from here? (everyone!)</title>
+<p>
+There are a number of things this demo doesn't do or doesn't do quite right.  For instance, you may
+have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
+support that context or redirect to it).  Anywhere the directory doesn't quite match the
+context mapping, you'll have a broken link in your results.  Indexing non-local files, or any
+other special need, isn't supported, and there may be security issues with running the
+indexing application from your webapps directory.  There are a number of things left for you, the
+developer, to do.
+</p>
+<p>
+In time some of these things may be added to Lucene as features (if you've got a good idea we'd love
+to hear it!), but for now: this is where you begin and the search engine/indexer ends.  Lastly, one
+would assume you'd want to follow the above advice and customize the application to look a little
+more fancy than black on white with "Lucene Template" at the top.  We'll see you on the Lucene
+Users' or Developers' <a href="mailinglists.html">mailing lists</a>!
+</p>
+</section>
+
+<section id="When to contact the Author"><title>When to contact the Author</title>
+<p>
+Please resist the urge to contact the authors of this document (without bribes of fame and fortune
+attached).  First contact the <a href="mailinglists.html">mailing lists</a>, taking care to <a
+href="http://www.catb.org/~esr/faqs/smart-questions.html">Ask Questions The Smart Way</a>.
+Certainly you'll get the most help that way as well.  That being said, feedback on and modifications
+to this document and the samples are greatly appreciated.  They are just best sent to the lists
+or <a href="http://wiki.apache.org/jakarta-lucene/HowToContribute">posted as patches</a>, so that
+everyone can share in them.  Thanks for understanding!
+</p>
+</section>
+
+</body>
+</document>
+

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/features.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/features.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/features.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/features.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,47 @@
+<?xml version="1.0"?>
+<document>
+<header>
+<title>Apache Lucene - Features</title>
+</header>
+<body>
+
+<section id="Features"><title>Features</title>
+<p>Lucene offers powerful features through a simple API:</p>
+</section>
+
+<section id="Scalable, High-Performance Indexing"><title>Scalable, High-Performance Indexing</title>
+<ul>
+<li>over 20MB/minute on Pentium M 1.5GHz</li>
+<li>small RAM requirements -- only 1MB heap</li>
+<li>incremental indexing as fast as batch indexing</li>
+<li>index size roughly 20-30% the size of text indexed</li>
+</ul>
+</section>
+
+<section id="Powerful, Accurate and Efficient Search Algorithms"><title>Powerful, Accurate and Efficient Search Algorithms</title>
+<ul>
+<li>ranked searching -- best results returned first</li>
+<li>many powerful query types: phrase queries, wildcard queries, proximity
+	queries, range queries and more</li>
+<li>fielded searching (e.g., title, author, contents)</li>
+<li>date-range searching</li>
+<li>sorting by any field</li>
+<li>multiple-index searching with merged results</li>
+<li>allows simultaneous update and searching</li>
+</ul>
+</section>
+
+<section id="Cross-Platform Solution"><title>Cross-Platform Solution</title>
+<ul>
+<li>Available as Open Source software under the
+	<a href="http://www.apache.org/licenses/LICENSE-2.0.html">Apache License</a>
+	which lets you use Lucene in both commercial and Open Source programs</li>
+<li>100%-pure Java</li>
+<li>Index-compatible implementations <a href="http://wiki.apache.org/jakarta-lucene/LuceneImplementations">available
+	in other programming languages</a></li>
+</ul>
+</section>
+
+</body>
+</document>
+

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/fileformats.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/fileformats.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/fileformats.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/fileformats.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,1377 @@
+<?xml version="1.0"?>
+
+<document>
+	<header>
+        <title>
+Apache Lucene - Index File Formats
+		</title>
+	</header>
+  <properties>
+   
+   <authors>
+    <person email="cutting@apache.org" name="Doug Cutting"/>
+   </authors>
+  </properties>
+
+    <body>
+        <section id="Index File Formats">
+            <title>Index File Formats</title>
+            <p>
+                This document defines the index file formats used
+                in Lucene version 2.0.  If you are using a different
+		version of Lucene, please consult the copy of
+		<code>docs/fileformats.html</code> that was distributed
+		with the version you are using.
+            </p>
+
+            <p>
+                Apache Lucene is written in Java, but several
+                efforts are underway to write
+                <a href="http://wiki.apache.org/jakarta-lucene/LuceneImplementations">versions
+                of Lucene in other programming
+                languages</a>.  If these versions are to remain compatible with Apache
+                Lucene, then a language-independent definition of the Lucene index
+                format is required.  This document thus attempts to provide a
+                complete and independent definition of the Apache Lucene 2.0 file
+                formats.
+            </p>
+
+            <p>
+                As Lucene evolves, this document should evolve.
+                Versions of Lucene in different programming languages should endeavor
+                to agree on file formats, and generate new versions of this document.
+            </p>
+
+            <p>
+                Compatibility notes are provided in this document,
+                describing how file formats have changed from prior versions.
+            </p>
+
+        </section>
+
+        <section id="Definitions">
+            <title>Definitions</title>
+            <p>
+                The fundamental concepts in Lucene are index,
+                document, field and term.
+            </p>
+
+
+            <p>
+                An index contains a sequence of documents.
+            </p>
+
+            <ul>
+                <li>
+                    <p>
+                        A document is a sequence of fields.
+                    </p>
+                </li>
+
+                <li>
+                    <p>
+                        A field is a named sequence of terms.
+                    </p>
+                </li>
+
+                <li>
+                    <p>A term is a string.</p>
+                </li>
+            </ul>
+
+            <p>
+                The same string in two different fields is
+                considered a different term.  Thus terms are represented as a pair of
+                strings, the first naming the field, and the second naming text
+                within the field.
+            </p>
+
+            <section id="Inverted Indexing">
+                <title>Inverted Indexing</title>
+                <p>
+                    The index stores statistics about terms in order
+                    to make term-based search more efficient.  Lucene's
+                    index falls into the family of indexes known as <i>inverted
+                        indexes</i>.  This is because it can list, for a term, the documents that contain
+                    it.  This is the inverse of the natural relationship, in which
+                    documents list terms.
+                </p>
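+                <p>
+                    As an illustration only, the following Java sketch builds a toy
+                    in-memory inverted index: a map from each term to the numbers of the
+                    documents that contain it.  It is not Lucene code; the class name
+                    <code>InvertedIndexSketch</code>, the <code>invert</code> method and the
+                    whitespace tokenization are assumptions made for the example.
+                </p>
+                <source><![CDATA[
```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration, not part of Lucene.
class InvertedIndexSketch {

    /**
     * Maps each term to the sorted set of document numbers containing it.
     * Documents are numbered by their position in the array, starting at zero.
     * Tokenization here is a naive lowercase whitespace split (an assumption).
     */
    static Map<String, SortedSet<Integer>> invert(String[] documents) {
        Map<String, SortedSet<Integer>> postings = new TreeMap<>();
        for (int docNumber = 0; docNumber < documents.length; docNumber++) {
            for (String term : documents[docNumber].toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue; // skip the empty token produced by leading whitespace
                }
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docNumber);
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        String[] docs = { "apache lucene", "lucene index", "apache index format" };
        // Each entry lists, for one term, the documents that contain it.
        System.out.println(invert(docs));
    }
}
```
+]]></source>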
+            </section>
+            <section id="Types of Fields">
+                <title>Types of Fields</title>
+                <p>
+                    In Lucene, fields may be <i>stored</i>, in which
+                    case their text is stored in the index literally, in a non-inverted
+                    manner.  Fields that are inverted are called <i>indexed</i>. A field
+                    may be both stored and indexed.</p>
+
+                <p>The text of a field may be <i>tokenized</i> into terms to be
+                    indexed, or the text of a field may be used literally as a term to be indexed.
+                    Most fields are
+                    tokenized, but sometimes it is useful for certain identifier fields
+                    to be indexed literally.
+                </p>
+                <p>See the <a href="http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html">Field</a> Javadocs for more information on Fields.</p>
+            </section>
+
+            <section id="Segments">
+                <title>Segments</title>
+                <p>
+                    Lucene indexes may be composed of multiple sub-indexes, or
+                    <i>segments</i>. Each segment is a fully independent index, which could be searched
+                    separately. Indexes evolve by:
+                </p>
+
+                <ol>
+                    <li><p>Creating new segments for newly added documents.</p>
+                    </li>
+                    <li><p>Merging existing segments.</p>
+                    </li>
+                </ol>
+
+                <p>
+                    Searches may involve multiple segments and/or multiple indexes, each
+                    index potentially composed of a set of segments.
+                </p>
+            </section>
+
+            <section id="Document Numbers">
+                <title>Document Numbers</title>
+                <p>
+                    Internally, Lucene refers to documents by an integer <i>document
+                        number</i>. The first document added to an index is numbered zero, and each
+                    subsequent document added gets a number one greater than the previous.
+                </p>
+
+                <p>
+                    Note that a document's number may change, so caution should be taken
+                    when storing these numbers outside of Lucene.  In particular, numbers may
+                    change in the following situations:
+                </p>
+
+
+                <ul>
+                    <li>
+                        <p>
+                            The
+                            numbers stored in each segment are unique only within the segment,
+                            and must be converted before they can be used in a larger context.
+                            The standard technique is to allocate each segment a range of
+                            values, based on the range of numbers used in that segment.  To
+                            convert a document number from a segment to an external value, the
+                            segment's <i>base</i> document
+                            number is added.  To convert an external value back to a
+                            segment-specific value, the  segment is identified by the range that
+                            the external value is in, and the segment's base value is
+                            subtracted.  For example, two five-document segments might be
+                            combined, so that the first segment has a base value of zero, and
+                            the second of five.  Document three from the second segment would
+                            have an external value of eight.
+                        </p>
+                    </li>
+                    <li>
+                        <p>
+                            When documents are deleted, gaps are created
+                            in the numbering.  These are eventually removed as the index evolves
+                            through merging.  Deleted documents are dropped when segments are
+                            merged.  A freshly-merged segment thus has no gaps in its numbering.
+                        </p>
+                    </li>
+                </ul>
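+                <p>
+                    The base-offset conversion described in the first item above can be
+                    sketched in Java as follows.  This is a minimal illustration, not the
+                    Lucene implementation: the class <code>DocNumberSketch</code> and its
+                    methods are hypothetical, and it assumes each segment's size is known
+                    up front.
+                </p>
+                <source><![CDATA[
```java
// Hypothetical illustration, not part of Lucene.
class DocNumberSketch {
    private final int[] bases; // bases[i] is the external number of segment i's document 0
    private final int total;   // total number of documents across all segments

    DocNumberSketch(int[] segmentSizes) {
        bases = new int[segmentSizes.length];
        int next = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            bases[i] = next;          // each segment starts where the previous one ended
            next += segmentSizes[i];
        }
        total = next;
    }

    /** Segment-local number to external number: add the segment's base. */
    int toExternal(int segment, int localDoc) {
        return bases[segment] + localDoc;
    }

    /** External number to {segment, localDoc}: find the range, subtract the base. */
    int[] toSegment(int external) {
        if (external < 0 || external >= total) {
            throw new IllegalArgumentException("no such document: " + external);
        }
        int segment = bases.length - 1;
        while (bases[segment] > external) {
            segment--; // walk back to the last segment whose base is not past the value
        }
        return new int[] { segment, external - bases[segment] };
    }

    public static void main(String[] args) {
        // Two five-document segments, as in the example in the text.
        DocNumberSketch numbers = new DocNumberSketch(new int[] { 5, 5 });
        System.out.println(numbers.toExternal(1, 3));
    }
}
```
+]]></source>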
+
+            </section>
+
+        </section>
+
+        <section id="Overview">
+            <title>Overview</title>
+            <p>
+                Each segment index maintains the following:
+            </p>
+            <ul>
+                <li><p>Field names.  This
+                        contains the set of field names used in the index.
+
+                    </p>
+                </li>
+                <li><p>Stored Field
+                        values.  This contains, for each document, a list of attribute-value
+                        pairs, where the attributes are field names.  These are used to
+                        store auxiliary information about the document, such as its title,
+                        url, or an identifier to access a
+                        database. The set of stored fields is what is returned for each hit
+                        when searching.  This is keyed by document number.
+                    </p>
+                </li>
+                <li><p>Term dictionary.
+                        A dictionary containing all of the terms used in all of the indexed
+                        fields of all of the documents.  The dictionary also contains the
+                        number of documents which contain the term, and pointers to the
+                        term's frequency and proximity data.
+                    </p>
+                </li>
+
+                <li><p>Term Frequency
+                        data.  For each term in the dictionary, the numbers of all the
+                        documents that contain that term, and the frequency of the term in
+                        each of those documents.
+                    </p>
+                </li>
+
+                <li><p>Term Proximity
+                        data.  For each term in the dictionary, the positions at which the term
+                        occurs in each document.
+                    </p>
+                </li>
+
+                <li><p>Normalization
+                        factors.  For each field in each document, a value is stored that is
+                        multiplied into the score for hits on that field.
+                    </p>
+                </li>
+                <li><p>Term Vectors.  For each field in each document, the term vector
+                       (sometimes called document vector) may be stored.  A term vector consists
+                       of term text and term frequency.  To add Term Vectors to your index see the
+                    <a href="http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html">Field</a> constructors.
+                    </p>
+                </li>
+                <li><p>Deleted documents.
+                        An optional file indicating which documents are deleted.
+                    </p>
+                </li>
+            </ul>
+
+            <p>Details on each of these are provided in subsequent sections.
+            </p>
+        </section>
+
+        <section id="File Naming">
+            <title>File Naming</title>
+            <p>
+                All files belonging to a segment have the same name with varying
+                extensions.  The extensions correspond to the different file formats
+                described below.  When using the Compound File format (default in 1.4 and greater),
+                these files are collapsed into a single .cfs file (see below for details).
+            </p>
+
+            <p>
+                Typically, all segments
+                in an index are stored in a single directory, although this is not
+                required.
+            </p>
+
+        </section>
+
+        <section id="Primitive Types">
+            <title>Primitive Types</title>
+            <section id="Byte">
+                <title>Byte</title>
+                <p>
+                    The most primitive type
+                    is an eight-bit byte.  Files are accessed as sequences of bytes.  All
+                    other data types are defined as sequences
+                    of bytes, so file formats are byte-order independent.
+                </p>
+
+            </section>
+
+            <section id="UInt32">
+                <title>UInt32</title>
+                <p>
+                    32-bit unsigned integers are written as four
+                    bytes, high-order bytes first.
+                </p>
+                <p>
+                    UInt32    --&gt; &lt;Byte&gt;<sup>4</sup>
+                </p>
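+                <p>
+                    As a rough sketch of the byte order described above, the following
+                    Java code writes and reads four bytes, high-order byte first.  The
+                    class <code>UInt32Sketch</code> and its methods are hypothetical; a
+                    <code>long</code> is used to hold the value because Java has no
+                    unsigned 32-bit type.
+                </p>
+                <source><![CDATA[
```java
// Hypothetical illustration, not part of Lucene.
class UInt32Sketch {

    /** Encodes a value in [0, 2^32 - 1] as four bytes, high-order byte first. */
    static byte[] writeUInt32(long value) {
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("out of UInt32 range: " + value);
        }
        return new byte[] {
            (byte) (value >>> 24),
            (byte) (value >>> 16),
            (byte) (value >>> 8),
            (byte) value
        };
    }

    /** Decodes four bytes, high-order byte first, into an unsigned value. */
    static long readUInt32(byte[] bytes) {
        return ((bytes[0] & 0xFFL) << 24)
             | ((bytes[1] & 0xFFL) << 16)
             | ((bytes[2] & 0xFFL) << 8)
             |  (bytes[3] & 0xFFL);
    }

    public static void main(String[] args) {
        byte[] encoded = writeUInt32(258L); // 258 = 0x00000102
        System.out.println(java.util.Arrays.toString(encoded));
        System.out.println(readUInt32(encoded));
    }
}
```
+]]></source>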
+
+            </section>
+
+            <section id="Uint64">
+                <title>UInt64</title>
+                <p>
+                    64-bit unsigned integers are written as eight
+                    bytes, high-order bytes first.
+                </p>
+
+                <p>UInt64    --&gt; &lt;Byte&gt;<sup>8</sup>
+                </p>
+
+            </section>
+
+            <section id="VInt">
+                <title>VInt</title>
+                <p>
+                    A variable-length format for non-negative integers is
+                    defined where the high-order bit of each byte indicates whether more
+                    bytes remain to be read.  The low-order seven bits are appended as
+                    increasingly more significant bits in the resulting integer value.
+                    Thus values from zero to 127 may be stored in a single byte, values
+                    from 128 to 16,383 may be stored in two bytes, and so on.
+                </p>
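+                <p>
+                    The encoding rule above can be sketched in Java as follows.  This is
+                    an illustration written for this document, not the Lucene source: the
+                    class <code>VIntSketch</code> and its methods are hypothetical, and
+                    rejecting negative values is an assumption of the sketch.
+                </p>
+                <source><![CDATA[
```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration, not part of Lucene.
class VIntSketch {

    /** Encodes a non-negative int, seven bits per byte, low-order bits first. */
    static byte[] writeVInt(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            // More than seven bits remain: emit the low seven with the high bit set.
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value); // final byte has the high bit clear
        return out.toByteArray();
    }

    /** Decodes one VInt starting at the beginning of the array. */
    static int readVInt(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift; // append as increasingly significant bits
            if ((b & 0x80) == 0) {
                break; // high bit clear: no more bytes to read
            }
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        for (int v : new int[] { 0, 127, 128, 16383, 16384 }) {
            System.out.println(v + " takes " + writeVInt(v).length + " byte(s)");
        }
    }
}
```
+]]></source>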
+
+                <p><b>VInt Encoding Example</b></p>
+
+                <table width="100%" border="0" cellpadding="4" cellspacing="0">
+                    <col width="64*" />
+                    <col width="64*" />
+                    <col width="64*" />
+                    <col width="64*" />
+                    <tr valign="TOP">
+                        <td width="25%">
+                            <p align="RIGHT"><b>Value</b>
+                            </p>
+                        </td>
+                        <td width="25%">
+                            <p align="RIGHT"><b>First byte</b>
+                            </p>
+                        </td>
+                        <td width="25%">
+                            <p align="RIGHT"><b>Second byte</b>
+                            </p>
+                        </td>
+                        <td width="25%">
+                            <p align="RIGHT"><b>Third byte</b>
+                            </p>
+                        </td>
+                    </tr>
+                    <tr valign="BOTTOM">
+                        <td width="25%" sdval="0" sdnum="1033;0;#,##0">
+                            <p align="RIGHT">0
+                            </p>
+                        </td>
+                        <td width="25%" sdval="0" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
+                               margin-right: 0.01cm">
+                                00000000
+                            </p>
+                        </td>
+                        <td width="25%" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.07cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                        <td width="25%" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                    </tr>
+                    <tr valign="BOTTOM">
+                        <td width="25%" sdval="1" sdnum="1033;0;#,##0">
+                            <p align="RIGHT">1
+                            </p>
+                        </td>
+                        <td width="25%" sdval="1" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
+                               margin-right: 0.01cm">
+                                00000001
+                            </p>
+                        </td>
+                        <td width="25%" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.07cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                        <td width="25%" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                    </tr>
+                    <tr valign="BOTTOM">
+                        <td width="25%" sdval="2" sdnum="1033;0;#,##0">
+                            <p align="RIGHT">2
+                            </p>
+                        </td>
+                        <td width="25%" sdval="10" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
+                               margin-right: 0.01cm">
+                                00000010
+                            </p>
+                        </td>
+                        <td width="25%" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.07cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                        <td width="25%" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                    </tr>
+                    <tr>
+                        <td width="25%" valign="TOP">
+                            <p align="RIGHT">...
+                            </p>
+                        </td>
+                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: 0.11cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.07cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                    </tr>
+                    <tr valign="BOTTOM">
+                        <td width="25%" sdval="127" sdnum="1033;0;#,##0">
+                            <p align="RIGHT">127
+                            </p>
+                        </td>
+                        <td width="25%" sdval="1111111" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
+                               margin-right: 0.01cm">
+                                01111111
+                            </p>
+                        </td>
+                        <td width="25%" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.07cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                        <td width="25%" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                    </tr>
+                    <tr valign="BOTTOM">
+                        <td width="25%" sdval="128" sdnum="1033;0;#,##0">
+                            <p align="RIGHT">128
+                            </p>
+                        </td>
+                        <td width="25%" sdval="10000000" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
+                               margin-right: 0.01cm">
+                                10000000
+                            </p>
+                        </td>
+                        <td width="25%" sdval="1" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
+                               margin-right: 0.01cm">
+                                00000001
+                            </p>
+                        </td>
+                        <td width="25%" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                    </tr>
+                    <tr valign="BOTTOM">
+                        <td width="25%" sdval="129" sdnum="1033;0;#,##0">
+                            <p align="RIGHT">129
+                            </p>
+                        </td>
+                        <td width="25%" sdval="10000001" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
+                               margin-right: 0.01cm">
+                                10000001
+                            </p>
+                        </td>
+                        <td width="25%" sdval="1" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
+                               margin-right: 0.01cm">
+                                00000001
+                            </p>
+                        </td>
+                        <td width="25%" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                    </tr>
+                    <tr valign="BOTTOM">
+                        <td width="25%" sdval="130" sdnum="1033;0;#,##0">
+                            <p align="RIGHT">130
+                            </p>
+                        </td>
+                        <td width="25%" sdval="10000010" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
+                               margin-right: 0.01cm">
+                                10000010
+                            </p>
+                        </td>
+                        <td width="25%" sdval="1" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
+                               margin-right: 0.01cm">
+                                00000001
+                            </p>
+                        </td>
+                        <td width="25%" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                    </tr>
+                    <tr>
+                        <td width="25%" valign="TOP">
+                            <p align="RIGHT">...
+                            </p>
+                        </td>
+                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: 0.11cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.07cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                    </tr>
+                    <tr valign="BOTTOM">
+                        <td width="25%" sdval="16383" sdnum="1033;0;#,##0">
+                            <p align="RIGHT">16,383
+                            </p>
+                        </td>
+                        <td width="25%" sdval="11111111" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
+                               margin-right: 0.01cm">
+                                11111111
+                            </p>
+                        </td>
+                        <td width="25%" sdval="1111111" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
+                               margin-right: 0.01cm">
+                                01111111
+                            </p>
+                        </td>
+                        <td width="25%" sdnum="1033;0;00000000">
+                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
+                               0.01cm"><br/>
+
+                            </p>
+                        </td>
+                    </tr>
+                    <tr valign="BOTTOM">
+                        <td width="25%" sdval="16384" sdnum="1033;0;#,##0">
+                            <p align="RIGHT">16,384
+                            </p>
+                        </td>
+                        <td width="25%" sdval="10000000" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
+                               margin-right: 0.01cm">
+                                10000000
+                            </p>
+                        </td>
+                        <td width="25%" sdval="10000000" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
+                               margin-right: 0.01cm">
+                                10000000
+                            </p>
+                        </td>
+                        <td width="25%" sdval="1" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: -0.47cm;
+                               margin-right: 0.01cm">
+                                00000001
+                            </p>
+                        </td>
+                    </tr>
+                    <tr valign="BOTTOM">
+                        <td width="25%" sdval="16385" sdnum="1033;0;#,##0">
+                            <p align="RIGHT">16,385
+                            </p>
+                        </td>
+                        <td width="25%" sdval="10000001" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
+                               margin-right: 0.01cm">
+                                10000001
+                            </p>
+                        </td>
+                        <td width="25%" sdval="10000000" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
+                               margin-right: 0.01cm">
+                                10000000
+                            </p>
+                        </td>
+                        <td width="25%" sdval="1" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: -0.47cm;
+                               margin-right: 0.01cm">
+                                00000001
+                            </p>
+                        </td>
+                    </tr>
+                    <tr>
+                        <td width="25%" valign="TOP">
+                            <p align="RIGHT">...
+                            </p>
+                        </td>
+                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
+                               margin-right: 0.01cm">
+                                <br/>
+
+                            </p>
+                        </td>
+                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
+                               margin-right: 0.01cm">
+                                <br/>
+
+                            </p>
+                        </td>
+                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
+                            <p class="western" align="RIGHT" style="margin-left: -0.47cm;
+                               margin-right: 0.01cm">
+                                <br/>
+
+                            </p>
+                        </td>
+                    </tr>
+                </table>
+
+                <p>
+                    This provides compression while still being
+                    efficient to decode.
+                </p>
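+                <p>
+                    As a minimal sketch of the encoding tabulated above, a VInt can be
+                    written and read seven bits at a time, low-order group first, with the
+                    high bit of each byte marking that more bytes follow.  The class and
+                    method names here are illustrative assumptions, not Lucene's API.
+                </p>

```java
import java.io.ByteArrayOutputStream;

final class VIntSketch {
    // Encode a non-negative int: seven data bits per byte, low-order group first.
    // The high bit of a byte is set when at least one more byte follows.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decode a single VInt that starts at the first byte of the array.
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }
}
```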
+
+            </section>
+
+            <section id="Chars">
+                <title>Chars</title>
+                <p>
+                    Lucene writes Unicode
+                    character sequences using Java's
+                    <a href="http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8">"modified
+                    UTF-8 encoding"</a>.
+                </p>
+
+
+            </section>
+
+            <section id="String">
+                <title>String</title>
+                <p>
+                    Lucene writes strings as a VInt representing the length, followed by
+                    the character data.
+                </p>
+
+                <p>
+                    String --&gt; VInt, Chars
+                </p>
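+                <p>
+                    The String rule can be sketched as follows.  This is an illustration
+                    only: it assumes the VInt length counts Java chars rather than encoded
+                    bytes, and the class and method names are hypothetical.  Each char is
+                    written in modified UTF-8, where U+0000 takes two bytes and every
+                    other char takes one to three bytes.
+                </p>

```java
import java.io.ByteArrayOutputStream;

final class StringSketch {
    // Write a non-negative int as a VInt (seven bits per byte, low-order first).
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // String --> VInt, Chars
    // Assumption: the length prefix is the number of Java chars in the string.
    static byte[] writeString(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, s.length());
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            if (c >= 0x01 && c <= 0x7F) {
                out.write(c);                          // one byte
            } else if (c <= 0x7FF) {
                out.write(0xC0 | (c >> 6));            // two bytes (also used for U+0000)
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));           // three bytes
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }
}
```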
+
+            </section>
+
+        </section>
+
+        <section id="Per-Index Files">
+            <title>Per-Index Files</title>
+            <p>
+                The files in this section exist one-per-index.
+            </p>
+
+            <section id="Segments File">
+                <title>Segments File</title>
+                <p>
+                    The active segments in the index are stored in the
+                    segment info file.  An index only has
+                    a single file in this format, and it is named "segments".
+                    This lists each segment by name, and also contains the size of each
+                    segment.
+                </p>
+
+                <p>
+                    Segments    --&gt; Format, Version, NameCounter, SegCount, &lt;SegName, SegSize&gt;<sup>SegCount</sup>
+                </p>
+
+                <p>
+                    Format, NameCounter, SegCount, SegSize    --&gt; UInt32
+                </p>
+
+                <p>
+                    Version --&gt; UInt64
+                </p>
+
+                <p>
+                    SegName    --&gt; String
+                </p>
+
+                <p>
+                    Format is -1 in Lucene 1.4.
+                </p>
+
+                <p>
+                    Version counts how often the index has been
+                    changed by adding or deleting documents.
+                </p>
+
+                <p>
+                    NameCounter is used to generate names for new segment files.
+                </p>
+
+                <p>
+                    SegName is the name of the segment, and is used as the file name prefix
+                    for all of the files that compose the segment's index.
+                </p>
+
+                <p>
+                    SegSize is the number of documents contained in the segment index.
+                </p>
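+                <p>
+                    A reader for this layout might look like the sketch below.  It is
+                    not Lucene's own reader: it assumes that the fixed-width integers are
+                    stored high-order byte first and that segment names contain only
+                    ASCII characters, and the class and method names are hypothetical.
+                </p>

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

final class SegmentsSketch {
    // Read a VInt: seven bits per byte, low-order group first.
    static int readVInt(ByteBuffer in) {
        int b = in.get() & 0xFF;
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.get() & 0xFF;
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Assumption: the name is plain ASCII, so each char occupies exactly one byte.
    static String readAsciiString(ByteBuffer in) {
        int length = readVInt(in);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) (in.get() & 0xFF));
        }
        return sb.toString();
    }

    // Returns segment name -> document count (SegSize), in file order.
    static Map<String, Integer> readSegments(byte[] file) {
        ByteBuffer in = ByteBuffer.wrap(file); // big-endian by default
        in.getInt();                           // Format
        in.getLong();                          // Version
        in.getInt();                           // NameCounter
        int segCount = in.getInt();            // SegCount
        Map<String, Integer> segments = new LinkedHashMap<>();
        for (int i = 0; i < segCount; i++) {
            String segName = readAsciiString(in);
            int segSize = in.getInt();
            segments.put(segName, segSize);
        }
        return segments;
    }
}
```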
+
+
+            </section>
+
+            <section id="Lock Files">
+                <title>Lock Files</title>
+                <p>
+                    Several files are used to indicate that another
+                    process is using an index.  Note that these files are not
+                    stored in the index directory itself, but rather in the
+                    system's temporary directory, as indicated in the Java
+                    system property "java.io.tmpdir".
+                </p>
+
+                <ul>
+                    <li>
+                        <p>
+                            When a file named "commit.lock"
+                            is present, a process is currently re-writing the "segments"
+                            file and deleting outdated segment index files, or a process is
+                            reading the "segments"
+                            file and opening the files of the segments it names.  This lock file
+                            prevents files from being deleted by another process after a process
+                            has read the "segments"
+                            file but before it has managed to open all of the files of the
+                            segments named therein.
+                        </p>
+                    </li>
+
+                    <li>
+                        <p>
+                            When a file named "write.lock"
+                            is present, a process is currently adding documents to an index, or
+                            removing files from that index.  This lock file prevents several
+                            processes from attempting to modify an index at the same time.
+                        </p>
+                    </li>
+                </ul>
+            </section>
+
+            <section id="Deletable File">
+                <title>Deletable File</title>
+                <p>
+                    A file named "deletable"
+                    contains the names of files that are no longer used by the index, but
+                    which could not be deleted.  This is only used on Win32, where a
+                    file may not be deleted while it is still open. On other platforms
+                    the file contains only null bytes.
+                </p>
+
+                <p>
+                    Deletable    --&gt; DeletableCount,
+                    &lt;DeletableName&gt;<sup>DeletableCount</sup>
+                </p>
+
+                <p>DeletableCount    --&gt; UInt32
+                </p>
+                <p>DeletableName    --&gt;
+                    String
+                </p>
+            </section>
+
+            <section id="Compound Files">
+                <title>Compound Files</title>
+            	<p>Starting with Lucene 1.4 the compound file format became the default. This
+            	is simply a container for all files described in the next section.</p>
+            	
+            	<p>Compound (.cfs) --&gt; FileCount, &lt;DataOffset, FileName&gt;<sup>FileCount</sup>,
+            		FileData<sup>FileCount</sup></p>
+            	
+            	<p>FileCount --&gt; VInt</p>
+            	
+            	<p>DataOffset --&gt; Long</p>
+
+            	<p>FileName --&gt; String</p>
+
+            	<p>FileData --&gt; raw file data</p>
+                <p>The raw file data is the data from the individual files named above.</p>
+            	
+            </section>
+
+        </section>
+
+        <section id="Per-Segment Files">
+            <title>Per-Segment Files</title>
+            <p>
+                The remaining files are all per-segment, and are
+                thus defined by suffix.
+            </p>
+            <section id="Fields">
+                <title>Fields</title>
+                <p><br/><b>Field Info</b><br/></p>
+
+                <p>
+                    Field names are
+                    stored in the field info file, with suffix .fnm.
+                </p>
+                <p>
+                    FieldInfos
+                    (.fnm)    --&gt; FieldsCount, &lt;FieldName,
+                    FieldBits&gt;<sup>FieldsCount</sup>
+                </p>
+
+                <p>
+                    FieldsCount    --&gt; VInt
+                </p>
+
+                <p>
+                    FieldName    --&gt; String
+                </p>
+
+                <p>
+                    FieldBits    --&gt; Byte
+                </p>
+
+                <p>
+	          <ul>
+                    <li>
+                    The low-order bit is one for
+		    indexed fields, and zero for non-indexed fields.
+                    </li>
+		    <li>
+		    The second lowest-order
+                    bit is one for fields that have term vectors stored, and zero for fields
+                    without term vectors.  
+	            </li>
+                        <p><b>Lucene &gt;= 1.9:</b></p>
+		    <li> If the third lowest-order bit is set (0x04), term positions are stored with the term vectors. </li>
+		    <li> If the fourth lowest-order bit is set (0x08), term offsets are stored with the term vectors. </li>
+		    <li> If the fifth lowest-order bit is set (0x10), norms are omitted for the indexed field. </li>
+		  </ul>
+                </p>
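+                <p>
+                    The FieldBits flags listed above can be tested with simple bit masks.
+                    The sketch below is illustrative only; the constant and method names
+                    are assumptions chosen for readability, not names taken from Lucene.
+                </p>

```java
final class FieldBitsSketch {
    // Mask values follow the bit positions described in the list above.
    static final int IS_INDEXED = 0x01;
    static final int STORE_TERM_VECTOR = 0x02;
    static final int STORE_POSITIONS = 0x04; // Lucene >= 1.9
    static final int STORE_OFFSETS = 0x08;   // Lucene >= 1.9
    static final int OMIT_NORMS = 0x10;      // Lucene >= 1.9

    static boolean has(byte fieldBits, int flag) {
        return (fieldBits & flag) != 0;
    }

    // Build a short human-readable summary of one FieldBits byte.
    static String describe(byte fieldBits) {
        StringBuilder sb = new StringBuilder();
        sb.append(has(fieldBits, IS_INDEXED) ? "indexed" : "not indexed");
        if (has(fieldBits, STORE_TERM_VECTOR)) sb.append(", term vectors");
        if (has(fieldBits, STORE_POSITIONS)) sb.append(", positions");
        if (has(fieldBits, STORE_OFFSETS)) sb.append(", offsets");
        if (has(fieldBits, OMIT_NORMS)) sb.append(", norms omitted");
        return sb.toString();
    }
}
```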
+
+                <p>
+                    Fields are numbered by their order in this file.  Thus field zero is
+                    the
+                    first field in the file, field one the next, and so on.  Note that,
+                    like document numbers, field numbers are segment relative.
+                </p>
+
+                <p><br/><b>Stored Fields</b><br/></p>
+
+                <p>
+                    Stored fields are represented by two files:
+                </p>
+
+                <ol>
+                    <li>
+                        <p>
+                            The field index, or .fdx file.
+                        </p>
+
+                        <p>
+                            This contains, for each document, a pointer to
+                            its field data, as follows:
+                        </p>
+
+                        <p>
+                            FieldIndex
+                            (.fdx)    --&gt;
+                            &lt;FieldValuesPosition&gt;<sup>SegSize</sup>
+                        </p>
+                        <p>FieldValuesPosition
+                            --&gt; UInt64
+                        </p>
+                        <p>This
+                            is used to find the location within the field data file of the
+                            fields of a particular document.  Because it contains fixed-length
+                            data, this file may be easily randomly accessed.  The position of
+                            document <i>n</i>'s field data is the UInt64 at <i>n*8</i> in
+                            this file.
+                        </p>
+                    </li>
+                    <li>
+                        <p>
+                            The field data, or .fdt file.
+
+                        </p>
+
+                        <p>
+                            This contains the stored fields of each document,
+                            as follows:
+                        </p>
+
+                        <p>
+                            FieldData (.fdt)    --&gt;
+                            &lt;DocFieldData&gt;<sup>SegSize</sup>
+                        </p>
+                        <p>DocFieldData    --&gt;
+                            FieldCount, &lt;FieldNum, Bits, Value&gt;<sup>FieldCount</sup>
+                        </p>
+                        <p>FieldCount  --&gt;
+                            VInt
+                        </p>
+                        <p>FieldNum    --&gt;
+                            VInt
+                        </p>
+                        
+                        <p><b>Lucene &lt;= 1.4:</b></p>
+                        <p>Bits        --&gt;
+                            Byte
+                        </p>
+                        <p>Value        --&gt;
+                            String
+                        </p>
+                        <p>Only the low-order bit of Bits is used.  It is one for
+                            tokenized fields, and zero for non-tokenized fields.
+                        </p>
+                        <p><b>Lucene &gt;= 1.9:</b></p>
+                        <p>Bits        --&gt;
+                            Byte
+                        </p>
+                        <p>
+                        <ul>
+                        	<li>low order bit is one for tokenized fields</li>
+                        	<li>second bit is one for fields containing binary data</li>
+                        	<li>third bit is one for fields with compression option enabled
+                        		(if compression is enabled, the algorithm used is ZLIB)</li>
+                        </ul>
+                        </p>
+                        <p>Value        --&gt;
+                            String | BinaryValue (depending on Bits)
+                        </p>
+                        <p>BinaryValue        --&gt;
+                            ValueSize, &lt;Byte&gt;<sup>ValueSize</sup>
+                        </p>
+                        <p>ValueSize        --&gt;
+                            VInt
+                        </p>
+
+                    </li>
+                </ol>
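+                <p>
+                    Because every .fdx entry is eight bytes wide, locating the field data
+                    of document <i>n</i> is a single fixed-offset read.  The sketch below
+                    works on an in-memory copy of the file and assumes that each UInt64
+                    is stored high-order byte first; the class and method names are
+                    hypothetical.
+                </p>

```java
import java.nio.ByteBuffer;

final class FieldIndexSketch {
    // Returns the FieldValuesPosition of the given document: the offset into
    // the .fdt file at which that document's stored fields begin.
    // Assumption: values are big-endian, which is ByteBuffer's default order.
    static long fieldValuesPosition(byte[] fdx, int docNumber) {
        return ByteBuffer.wrap(fdx).getLong(docNumber * 8);
    }
}
```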
+
+            </section>
+            <section id="Term Dictionary">
+                <title>Term Dictionary</title>
+                <p>
+                    The term dictionary is represented as two files:
+                </p>
+                <ol>
+                    <li>
+                        <p>
+                            The term infos, or .tis file.
+                        </p>
+
+                        <p>
+                            TermInfoFile (.tis)--&gt;
+                            TIVersion, TermCount, IndexInterval, SkipInterval, TermInfos
+                        </p>
+                        <p>TIVersion    --&gt;
+                            UInt32
+                        </p>
+                        <p>TermCount    --&gt;
+                            UInt64
+                        </p>
+                        <p>IndexInterval    --&gt;
+                            UInt32
+                        </p>
+                        <p>SkipInterval   --&gt;
+                            UInt32
+                        </p>
+                        <p>TermInfos    --&gt;
+                            &lt;TermInfo&gt;<sup>TermCount</sup>
+                        </p>
+                        <p>TermInfo    --&gt;
+                            &lt;Term, DocFreq, FreqDelta, ProxDelta, SkipDelta&gt;
+                        </p>
+                        <p>Term        --&gt;
+                            &lt;PrefixLength, Suffix, FieldNum&gt;
+                        </p>
+                        <p>Suffix        --&gt;
+                            String
+                        </p>
+                        <p>PrefixLength,
+                            DocFreq, FreqDelta, ProxDelta, SkipDelta<br/>        --&gt; VInt
+                        </p>
+                        <p>This
+                            file is sorted by Term.  Terms are ordered first lexicographically
+                            by the term's field name, and within that lexicographically by the
+                            term's text.
+                        </p>
+                        <p>TIVersion names the version of the format
+                            of this file and is -2 in Lucene 1.4.
+                        </p>
+                        <p>Term
+                            text prefixes are shared.  The PrefixLength is the number of initial
+                            characters from the previous term which must be prepended to a
+                            term's suffix in order to form the term's text.  Thus, if the
+                            previous term's text was "bone" and the term is "boy",
+                            the PrefixLength is two and the suffix is "y".
+                        </p>
+                        <p>FieldNum
+                            determines the term's field, whose name is stored in the .fnm file.
+                        </p>
+                        <p>DocFreq
+                            is the count of documents which contain the term.
+                        </p>
+                        <p>FreqDelta
+                            determines the position of this term's TermFreqs within the .frq
+                            file.  In particular, it is the difference between the position of
+                            this term's data in that file and the position of the previous
+                            term's data (or zero, for the first term in the file).
+                        </p>
+                        <p>ProxDelta
+                            determines the position of this term's TermPositions within the .prx
+                            file.  In particular, it is the difference between the position of
+                            this term's data in that file and the position of the previous
+                            term's data (or zero, for the first term in the file).
+                        </p>
+                        <p>SkipDelta determines the position of this
+                            term's SkipData within the .frq file.  In
+                            particular, it is the number of bytes
+                            after TermFreqs that the SkipData starts.
+                            In other words, it is the length of the
+                            TermFreqs data.
+                        </p>
+                    </li>
+                    <li>
+                        <p>
+                            The term info index, or .tii file.
+                        </p>
+
+                        <p>
+                            This contains every IndexInterval<sup>th</sup> entry from the .tis
+                            file, along with its location in the .tis file.  This is
+                            designed to be read entirely into memory and used to provide random
+                            access to the .tis file.
+                        </p>
+
+                        <p>
+                            The structure of this file is very similar to the
+                            .tis file, with the addition of one item per record, the IndexDelta.
+                        </p>
+
+                        <p>
+                            TermInfoIndex (.tii)--&gt;
+                            TIVersion, IndexTermCount, IndexInterval, SkipInterval, TermIndices 
+                        </p>
+                        <p>TIVersion --&gt;
+                        	UInt32
+                        </p>
+                        <p>IndexTermCount    --&gt;
+                            UInt64
+                        </p>
+                        <p>IndexInterval --&gt;
+                        	UInt32
+                        </p>
+                        <p>SkipInterval --&gt;
+                        	UInt32
+                        </p>
+                        <p>TermIndices    --&gt;
+                            &lt;TermInfo, IndexDelta&gt;<sup>IndexTermCount</sup>
+                        </p>
+                        <p>IndexDelta    --&gt;
+                            VLong
+                        </p>
+                        <p>IndexDelta
+                            determines the position of this term's TermInfo within the .tis file.  In
+                            particular, it is the difference between the position of this term's
+                            entry in that file and the position of the previous term's entry.
+                        </p>
+                        <p>SkipInterval is the fraction of TermDocs stored in skip tables. It is used to accelerate TermDocs.skipTo(int).
+                            Larger values result in smaller indexes, greater acceleration, but fewer accelerable cases, while
+                            smaller values result in bigger indexes, less acceleration and more
+                            accelerable cases.</p>
+                    </li>
+                </ol>
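+                <p>
+                    The prefix sharing used for term text in the .tis file can be
+                    sketched as below, using the "bone" and "boy" example.  The class and
+                    method names are hypothetical, and the sketch works on Java strings
+                    rather than on the encoded file.
+                </p>

```java
final class TermPrefixSketch {
    // Rebuild a term's text from the previous term's text, the PrefixLength,
    // and the stored Suffix.
    static String termText(String previousText, int prefixLength, String suffix) {
        return previousText.substring(0, prefixLength) + suffix;
    }

    // Number of leading characters that two consecutive terms have in common,
    // which is the PrefixLength a writer would store for the second term.
    static int sharedPrefixLength(String previousText, String text) {
        int limit = Math.min(previousText.length(), text.length());
        int i = 0;
        while (i < limit && previousText.charAt(i) == text.charAt(i)) {
            i++;
        }
        return i;
    }
}
```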
+            </section>
+
+            <section id="Frequencies">
+                <title>Frequencies</title>
+                <p>
+                    The .frq file contains the lists of documents
+                    which contain each term, along with the frequency of the term in that
+                    document.
+                </p>
+                <p>FreqFile (.frq)    --&gt;
+                    &lt;TermFreqs, SkipData&gt;<sup>TermCount</sup>
+                </p>
+                <p>TermFreqs    --&gt;
+                    &lt;TermFreq&gt;<sup>DocFreq</sup>
+                </p>
+                <p>TermFreq        --&gt;
+                    DocDelta, Freq?
+                </p>
+                <p>SkipData        --&gt;
+                    &lt;SkipDatum&gt;<sup>DocFreq/SkipInterval</sup>
+                </p>
+                <p>SkipDatum    --&gt;
+                    DocSkip, FreqSkip, ProxSkip
+                </p>
+                <p>DocDelta, Freq, DocSkip, FreqSkip, ProxSkip    --&gt;
+                    VInt
+                </p>
+                <p>TermFreqs
+                    are ordered by term (the term is implicit, from the .tis file).
+                </p>
+                <p>TermFreq
+                    entries are ordered by increasing document number.
+                </p>
+                <p>DocDelta
+                    determines both the document number and the frequency.  In
+                    particular, DocDelta/2 is the difference between this document number
+                    and the previous document number (or zero when this is the first
+                    document in a TermFreqs).  When DocDelta is odd, the frequency is
+                    one.  When DocDelta is even, the frequency is read as another VInt.
+                </p>
+                <p>For
+                    example, the TermFreqs for a term which occurs once in document seven
+                    and three times in document eleven would be the following sequence of
+                    VInts:
+                </p>
+                <p>    15,
+                    8, 3
+                </p>
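+                <p>
+                    Decoding the sequence 15, 8, 3 can be sketched as follows.  The input
+                    is taken to be an array of already-decoded VInt values, and the class
+                    and method names are hypothetical.
+                </p>

```java
import java.util.ArrayList;
import java.util.List;

final class TermFreqsSketch {
    // Each element of the result is {documentNumber, frequency}.
    static List<int[]> decode(int[] vints) {
        List<int[]> postings = new ArrayList<>();
        int doc = 0; // the "previous document number" is zero for the first entry
        int i = 0;
        while (i < vints.length) {
            int docDelta = vints[i++];
            doc += docDelta >>> 1; // DocDelta/2 is the gap to the previous document
            // Odd DocDelta: frequency is one. Even: frequency is the next VInt.
            int freq = (docDelta & 1) != 0 ? 1 : vints[i++];
            postings.add(new int[] {doc, freq});
        }
        return postings;
    }
}
```

+                <p>
+                    For the input 15, 8, 3 this yields document 7 with frequency 1 and
+                    document 11 with frequency 3.
+                </p>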
+                <p>DocSkip records the document number before every
+                    SkipInterval<sup>th</sup> document in TermFreqs.
+                    Document numbers are represented as differences
+                    from the previous value in the sequence.  FreqSkip
+                    and ProxSkip record the position of every
+                    SkipInterval<sup>th</sup> entry in FreqFile and
+                    ProxFile, respectively.  File positions are
+                    relative to the start of TermFreqs and Positions for
+                    the first SkipDatum, and to the previous SkipDatum in
+                    the sequence thereafter.
+                </p>
+                <p>For example, if DocFreq=35 and SkipInterval=16,
+                    then there are two SkipData entries, containing
+                    the 15<sup>th</sup> and 31<sup>st</sup> document
+                    numbers in TermFreqs.  The first FreqSkip names
+                    the number of bytes after the beginning of
+                    TermFreqs that the 16<sup>th</sup> SkipDatum
+                    starts, and the second the number of bytes after
+                    that that the 32<sup>nd</sup> starts.  The first
+                    ProxSkip names the number of bytes after the
+                    beginning of Positions that the 16<sup>th</sup>
+                    SkipDatum starts, and the second the number of
+                    bytes after that that the 32<sup>nd</sup> starts.
+                </p>
+
+            </section>
+            <section id="Positions">
+                <title>Positions</title>
+                <p>
+                    The .prx file contains the lists of positions that
+                    each term occurs at within documents.
+                </p>
+                <p>ProxFile (.prx)    --&gt;
+                    &lt;TermPositions&gt;<sup>TermCount</sup>
+                </p>
+                <p>TermPositions    --&gt;
+                    &lt;Positions&gt;<sup>DocFreq</sup>
+                </p>
+                <p>Positions        --&gt;
+                    &lt;PositionDelta&gt;<sup>Freq</sup>
+                </p>
+                <p>PositionDelta    --&gt;
+                    VInt
+                </p>
+                <p>TermPositions
+                    are ordered by term (the term is implicit, from the .tis file).
+                </p>
+                <p>Positions
+                    entries are ordered by increasing document number (the document
+                    number is implicit from the .frq file).
+                </p>
+                <p>PositionDelta
+                    is the difference between the position of the current occurrence in
+                    the document and the previous occurrence (or zero, if this is the
+                    first occurrence in this document).
+                </p>
+                <p>
+                    For example, the TermPositions for a
+                    term which occurs as the fourth term in one document, and as the
+                    fifth and ninth term in a subsequent document, would be the following
+                    sequence of VInts:
+                </p>
+                <p>    4,
+                    5, 4
+                </p>
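+                <p>The delta encoding described above can be sketched as follows.
+                    This is a minimal Python illustration under stated assumptions,
+                    not Lucene's own code: <code>decode_term_positions</code> is a
+                    hypothetical name, the per-document frequencies are assumed to
+                    have been read from the .frq file beforehand, and the
+                    PositionDeltas are assumed to be already decoded from VInts.
+                </p>
```python
def decode_term_positions(freqs, deltas):
    """Rebuild absolute positions, one list per document, for a single term."""
    values = iter(deltas)
    term_positions = []
    for freq in freqs:  # one Positions entry per document containing the term
        position = 0  # the "previous occurrence" restarts at zero in each document
        positions = []
        for _ in range(freq):
            position += next(values)  # PositionDelta is the gap to the previous occurrence
            positions.append(position)
        term_positions.append(positions)
    return term_positions
```
+                <p>For the example above (frequencies 1 and 2, deltas 4, 5, 4) the
+                    sketch returns position 4 for the first document and positions
+                    5 and 9 for the second.
+                </p>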
+            </section>
+            <section id="Normalization Factors">
+                <title>Normalization Factors</title>
+                <p>There's a norm file for each indexed field with a byte for
+                   each document.  The .f[0-9]* file contains,
+                    for each document, a byte that encodes a value that is multiplied
+                    into the score for hits on that field:
+                </p>
+                <p>Norms
+                    (.f[0-9]*)    --&gt; &lt;Byte&gt;<sup>SegSize</sup>
+                </p>
+                <p>Each
+                    byte encodes a floating point value.  Bits 0-2 contain the 3-bit
+                    mantissa, and bits 3-7 contain the 5-bit exponent.
+                </p>
+                <p>These
+                    are converted to an IEEE single float value as follows:
+                </p>
+                <ol>
+                    <li><p>If
+                            the byte is zero, use a zero float.
+                        </p>
+                    </li>
+                    <li><p>Otherwise,
+                            set the sign bit of the float to zero;
+                        </p>
+                    </li>
+                    <li><p>add
+                            48 to the exponent and use this as the float's exponent;
+                        </p>
+                    </li>
+                    <li><p>map
+                            the mantissa to the high-order 3 bits of the float's mantissa; and
+
+                        </p>
+                    </li>
+                    <li><p>set
+                            the low-order 21 bits of the float's mantissa to zero.
+                        </p>
+                    </li>
+                </ol>
+
+            </section>
+            <section id="Term Vectors">
+                <title>Term Vectors</title>
+              Term Vector support is optional on a field-by-field basis.  It consists of 3
+              files.
+              <ol>
+                <li>
+                  <p>The Document Index or .tvx file.</p>
+                  <p>This contains, for each document, a pointer to the document data in the Document 
+                    (.tvd) file.
+                  </p>
+                  <p>DocumentIndex (.tvx) --&gt; TVXVersion&lt;DocumentPosition&gt;<sup>NumDocs</sup></p>
+                  <p>TVXVersion --&gt; Int</p>
+                  <p>DocumentPosition   --&gt; UInt64</p>
+                  <p>This is used to find the position of the Document in the .tvd file.</p>
+                </li>
+                <li>
+                  <p>The Document or .tvd file.</p>
+                  <p>This contains, for each document, the number of fields, a list of the fields with
+                  term vector info and finally a list of pointers to the field information in the .tvf 
+                  (Term Vector Fields) file.</p>
+                  <p>
+                    Document (.tvd) --&gt; TVDVersion&lt;NumFields, FieldNums, FieldPositions&gt;<sup>NumDocs</sup>
+                  </p>
+                  <p>TVDVersion --&gt; Int</p>
+                  <p>NumFields --&gt; VInt</p>
+                  <p>FieldNums --&gt; &lt;FieldNumDelta&gt;<sup>NumFields</sup></p>
+                  <p>FieldNumDelta --&gt; VInt</p>
+                  <p>FieldPositions --&gt; &lt;FieldPosition&gt;<sup>NumFields</sup></p>
+                  <p>FieldPosition --&gt; VLong</p>
+                  <p>The .tvd file is used to map out the fields that have term vectors stored and
+                  where the field information is in the .tvf file.</p>
+                </li>
+                <li>
+                  <p>The Field or .tvf file.</p>
+                  <p>This file contains, for each field that has a term vector stored, a list of
+                  the terms and their frequencies.</p>
+                  <p>Field (.tvf) --&gt; TVFVersion&lt;NumTerms, NumDistinct, TermFreqs&gt;<sup>NumFields</sup></p>
+                  <p>TVFVersion --&gt; Int</p>
+                  <p>NumTerms --&gt; VInt</p>
+                  <p>NumDistinct --&gt; VInt -- Future Use</p>
+                  <p>TermFreqs --&gt; &lt;TermText, TermFreq&gt;<sup>NumTerms</sup></p>
+                  <p>TermText --&gt; &lt;PrefixLength, Suffix&gt;</p>
+                  <p>PrefixLength --&gt; VInt</p>
+                  <p>Suffix --&gt; String</p>
+                  <p>TermFreq --&gt; VInt</p>
+                  <p>Term
+                      text prefixes are shared.  The PrefixLength is the number of initial
+                      characters from the previous term which must be prepended to a
+                      term's suffix in order to form the term's text.  Thus, if the
+                      previous term's text was "bone" and the term is "boy",
+                      the PrefixLength is two and the suffix is "y".
+                  </p>
+                </li>
+              </ol>
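+              <p>The prefix sharing used for TermText in the .tvf file can be sketched as
+              follows.  This is a hedged Python illustration rather than Lucene's code:
+              <code>decode_terms</code> is a hypothetical name, and each entry is assumed to be
+              a (PrefixLength, Suffix) pair that has already been parsed from the file, with
+              the first term of a field carrying a PrefixLength of zero.</p>
```python
def decode_terms(entries):
    """Rebuild full term texts from (PrefixLength, Suffix) pairs."""
    terms = []
    previous = ""  # assumption: there is no previous term before the first entry
    for prefix_length, suffix in entries:
        # Reuse the first prefix_length characters of the previous term.
        term = previous[:prefix_length] + suffix
        terms.append(term)
        previous = term
    return terms
```
+              <p>With the entries (0, "bone") and (2, "y"), the sketch rebuilds "bone" followed
+              by "boy", matching the example above.</p>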
+            </section>
+
+            <section id="Deleted Documents">
+                <title>Deleted Documents</title>
+
+                <p>The .del file is
+                    optional, and only exists when a segment contains deletions:
+                </p>
+
+                <p>Deletions
+                    (.del)    --&gt; ByteCount,BitCount,Bits
+                </p>
+
+                <p>ByteCount,BitCount    --&gt;
+                    UInt32
+                </p>
+
+                <p>Bits        --&gt;
+                    &lt;Byte&gt;<sup>ByteCount</sup>
+                </p>
+
+                <p>ByteCount
+                    indicates the number of bytes in Bits.  It is typically
+                    (SegSize/8)+1.
+                </p>
+
+                <p>
+                    BitCount
+                    indicates the number of bits that are currently set in Bits.
+                </p>
+
+                <p>Bits
+                    contains one bit for each document indexed.  When the bit
+                    corresponding to a document number is set, that document is marked as
+                    deleted.  Bit ordering is from least to most significant.  Thus, if
+                    Bits contains two bytes, 0x00 and 0x02, then document 9 is marked as
+                    deleted.
+                </p>
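+                <p>The bit lookup described above can be sketched in Python.  The
+                    names <code>is_deleted</code> and <code>count_deleted</code> are
+                    hypothetical, and Bits is assumed to have been read into a
+                    bytes object already; this is an illustration, not Lucene's
+                    implementation.
+                </p>
```python
def is_deleted(bits, doc):
    """Return True when the bit for document number doc is set in Bits."""
    byte = bits[doc // 8]  # eight documents per byte
    # Bit ordering is from least to most significant within each byte.
    return (byte >> (doc % 8)) % 2 == 1


def count_deleted(bits):
    """Count the set bits, which is the value BitCount should hold."""
    return sum(bin(byte).count("1") for byte in bits)
```
+                <p>For the two bytes 0x00 and 0x02 from the example,
+                    <code>is_deleted(bytes([0x00, 0x02]), 9)</code> is true and
+                    <code>count_deleted</code> returns 1.
+                </p>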
+            </section>
+        </section>
+
+        <section id="Limitations">
+            <title>Limitations</title>
+            <p>There
+                are a few places where these file formats limit the maximum number of
+                terms and documents to a 32-bit quantity, or approximately 4
+                billion.  This is not a problem today, but in the long term it
+                probably will be.  These should therefore be replaced with either
+                UInt64 values or, better yet, with VInt values, which have no limit.
+            </p>
+
+        </section>
+
+    </body>
+
+</document>

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/gettingstarted.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/gettingstarted.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/gettingstarted.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/gettingstarted.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,55 @@
+<?xml version="1.0"?>
+<document>
+	<header>
+        <title>
+	Apache Lucene - Getting Started Guide
+		</title>
+	</header>
+<properties>
+<author email="acoliver@apache.org">Andrew C. Oliver</author>
+</properties>
+<body>
+
+<section id="Getting Started">
+    <title>Getting Started</title>
+<p>
+This document is intended as a "getting started" guide.  It has three audiences: first-time users
+looking to install Apache Lucene in their application or web server; developers looking to modify or base
+the applications they develop on Lucene; and developers looking to become involved in and contribute
+to the development of Lucene.  This document is written in tutorial and walk-through format.  The
+goal is to help you "get started".  It does not go into great depth on some of the conceptual or
+inner details of Lucene.
+</p>
+
+<p>
+Each section listed below builds on the previous one.  More advanced users
+may wish to skip sections.
+</p>
+
+<ul>
+	<li><a href="demo.html">About the command-line Lucene demo and its usage</a>.  This section
+	is intended for anyone who wants to use the command-line Lucene demo.</li> <p/>
+
+	<li><a href="demo2.html">About the sources and implementation for the command-line Lucene
+	demo</a>.  This section walks through the implementation details (sources) of the
+	command-line Lucene demo.  This section is intended for developers.</li> <p/>
+
+	<li><a href="demo3.html">About installing and configuring the demo template web
+	application</a>.  While this walk-through assumes Tomcat as your container of choice,
+	there is no reason you can't (provided you have the requisite knowledge) adapt the
+	instructions to your container.  This section is intended for those responsible for the
+	development or deployment of Lucene-based web applications.</li> <p/>
+
+	<li><a href="demo4.html">About the sources used to construct the demo template web
+	application</a>.  Please note the template application is designed to highlight features of
+	Lucene and is <b>not</b> an example of best practices.  (One would hopefully use MVC
+	architecture such as provided by Jakarta Struts and taglibs, but showing you how to do that
+	would be WAY beyond the scope of this guide.)  This section is intended for developers and
+	those wishing to customize the demo template web application to their needs.  </li>
+
+</ul>
+</section>
+
+</body>
+</document>
+

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/asf-logo.gif
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/asf-logo.gif?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/asf-logo.gif
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/favicon.ico
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/favicon.ico?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/favicon.ico
------------------------------------------------------------------------------
    svn:executable = *

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/favicon.ico
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/larm_architecture.jpg
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/larm_architecture.jpg?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/larm_architecture.jpg
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/larm_crawling-process.jpg
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/larm_crawling-process.jpg?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/larm_crawling-process.jpg
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lia_3d.jpg
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lia_3d.jpg?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lia_3d.jpg
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_100.gif
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_100.gif?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_100.gif
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_150.gif
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_150.gif?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_150.gif
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_200.gif
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_200.gif?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_200.gif
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_250.gif
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_250.gif?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_250.gif
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_300.gif
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_300.gif?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_300.gif
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_100.gif
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_100.gif?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_100.gif
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_150.gif
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_150.gif?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_150.gif
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_200.gif
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_200.gif?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_200.gif
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_250.gif
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_250.gif?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_250.gif
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_300.gif
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_300.gif?view=auto&rev=479465
==============================================================================
Binary file - no diff available.

Propchange: lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_300.gif
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream


