Subject [DAISY] Updated: LuceneIndexTransformer
Date Fri, 13 Jul 2007 22:26:15 GMT
A document has been updated:

Document ID: 1104
Branch: main
Language: default
Name: LuceneIndexTransformer (unchanged)
Document Type: Sitemap Component (unchanged)
Updated on: 7/13/07 10:26:10 PM
Updated by: Grzegorz Kossakowski

A new version has been created, state: publish


Long description
This part has been updated.
Mime type: text/xml (unchanged)
File name:  (unchanged)
Size: 9579 bytes (previous version: 10767 bytes)
Content diff:
--- <h4 id="head-0b39056584778d584af2f2cdd81c6998caa13ba5">LuceneIndexTransformer is
--- a component that creates or updates Lucene indexes.</h4>
+++ <p class="note">LuceneIndexTransformer is a component that creates or updates
+++ Lucene indexes.<br/>
+++ This component only writes the index: to search the index, use the
+++ <a href="daisy:1085">SearchGenerator</a> component.</p>
--- <p>This component only writes the index: to search the index, use the
--- SearchGenerator component.</p>
+++ <h1>Why use it?</h1>
--- <h3 id="head-9b35088110dfcf121e63a9a2b67ec652d667a784">Why use it?</h3>
    <p>Instead of using LuceneIndexTransformer, you could generate an index by
    crawling your website. However, the LuceneIndexTransformer is <em>much,
    much</em> faster than crawling.</p>
(19 equal lines skipped)
--- <h3 id="head-953c351734de75a525b9777e976c0812a5618736">Declaring the
--- LuceneIndexTransformer</h3>
+++ <h1>Declaring the LuceneIndexTransformer</h1>
    <p>The transformer must be declared in the <tt>&lt;transformers&gt;</tt>
    section of your sitemap:</p>
(13 equal lines skipped)
--- <h3 id="head-cea5eb78d3cf27bf4fdf96d1049365b4fa984307">Input document for the
--- LuceneIndexTransformer</h3>
+++ <h1>Input document for the LuceneIndexTransformer</h1>
    <p>This is a sample of the kind of document that the transformer expects. NB In
    this example, I've chosen a couple of simple XHTML documents as the content to
(41 equal lines skipped)
--- <h3 id="head-97d27647f366081a18adc8469538e908e6354ed4">What the lucene:index
--- document means</h3>
+++ <h2>What the lucene:index document means</h2>
--- <h4 id="head-9e412039c4f6090a2aaac081c56f522ac97b8985">The lucene:index element
--- </h4>
+++ <h3>The lucene:index element</h3>
    <p>The root element is <tt>lucene:index</tt>. The attributes of the
    <tt>lucene:index</tt> in the sample above are shown with their default values
    so the effect is as if they were not specified at all.</p>
--- <h4 id="head-40afef17a5a56ab2e729d18163f1bc960a8ce2cc">The merge-factor and
--- analyzer attributes</h4>
+++ <h3>The merge-factor and analyzer attributes</h3>
--- <p>See
--- <a href=""><img width="11" height="11"
--- the Lucene documentation</a> for explanations of what they mean.</p>
+++ <p>See <a href="">the Lucene
+++ documentation</a> for explanations of what they mean.</p>
--- <h4 id="head-84967edae247fc0739e57bc3af497f832b880582">The optimize-frequency
--- attribute (since version 2.2)</h4>
+++ <h3>The optimize-frequency attribute (since version 2.2)</h3>
    <p>Determines how often the Lucene index will be optimized. When you have
    thousands of documents, optimizing the index can become quite slow (e.g. 7 seconds for 9000
(18 equal lines skipped)
    is index optimization and when should I use it? :</p>
--- <a href=""><img width="11" height="11" src=""/>
+++ <a href=""></a>
--- <h4 id="head-51123b488fc39c0a36b69c0e24608052fd45a86d">The directory attribute
--- </h4>
+++ <h3>The directory attribute</h3>
    <p>This attribute controls where the index files are stored. The path is
    relative to the Cocoon <tt>work</tt> directory.</p>
--- <h4 id="head-9b03e7cb891515af05d6a3bde919087262b146aa">The create attribute</h4>
+++ <h3>The create attribute</h3>
    <p>This attribute controls whether the index is recreated.</p>
(15 equal lines skipped)
--- <h4 id="head-9585e2ebba0108dc71917a21a4d9ed1edca00732">The lucene:document
--- element</h4>
+++ <h3>The lucene:document element</h3>
    <p>Lucene will index the content of each <tt>lucene:document</tt>, which
    can contain any XML content. The index entry is associated with the URL specified by the
    <tt>url</tt> attribute, so this URL is what a search returns for the document.
--- <h4 id="head-5f2ae3b3aceb65a1fd0cb0942a0385fa7c4a4e2e">The lucene:text-attr
--- attribute</h4>
+++ <h3>The lucene:text-attr attribute</h3>
    <p>Normally Lucene will only index the content of these elements, not attribute
    values. To index the attributes of an element as well, give it an attribute
(6 equal lines skipped)
    <p>This would index the text "Blah".</p>
--- <h4 id="head-b85bebbca6ee9807e0a7165b5208677c1616aca7">The lucene:store
--- attribute</h4>
+++ <h3>The lucene:store attribute</h3>
    <p>Normally Lucene will only index the text of an element, not store it. To
    store the text of an element in Lucene's index, add a
    <tt>lucene:store="true"</tt> attribute to the element. It's a good idea to
    store the title of a document in Lucene, so that your search results can show a
    document title as well as a URL.</p>
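
As a rough sketch of the lucene:document element and the lucene:store attribute described above (not taken from the page itself): the fragment assumes the lucene prefix is already declared on the enclosing lucene:index element, and the url value, element names and text are invented for illustration.

```xml
<!-- Hypothetical fragment: the lucene prefix is assumed to be declared
     on the enclosing lucene:index element -->
<lucene:document url="docs/example.html">
  <html>
    <head>
      <!-- lucene:store="true" asks Lucene to store this text as well as index it -->
      <title lucene:store="true">Example page</title>
    </head>
    <body>
      <!-- no lucene:store attribute here, so this text is indexed but not stored -->
      <p>Body text of the example page.</p>
    </body>
  </html>
</lucene:document>
```

With a title stored this way, a results page could show "Example page" next to the returned URL.
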
--- <h3 id="head-c55afa96d19d0ca7161da59bedf6409cbbfd78c2">The transformation</h3>
+++ <h1>The transformation</h1>
    <p>The transformer copies the source document to the output, except for the
    content of the <tt>lucene:document</tt> elements.</p>
(3 equal lines skipped)
    index that document. You can use XSLT to transform the results into a report on
    the indexing operation.</p>
--- <h4 id="head-c9a731f4df69c482e3c1d40fcc39e94b3fb16307">Sample output</h4>
+++ <h2>Sample output</h2>
    <pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
    &lt;lucene:index xmlns:lucene="" 
(10 equal lines skipped)
--- <h5 id="head-24e83f0c8063ca175d6e8a1a80e51e1ed9fbc20b">Note to users of Mac OS X
--- </h5>
+++ <h1>Note to users of Mac OS X</h1>
    <p>By default, Java cannot open more than 256 files at a time, so you may get an
    error like the following:</p>
(12 equal lines skipped)
    <p>Read more about this here:
--- <a href=""><img width="11" height="11" src=""/>
+++ <a href=""></a>
+++ </p>
--- <h5 id="head-f9fcf2cc3f693a586067cd49d3cbe85a6297d60e">Note to users of Redhat
--- Linux</h5>
+++ <h1>Note to users of Redhat Linux</h1>
    <p>If you get the following error (Empty StackException) while creating the
    index with the LuceneIndexTransformer, try lowering your merge-factor
    (the default should be 10). Look at the
--- <a href=""><img width="11" height="11" src=""/>
--- Lucene documentation</a> for more information.</p>
+++ <a href="">Lucene
+++ documentation</a> for more information.</p>
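
For illustration only, lowering the merge factor would mean changing the merge-factor attribute on the root element of the input document. The value 5 below is an arbitrary example rather than a recommendation from the page, and the namespace declaration and the other attributes are left out.

```xml
<!-- Hypothetical fragment: merge-factor lowered from its default; 5 is an arbitrary value.
     The xmlns:lucene declaration and the other attributes are omitted here. -->
<lucene:index merge-factor="5">
  <!-- lucene:document elements go here, as in the input sample -->
</lucene:index>
```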

