forrest-user mailing list archives

From Karthik Manimaran <mknapa...@gmail.com>
Subject Forrest-Lucene raw files search
Date Thu, 24 Nov 2005 16:58:24 GMT
Hi,

I used the following approach to make raw files searchable with Lucene.

Forrest uses site.xml to pass the documents to the Lucene index transformer,
but site.xml does not list the raw files. In my case I wanted the javadocs
for a component library to be placed as raw HTML files and still be
searchable. Updating site.xml by hand every time the raw HTML files change
is out of the question, so I created a new file, site-lucene.xml, that
contains both the contents of site.xml and an entry for each raw HTML file.
The steps are as follows:

1. Write a batch file (UpdateLuceneSearchList.bat) that recursively lists
all the HTML files and writes the list to a file, jupd.txt. Place the batch
file in the root of the folder containing the raw HTML files.
Contents of UpdateLuceneSearchList.bat:
dir *.htm* /n /b /s >jupd.txt
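
For anyone not on Windows, the listing step could probably be done in Java
as well. The following is only a minimal sketch, not part of my setup: the
class name RawHtmlLister and its methods are made up for illustration, and
the extension check is stricter than the *.htm* wildcard because it only
looks at the final extension.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: a portable stand-in for "dir *.htm* /b /s >jupd.txt".
public class RawHtmlLister {

    // True when the final extension starts with "htm" (.htm, .html, .HTM, ...).
    static boolean isHtml(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        int dot = lower.lastIndexOf('.');
        return dot >= 0 && lower.startsWith(".htm", dot);
    }

    // Recursively collects the absolute paths of all HTML files under root.
    static List<String> listHtmlFiles(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> isHtml(p.getFileName().toString()))
                        .map(p -> p.toAbsolutePath().toString())
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the folder with the raw HTML files is the first
        // argument, defaulting to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // One absolute path per line, like the output of "dir /b /s".
        Files.write(root.resolve("jupd.txt"), listHtmlFiles(root));
    }
}
```

Running "java RawHtmlLister <folder>" would write jupd.txt into that folder,
one absolute path per line.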

2. Write a Java program that takes site.xml and jupd.txt and produces a new
XML file, site-lucene.xml. Source attached.
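
Since the attachment does not show up in the archive view, here is a rough
sketch of what such a program might look like. It is not the attached
UpdateSite source. The class name SiteLuceneSketch, the element names
raw1, raw2, ... and the argument order (jupd.txt, xdocs root, site.xml,
site-lucene.xml, taken from the command line in step 6) are all
assumptions. It appends one label-less entry per raw file to a copy of
site.xml, with the href made relative to the xdocs root.

```java
import java.io.File;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: merges site.xml and jupd.txt into site-lucene.xml.
public class SiteLuceneSketch {

    // Turns an absolute Windows path into an href relative to the xdocs root.
    // Paths outside the root are returned unchanged apart from the slashes.
    static String toHref(String xdocsRoot, String absolutePath) {
        String root = xdocsRoot.replace('\\', '/');
        if (!root.endsWith("/")) {
            root += "/";
        }
        String path = absolutePath.replace('\\', '/');
        // Windows paths are case-insensitive, so compare ignoring case.
        if (path.regionMatches(true, 0, root, 0, root.length())) {
            path = path.substring(root.length());
        }
        return path;
    }

    public static void main(String[] args) throws Exception {
        // Assumed arguments: jupd.txt, xdocs root, site.xml, site-lucene.xml
        List<String> rawFiles =
            Files.readAllLines(Paths.get(args[0]), Charset.defaultCharset());

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new File(args[2]));
        Element site = doc.getDocumentElement();

        int n = 0;
        for (String line : rawFiles) {
            String path = line.trim();
            if (path.isEmpty()) {
                continue;
            }
            // Element names are made up; entries carry an href but no label.
            Element entry =
                doc.createElementNS(site.getNamespaceURI(), "raw" + (++n));
            entry.setAttribute("href", toHref(args[1], path));
            site.appendChild(entry);
        }

        // Write the merged document out as site-lucene.xml.
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(doc), new StreamResult(new File(args[3])));
    }
}
```

The original site.xml is left untouched; only the output file named by the
fourth argument is written.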

3. Update search.xmap so that the new site-lucene.xml is used as the input.
Change

      <map:match pattern="site.lucene">
        <map:generate src="cocoon://abs-linkmap"/>

to

      <map:match pattern="site.lucene">
        <map:generate src="cocoon://abs-linkmap-lucene"/>

4. Add an entry for abs-linkmap-lucene to the pipeline in linkmap.xmap:

      <map:match pattern="abs-linkmap-lucene">
        <map:generate src="{project:content.xdocs}site-lucene.xml" />
        <map:transform type="xinclude"/>
        <map:transform src="{forrest:stylesheets}/absolutize-linkmap.xsl" />
        <map:serialize type="xml" />
      </map:match>

5. Comment out the following lines in site2book.xsl (because the entries
generated in site-lucene.xml have no labels):
<!--
      <xsl:when test="not(@label)">
      </xsl:when>
-->

6. Create a batch file that calls UpdateLuceneSearchList.bat and then runs
the Java program to update the index:
C:\neio\src\documentation\content\xdocs\globaljavadocs\jupd
java UpdateSite C:\neio\src\documentation\content\xdocs\globaljavadocs\jupd.txt C:\neio\src\documentation\content\xdocs\ C:\neio\src\documentation\content\xdocs\site.xml C:\neio\src\documentation\content\xdocs\site-lucene.xml

This batch file can be scheduled to run whenever the raw files are updated,
so that the index stays current. If this is of any help, I would be glad to
update the search-related information in the Forrest documentation.

Thanks and regards,
Karthik.
