lucene-dev mailing list archives

From acoli...@apache.org
Subject cvs commit: jakarta-lucene/xdocs/stylesheets project.xml
Date Sat, 26 Jan 2002 15:01:32 GMT
acoliver    02/01/26 07:01:32

  Modified:    .        BUILD.txt build.properties build.xml
               src/test/org/apache/lucene IndexTest.java
               src/test/org/apache/lucene/index DocTest.java
               xdocs/stylesheets project.xml
  Added:       src/demo Search.html Search.jhtml
               src/demo/org/apache/lucene/demo DeleteFiles.java
                        FileDocument.java HTMLDocument.java IndexFiles.java
                        IndexHTML.java SearchFiles.java
               src/demo/org/apache/lucene/demo/html Entities.java
                        HTMLParser.jj ParserThread.java Test.java
               src/jsp  README.txt configuration.jsp footer.jsp header.jsp
                        index.jsp results.jsp
               src/jsp/WEB-INF web.xml
               xdocs    demo.xml demo2.xml demo3.xml demo4.xml
  Removed:     src/demo/org/apache/lucene DeleteFiles.java
                        FileDocument.java HTMLDocument.java IndexFiles.java
                        IndexHTML.java Search.html Search.jhtml
                        SearchFiles.java
               src/demo/org/apache/lucene/HTMLParser .cvsignore
                        Entities.java HTMLParser.jj ParserThread.java
                        Test.java
  Log:
  Reviewed by:	Doug Cutting / Lucene Community
  new demo build target
  added getting started guide
  modified tests
  moved demo to demo subpackage
  added war demo
  
  Revision  Changes    Path
  1.2       +3 -3      jakarta-lucene/BUILD.txt
  
  Index: BUILD.txt
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/BUILD.txt,v
  retrieving revision 1.1
  retrieving revision 1.2
  diff -u -r1.1 -r1.2
  --- BUILD.txt	4 Nov 2001 17:23:04 -0000	1.1
  +++ BUILD.txt	26 Jan 2002 15:01:31 -0000	1.2
  @@ -1,6 +1,6 @@
   Lucene Build Instructions
   
  -$Id: BUILD.txt,v 1.1 2001/11/04 17:23:04 cutting Exp $
  +$Id: BUILD.txt,v 1.2 2002/01/26 15:01:31 acoliver Exp $
   
   Basic steps:
     0) Install JDK 1.3, Ant 1.4, and the Ant 1.4 optional.jar.
  @@ -52,14 +52,14 @@
   Download either a zip or a tarred/gzipped version of the archive, and
   uncompress it into a directory of your choice.
   
  -Step 3) Connect to the top-level of your Lucene installation
  +Step 2) Connect to the top-level of your Lucene installation
   
   Lucene's top-level directory contains the build.properties and
   build.xml files.  You don't need to change any of the settings in
   these files, but you do need to run ant from this location so it knows
   where to find them.
   
  -Step 4) Run ant.
  +Step 3) Run ant.
   
   Assuming you have ant in your PATH and have set ANT_HOME to the
   location of your ant installation, typing "ant" at the shell prompt
  
  
  
  1.16      +3 -0      jakarta-lucene/build.properties
  
  Index: build.properties
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/build.properties,v
  retrieving revision 1.15
  retrieving revision 1.16
  diff -u -r1.15 -r1.16
  --- build.properties	25 Dec 2001 19:34:04 -0000	1.15
  +++ build.properties	26 Jan 2002 15:01:31 -0000	1.16
  @@ -14,6 +14,7 @@
   
   src.dir = ./src/java
   demo.src = ./src/demo
  +demo.jsp = ./src/jsp
   test.src = ./src/test
   docs.dir = ./docs
   lib.dir = ./lib
  @@ -37,6 +38,8 @@
   build.demo = ${build.dir}/demo
   build.demo.src = ${build.demo}/src
   build.demo.classes = ${build.demo}/classes
  +build.demo.name = ${name}-demos-${version}
  +build.war.name = luceneweb
   
   build.test = ${build.dir}/test
   build.test.src = ${build.test}/src
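The new `build.demo.name` property is built from other properties with Ant's `${...}` syntax. To show how such a reference resolves, here is a minimal Java sketch of the substitution. It is not Ant's implementation. It assumes flat, non-nested references, and the `name` and `version` values in `main` are hypothetical (the real ones are defined elsewhere in the build configuration).

```java
import java.util.HashMap;
import java.util.Map;

public class PropertyExpansion {
    // Replaces each ${key} in value with props.get(key). Undefined keys are
    // left as literal text, which is also how Ant treats undefined properties.
    static String expand(String value, Map<String, String> props) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < value.length()) {
            int open = value.indexOf("${", i);
            if (open < 0) {                       // no more references
                out.append(value, i, value.length());
                break;
            }
            int close = value.indexOf('}', open + 2);
            if (close < 0) {                      // unterminated reference: copy verbatim
                out.append(value, i, value.length());
                break;
            }
            out.append(value, i, open);           // literal text before the reference
            String key = value.substring(open + 2, close);
            String replacement = props.get(key);
            out.append(replacement != null ? replacement : value.substring(open, close + 1));
            i = close + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("name", "lucene");              // hypothetical value
        props.put("version", "1.2-dev");          // hypothetical value
        System.out.println(expand("${name}-demos-${version}", props));
    }
}
```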
  
  
  
  1.18      +45 -3     jakarta-lucene/build.xml
  
  Index: build.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/build.xml,v
  retrieving revision 1.17
  retrieving revision 1.18
  diff -u -r1.17 -r1.18
  --- build.xml	19 Nov 2001 01:19:23 -0000	1.17
  +++ build.xml	26 Jan 2002 15:01:31 -0000	1.18
  @@ -121,6 +121,45 @@
       />
     </target>
   
  +  <target name="jardemo" depends="compile,demo" if="javacc.present">
  +    <jar 
  +      jarfile="${build.demo}/${build.demo.name}.jar" 
  +      basedir="${build.demo.classes}"
  +      excludes="**/*.java"
  +    />
  +  </target>
  +
  +  <target name="wardemo" depends="compile,demo,jar,jardemo" if="javacc.present">
  +    <mkdir dir="${build.demo}/${build.war.name}"/>
  +    <mkdir dir="${build.demo}/${build.war.name}/WEB-INF"/>
  +    <mkdir dir="${build.demo}/${build.war.name}/WEB-INF/lib"/>
  +    
  +    <copy todir="${build.demo}/${build.war.name}">
  +      <fileset dir="${demo.jsp}">
  +        <include name="**/*.jsp"/>
  +        <include name="**/*.xml"/>
  +      </fileset>
  +    </copy>
  +
  +    <copy todir="${build.demo}/${build.war.name}/WEB-INF/lib">
  +      <fileset dir="${build.dir}">
  +        <include name="*.jar"/>
  +      </fileset>
  +    </copy>
  +
  +    <copy todir="${build.demo}/${build.war.name}/WEB-INF/lib">
  +      <fileset dir="${build.demo}">
  +        <include name="*.jar"/>
  +      </fileset>
  +    </copy>
  +
  +   <jar
  +	jarfile="${build.demo}/${build.war.name}.war"
  +	basedir="${build.demo}/${build.war.name}"
  +	excludes="**/*.java"
  +   />
  +  </target>
  +
     <!-- ================================================================== -->
     <!-- J A R  S O U R C E                                                 -->
     <!-- ================================================================== -->
  @@ -163,9 +202,9 @@
       </copy>
       
       <javacc 
  -      target="${build.demo.src}/org/apache/lucene/HTMLParser/HTMLParser.jj" 
  +      target="${build.demo.src}/org/apache/lucene/demo/html/HTMLParser.jj" 
         javacchome="${javacc.zip.dir}"
  -      outputdirectory="${build.demo.src}/org/apache/lucene/HTMLParser"
  +      outputdirectory="${build.demo.src}/org/apache/lucene/demo/html"
       />
       
       <mkdir dir="${build.demo.classes}"/>
  @@ -321,7 +360,7 @@
     <!-- ================================================================== -->
     <!--                                                                    -->
     <!-- ================================================================== -->
  -  <target name="package" depends="jar, javadocs, demo">
  +  <target name="package" depends="jar, javadocs, demo, wardemo">
       <mkdir dir="${dist.dir}"/>
       <mkdir dir="${dist.dir}/docs"/>
       <mkdir dir="${dist.dir}/docs/api"/>
  @@ -339,6 +378,7 @@
         <fileset dir="${build.demo.classes}"/>
       </copy>
   
  +
       <copy todir="${dist.dir}/src">
         <fileset dir="src"/>
       </copy>
  @@ -353,6 +393,8 @@
         </fileset>
       </copy>
       <copy file="${build.dir}/${final.name}.jar" todir="${dist.dir}"/>
  +    <copy file="${build.demo}/${build.demo.name}.jar" todir="${dist.dir}"/>
  +    <copy file="${build.demo}/${build.war.name}.war" todir="${dist.dir}"/>
     </target>
   
     <!-- ================================================================== -->
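The `wardemo` target stages the JSPs, `web.xml`, and the Lucene and demo jars under `${build.demo}/${build.war.name}`. It then uses `<jar>` to zip that directory into `luceneweb.war`, excluding `**/*.java`. A WAR is a jar with a web-application layout, so the packaging step can be sketched in plain Java with `java.util.jar`. This is an approximation with stated assumptions. The class name `WarPacker` is hypothetical, and only the `.java` exclusion is modeled. Unlike Ant's `<jar>` task, it writes no manifest.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class WarPacker {
    // Packs every file under baseDir into warFile, skipping *.java files,
    // roughly like <jar basedir="..." excludes="**/*.java"/> in build.xml.
    static void pack(File baseDir, File warFile) throws IOException {
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(warFile))) {
            addTree(baseDir, "", out);
        }
    }

    private static void addTree(File dir, String prefix, JarOutputStream out) throws IOException {
        File[] children = dir.listFiles();
        if (children == null) return;            // not a directory, or an I/O error
        for (File child : children) {
            String entryName = prefix + child.getName();
            if (child.isDirectory()) {
                out.putNextEntry(new JarEntry(entryName + "/"));  // directory entries end in '/'
                out.closeEntry();
                addTree(child, entryName + "/", out);
            } else if (!child.getName().endsWith(".java")) {
                out.putNextEntry(new JarEntry(entryName));
                try (InputStream in = new FileInputStream(child)) {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) > 0) {
                        out.write(buf, 0, n);
                    }
                }
                out.closeEntry();
            }
        }
    }
}
```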
  
  
  
  1.1                  jakarta-lucene/src/demo/Search.html
  
  Index: Search.html
  ===================================================================
  <HTML>
  <HEAD>
  <TITLE>Lucene Search Demo</TITLE>
  </HEAD>
  <BODY>
  
  <CENTER>
  <H1>
  Lucene Search Demo</H1>
  
  <form name=search action=http://localhost:8080/Search.jhtml method=get>
  <input name=query size=44>&nbsp;<input type=submit value=Search></form>
  
  </CENTER>
  
  </BODY>
  </HTML>
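Search.html submits its single `query` field to `Search.jhtml` with an HTTP GET. Search.jhtml also reads `start` and `hitsPerPage`. It computes the end of the current page as `Math.min(hits.length(), start + hitsPerPage)` and shows a Next button that resubmits with `start` set to that value. The sketch below reproduces the URL a browser would build and that paging arithmetic. The class and method names are hypothetical. The base URL is the `localhost:8080` address hard-coded in the form.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SearchUrl {
    // Builds a GET URL with the parameters Search.jhtml reads.
    static String build(String base, String query, int start, int hitsPerPage) {
        try {
            return base
                + "?query=" + URLEncoder.encode(query, "UTF-8")  // spaces become '+'
                + "&start=" + start
                + "&hitsPerPage=" + hitsPerPage;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // One past the last hit shown on a page, as computed in Search.jhtml.
    static int pageEnd(int totalHits, int start, int hitsPerPage) {
        return Math.min(totalHits, start + hitsPerPage);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080/Search.jhtml", "lucene AND demo", 10, 10));
    }
}
```

With 25 hits and 10 per page, the pages cover hits 0 to 9, 10 to 19, and 20 to 24. No Next button appears on the last page because `pageEnd(25, 20, 10)` equals the total.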
  
  
  
  1.1                  jakarta-lucene/src/demo/Search.jhtml
  
  Index: Search.jhtml
  ===================================================================
  <HTML><!-- -*-java-*- -->
  <!-- Lucene Search Demo via CompiledPageServlet -->
  <!-- Copyright (c) 1998,2000 Douglass R. Cutting. -->
  
  <java type=import>
    javax.servlet.*
    javax.servlet.http.*
    java.io.*
    java.util.Hashtable
    org.apache.lucene.analysis.*
    org.apache.lucene.document.*
    org.apache.lucene.index.*
    org.apache.lucene.search.*
    org.apache.lucene.queryParser.*
    org.apache.lucene.demo.*
    org.apache.lucene.demo.html.Entities
  </java>
  
  <java>
    // get index from request
    String indexName = request.getParameter("index");
    if (indexName == null)			  // default to "index"
      indexName = "index";
    Searcher searcher =				  // make searcher
      new IndexSearcher(getReader(indexName));
  
    // get query from request
    String queryString = request.getParameter("query");
    if (queryString == null)			  
      throw new ServletException("no query specified");
      
    int start = 0;				  // first hit to display
    String startString = request.getParameter("start");
    if (startString != null)
      start = Integer.parseInt(startString);
  
    int hitsPerPage = 10;				  // number of hits to display
    String hitsString = request.getParameter("hitsPerPage");
    if (hitsString != null)
      hitsPerPage = Integer.parseInt(hitsString);
  
    boolean showSummaries = true;			  // show summaries?
    if ("false".equals(request.getParameter("showSummaries")))
      showSummaries = false;
  
    Query query = null;
    try {						  // parse query
      query = QueryParser.parse(queryString, "contents", analyzer);
    } catch (ParseException e) {			  // error parsing query
      </java>
      <HEAD><TITLE>Error Parsing Query</TITLE></HEAD><BODY>
      <p>While parsing `queryString`: `e.getMessage()`
      <java>
      return;
    }
  
    String servletPath = request.getRequestURI();	  // getServletPath should work
    int j = servletPath.indexOf('?');		  // here but doesn't, so we
    if (j != -1)					  // remove query by hand...
      servletPath = servletPath.substring(0, j);
  
  </java>
  
  <head><title>Lucene Search Results</title></head><body>
  
  <center>
   <form name=search action=`servletPath` method=get>
   <input name=query size=44 value='`queryString`'>
   <input type=hidden name=index value="`indexName`">
   <input type=hidden name=hitsPerPage value=`hitsPerPage`>
   <input type=hidden name=showSummaries value=`showSummaries`>
   <input type=submit value=Search>
   </form>
  </center>
  <java>
    Hits hits = searcher.search(query);		  // perform query
    int end = Math.min(hits.length(), start + hitsPerPage);
  </java>
  
  <p>Hits <b><java type=print>start+1</java>-<java type=print>end</java></b>
  (out of <java type=print>hits.length()</java> total matching documents):
  
  <ul>
  <java>
    for (int i = start; i < end; i++) {		  // display the hits
      Document doc = hits.doc(i);
      String title = doc.get("title");
      if (title == null || title.equals(""))	  // use url for docs w/o title
        title = doc.get("url");
      </java>
      <p><b><java type=print>(int)(hits.score(i) * 100.0f)</java>%
      <a href="`doc.get("url")`">
      <java type=print>Entities.encode(title)</java>
      </b></a>
      <java>
      if (showSummaries) {			  // maybe show summary
      </java>
      <ul><i>Summary</i>:
        <java type=print>Entities.encode(doc.get("summary"))</java>
      </ul>
      <java>
      }
    }
  </java>
  </ul>
  
  <java>
    if (end < hits.length()) {			  // insert next page button
  </java>
      <center>
      <form name=search action=`servletPath` method=get>
      <input type=hidden name=query value='`queryString`'>
      <input type=hidden name=start value=`end`>
      <input type=hidden name=index value="`indexName`">
      <input type=hidden name=hitsPerPage value=`hitsPerPage`>
      <input type=hidden name=showSummaries value=`showSummaries`>
      <input type=submit value=Next>
      </form>
      </center>
  <java>
      }
  </java>
  
  </body>
  
  <java type=class>
  
    Analyzer analyzer = new StopAnalyzer();	  // used to tokenize queries
  
    /** Keep a cache of open IndexReaders, so that an index does not have to be
        opened for each query.  The cache re-opens an index when it has changed
        so that additions and deletions are visible ASAP. */
  
    static Hashtable indexCache = new Hashtable();  // name->CachedIndex
  
    class CachedIndex {				  // an entry in the cache
      IndexReader reader;				  // an open reader
      long modified;				  // reader's modified date
      
      CachedIndex(String name) throws IOException {
        modified = IndexReader.lastModified(name);  // get modified date
        reader = IndexReader.open(name);		  // open reader
      }
    }
  
    IndexReader getReader(String name) throws ServletException {
      CachedIndex index =				  // look in cache
        (CachedIndex)indexCache.get(name);
      
      try {
        if (index != null &&			  // check up-to-date
  	  (index.modified == IndexReader.lastModified(name)))
  	return index.reader;			  // cache hit
        else {
  	index = new CachedIndex(name);		  // cache miss
        }
      } catch (IOException e) {
        throw new ServletException("Could not open index " + name + ": " +
  				 e.getClass().getName() + "--" +
  				 e.getMessage());
      }
  
      indexCache.put(name, index);		  // add to cache
      return index.reader;
    }
  </java>
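The `getReader` cache in Search.jhtml keeps one open `IndexReader` per index name. It re-opens the reader only when `IndexReader.lastModified(name)` reports a different timestamp. The same pattern is sketched below in plain Java, generic over the cached resource so it runs without Lucene. `ModifiedCache` and its `Source` interface are hypothetical stand-ins for `IndexReader.open` and `IndexReader.lastModified`.

```java
import java.util.HashMap;
import java.util.Map;

public class ModifiedCache<R> {
    // Hypothetical abstraction over "open an index" and "when did it last change".
    public interface Source<T> {
        long lastModified(String name) throws Exception;
        T open(String name) throws Exception;
    }

    private static final class Entry<T> {
        final T resource;
        final long modified;                      // timestamp seen when the resource was opened
        Entry(T resource, long modified) {
            this.resource = resource;
            this.modified = modified;
        }
    }

    private final Map<String, Entry<R>> cache = new HashMap<>();
    private final Source<R> source;

    public ModifiedCache(Source<R> source) {
        this.source = source;
    }

    // Returns the cached resource if its timestamp is unchanged; otherwise re-opens it.
    public synchronized R get(String name) throws Exception {
        long modified = source.lastModified(name);
        Entry<R> e = cache.get(name);
        if (e != null && e.modified == modified) {
            return e.resource;                    // cache hit
        }
        R fresh = source.open(name);              // cache miss or stale entry
        cache.put(name, new Entry<R>(fresh, modified));
        return fresh;
    }
}
```

As in the original, a stale entry is simply replaced and its old resource is never closed. A production version would likely close the previous reader once no search is still using it. The sketch also synchronizes `get`, so two requests cannot interleave between the check and the put. The original's `Hashtable` synchronizes only individual calls, not that check-then-put sequence.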
  
  
  
  1.1                  jakarta-lucene/src/demo/org/apache/lucene/demo/DeleteFiles.java
  
  Index: DeleteFiles.java
  ===================================================================
  package org.apache.lucene.demo;
  
  /* ====================================================================
   * The Apache Software License, Version 1.1
   *
   * Copyright (c) 2001 The Apache Software Foundation.  All rights
   * reserved.
   *
   * Redistribution and use in source and binary forms, with or without
   * modification, are permitted provided that the following conditions
   * are met:
   *
   * 1. Redistributions of source code must retain the above copyright
   *    notice, this list of conditions and the following disclaimer.
   *
   * 2. Redistributions in binary form must reproduce the above copyright
   *    notice, this list of conditions and the following disclaimer in
   *    the documentation and/or other materials provided with the
   *    distribution.
   *
   * 3. The end-user documentation included with the redistribution,
   *    if any, must include the following acknowledgment:
   *       "This product includes software developed by the
   *        Apache Software Foundation (http://www.apache.org/)."
   *    Alternately, this acknowledgment may appear in the software itself,
   *    if and wherever such third-party acknowledgments normally appear.
   *
   * 4. The names "Apache" and "Apache Software Foundation" and
   *    "Apache Lucene" must not be used to endorse or promote products
   *    derived from this software without prior written permission. For
   *    written permission, please contact apache@apache.org.
   *
   * 5. Products derived from this software may not be called "Apache",
   *    "Apache Lucene", nor may "Apache" appear in their name, without
   *    prior written permission of the Apache Software Foundation.
   *
   * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
   * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
   * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
   * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
   * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
   * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
   * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
   * SUCH DAMAGE.
   * ====================================================================
   *
   * This software consists of voluntary contributions made by many
   * individuals on behalf of the Apache Software Foundation.  For more
   * information on the Apache Software Foundation, please see
   * <http://www.apache.org/>.
   */
  
  import java.io.IOException;
  
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  
  class DeleteFiles {
    public static void main(String[] args) {
      try {
        // open the same "index" directory that IndexFiles writes to
        Directory directory = FSDirectory.getDirectory("index", false);
        IndexReader reader = IndexReader.open(directory);
  
  //       Term term = new Term("path", "pizza");
  //       int deleted = reader.delete(term);
  
  //       System.out.println("deleted " + deleted +
  // 			 " documents containing " + term);
  
        for (int i = 0; i < reader.maxDoc(); i++)
  	reader.delete(i);
  
        reader.close();
        directory.close();
  
      } catch (Exception e) {
        System.out.println(" caught a " + e.getClass() +
  			 "\n with message: " + e.getMessage());
      }
    }
  }
  
  
  
  1.1                  jakarta-lucene/src/demo/org/apache/lucene/demo/FileDocument.java
  
  Index: FileDocument.java
  ===================================================================
  package org.apache.lucene.demo;
  
  /* ====================================================================
   * The Apache Software License, Version 1.1
   *
   * Copyright (c) 2001 The Apache Software Foundation.  All rights
   * reserved.
   *
   * Redistribution and use in source and binary forms, with or without
   * modification, are permitted provided that the following conditions
   * are met:
   *
   * 1. Redistributions of source code must retain the above copyright
   *    notice, this list of conditions and the following disclaimer.
   *
   * 2. Redistributions in binary form must reproduce the above copyright
   *    notice, this list of conditions and the following disclaimer in
   *    the documentation and/or other materials provided with the
   *    distribution.
   *
   * 3. The end-user documentation included with the redistribution,
   *    if any, must include the following acknowledgment:
   *       "This product includes software developed by the
   *        Apache Software Foundation (http://www.apache.org/)."
   *    Alternately, this acknowledgment may appear in the software itself,
   *    if and wherever such third-party acknowledgments normally appear.
   *
   * 4. The names "Apache" and "Apache Software Foundation" and
   *    "Apache Lucene" must not be used to endorse or promote products
   *    derived from this software without prior written permission. For
   *    written permission, please contact apache@apache.org.
   *
   * 5. Products derived from this software may not be called "Apache",
   *    "Apache Lucene", nor may "Apache" appear in their name, without
   *    prior written permission of the Apache Software Foundation.
   *
   * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
   * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
   * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
   * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
   * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
   * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
   * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
   * SUCH DAMAGE.
   * ====================================================================
   *
   * This software consists of voluntary contributions made by many
   * individuals on behalf of the Apache Software Foundation.  For more
   * information on the Apache Software Foundation, please see
   * <http://www.apache.org/>.
   */
  
  import java.io.File;
  import java.io.Reader;
  import java.io.FileInputStream;
  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.document.DateField;
  
  /** A utility for making Lucene Documents from a File. */
  
  public class FileDocument {
    /** Makes a document for a File.
      <p>
      The document has three fields:
      <ul>
      <li><code>path</code>--containing the pathname of the file, as a stored,
      tokenized field;
      <li><code>modified</code>--containing the last modified date of the file as
      a keyword field as encoded by <a
      href="lucene.document.DateField.html">DateField</a>; and
      <li><code>contents</code>--containing the full contents of the file, as a
      Reader field.
      </ul>
      */
    public static Document Document(File f)
         throws java.io.FileNotFoundException {
  	 
      // make a new, empty document
      Document doc = new Document();
  
      // Add the path of the file as a field named "path".  Use a Text field, so
      // that the index stores the path, and so that the path is searchable
      doc.add(Field.Text("path", f.getPath()));
  
      // Add the last modified date of the file as a field named "modified".  Use a
      // Keyword field, so that it's searchable, but so that no attempt is made
      // to tokenize the field into words.
      doc.add(Field.Keyword("modified",
  			  DateField.timeToString(f.lastModified())));
  
      // Add the contents of the file as a field named "contents".  Use a Text
      // field, specifying a Reader, so that the text of the file is tokenized.
      // ?? why doesn't FileReader work here ??
      FileInputStream is = new FileInputStream(f);
      Reader reader = new BufferedReader(new InputStreamReader(is));
      doc.add(Field.Text("contents", reader));
  
      // return the document
      return doc;
    }
  
    private FileDocument() {}
  }
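The `modified` field is stored as a Keyword via `DateField.timeToString`, so that comparisons on the untokenized string follow chronological order. This works because fixed-width, zero-padded digits sort lexicographically in the same order as the numbers they encode. The sketch below illustrates the idea with base-36 digits padded to a hypothetical width of 9. It is not claimed to match DateField's exact encoding, and `SortableTime` is a hypothetical name.

```java
public class SortableTime {
    // Hypothetical width: 9 base-36 digits cover epoch milliseconds until roughly year 5000.
    private static final int WIDTH = 9;

    // Encodes non-negative epoch milliseconds as a fixed-width base-36 string,
    // so that string order matches numeric (chronological) order.
    static String encode(long millis) {
        if (millis < 0) throw new IllegalArgumentException("negative time: " + millis);
        String digits = Long.toString(millis, Character.MAX_RADIX);  // lowercase 0-9a-z
        if (digits.length() > WIDTH) throw new IllegalArgumentException("time too large: " + millis);
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0');                       // left-pad with zeros
        }
        return sb.append(digits).toString();
    }

    static long decode(String s) {
        return Long.parseLong(s, Character.MAX_RADIX);
    }
}
```

Without the padding, 35 encodes as `z` and 36 as `10`, and `z` sorts after `10` even though 35 < 36. Padded, they become `00000000z` and `000000010`, which sort correctly. In ASCII the digits `0`-`9` precede `a`-`z`, which matches the base-36 digit order.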
      
  
  
  
  1.1                  jakarta-lucene/src/demo/org/apache/lucene/demo/HTMLDocument.java
  
  Index: HTMLDocument.java
  ===================================================================
  package org.apache.lucene.demo;
  
  /* ====================================================================
   * The Apache Software License, Version 1.1
   *
   * Copyright (c) 2001 The Apache Software Foundation.  All rights
   * reserved.
   *
   * Redistribution and use in source and binary forms, with or without
   * modification, are permitted provided that the following conditions
   * are met:
   *
   * 1. Redistributions of source code must retain the above copyright
   *    notice, this list of conditions and the following disclaimer.
   *
   * 2. Redistributions in binary form must reproduce the above copyright
   *    notice, this list of conditions and the following disclaimer in
   *    the documentation and/or other materials provided with the
   *    distribution.
   *
   * 3. The end-user documentation included with the redistribution,
   *    if any, must include the following acknowledgment:
   *       "This product includes software developed by the
   *        Apache Software Foundation (http://www.apache.org/)."
   *    Alternately, this acknowledgment may appear in the software itself,
   *    if and wherever such third-party acknowledgments normally appear.
   *
   * 4. The names "Apache" and "Apache Software Foundation" and
   *    "Apache Lucene" must not be used to endorse or promote products
   *    derived from this software without prior written permission. For
   *    written permission, please contact apache@apache.org.
   *
   * 5. Products derived from this software may not be called "Apache",
   *    "Apache Lucene", nor may "Apache" appear in their name, without
   *    prior written permission of the Apache Software Foundation.
   *
   * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
   * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
   * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
   * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
   * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
   * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
   * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
   * SUCH DAMAGE.
   * ====================================================================
   *
   * This software consists of voluntary contributions made by many
   * individuals on behalf of the Apache Software Foundation.  For more
   * information on the Apache Software Foundation, please see
   * <http://www.apache.org/>.
   */
  
  import java.io.*;
  import org.apache.lucene.document.*;
  import org.apache.lucene.demo.html.HTMLParser;
  
  /** A utility for making Lucene Documents for HTML documents. */
  
  public class HTMLDocument {
    static char dirSep = System.getProperty("file.separator").charAt(0);
  
    public static String uid(File f) {
      // Append path and date into a string in such a way that lexicographic
      // sorting gives the same results as a walk of the file hierarchy.  Thus
      // null (\u0000) is used both to separate directory components and to
      // separate the path from the date.
      return f.getPath().replace(dirSep, '\u0000') +
        "\u0000" +
        DateField.timeToString(f.lastModified());
    }
  
    public static String uid2url(String uid) {
      String url = uid.replace('\u0000', '/');	  // replace nulls with slashes
      return url.substring(0, url.lastIndexOf('/')); // remove date from end
    }
  
    public static Document Document(File f)
         throws IOException, InterruptedException  {
      // make a new, empty document
      Document doc = new Document();
  
      // Add the url as a field named "url".  Use an UnIndexed field, so
      // that the url is just stored with the document, but is not searchable.
      doc.add(Field.UnIndexed("url", f.getPath().replace(dirSep, '/')));
  
      // Add the last modified date of the file as a field named "modified".  Use a
      // Keyword field, so that it's searchable, but so that no attempt is made
      // to tokenize the field into words.
      doc.add(Field.Keyword("modified",
  			  DateField.timeToString(f.lastModified())));
  
      // Add the uid as a field, so that the index can be incrementally maintained.
      // This field is not stored with the document; it is indexed, but it is not
      // tokenized prior to indexing.
      doc.add(new Field("uid", uid(f), false, true, false));
  
      HTMLParser parser = new HTMLParser(f);
  
      // Add the tag-stripped contents as a Reader-valued Text field so it will
      // get tokenized and indexed.
      doc.add(Field.Text("contents", parser.getReader()));
  
      // Add the summary as an UnIndexed field, so that it is stored and returned
      // with hit documents for display.
      doc.add(Field.UnIndexed("summary", parser.getSummary()));
  
      // Add the title as a separate Text field, so that it can be searched
      // separately.
      doc.add(Field.Text("title", parser.getTitle()));
  
      // return the document
      return doc;
    }
  
    private HTMLDocument() {}
  }
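`HTMLDocument.uid` joins the path components and the modification date with the NUL character. `uid2url` reverses this by turning NULs into slashes and dropping the final (date) component. The sketch below copies that logic with two simplifications: the date string is passed in rather than produced by `DateField`, and the path is a `String` rather than a `File`. `UidDemo` is a hypothetical name. The sketch also shows why NUL is used as the separator instead of `/`. NUL sorts below every printable character, so a directory's entries stay ahead of sibling names that extend it, such as `docs-old`.

```java
public class UidDemo {
    // Mirrors HTMLDocument.uid: directory separators become NUL, and the
    // date string is appended after one more NUL.
    static String uid(String path, char dirSep, String date) {
        return path.replace(dirSep, '\u0000') + "\u0000" + date;
    }

    // Mirrors HTMLDocument.uid2url: NULs become '/', then everything from
    // the last '/' on (the date) is removed.
    static String uid2url(String uid) {
        String url = uid.replace('\u0000', '/');
        return url.substring(0, url.lastIndexOf('/'));
    }
}
```

With `/` as the separator, `docs/z.html` sorts after `docs-old/a.html`, because `-` is below `/`. A walk of the file hierarchy would visit `docs` first. With NUL as the separator, the uids sort in walk order.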
      
  
  
  
  1.1                  jakarta-lucene/src/demo/org/apache/lucene/demo/IndexFiles.java
  
  Index: IndexFiles.java
  ===================================================================
  package org.apache.lucene.demo;
  
  /* ====================================================================
   * The Apache Software License, Version 1.1
   *
   * Copyright (c) 2001 The Apache Software Foundation.  All rights
   * reserved.
   *
   * Redistribution and use in source and binary forms, with or without
   * modification, are permitted provided that the following conditions
   * are met:
   *
   * 1. Redistributions of source code must retain the above copyright
   *    notice, this list of conditions and the following disclaimer.
   *
   * 2. Redistributions in binary form must reproduce the above copyright
   *    notice, this list of conditions and the following disclaimer in
   *    the documentation and/or other materials provided with the
   *    distribution.
   *
   * 3. The end-user documentation included with the redistribution,
   *    if any, must include the following acknowledgment:
   *       "This product includes software developed by the
   *        Apache Software Foundation (http://www.apache.org/)."
   *    Alternately, this acknowledgment may appear in the software itself,
   *    if and wherever such third-party acknowledgments normally appear.
   *
   * 4. The names "Apache" and "Apache Software Foundation" and
   *    "Apache Lucene" must not be used to endorse or promote products
   *    derived from this software without prior written permission. For
   *    written permission, please contact apache@apache.org.
   *
   * 5. Products derived from this software may not be called "Apache",
   *    "Apache Lucene", nor may "Apache" appear in their name, without
   *    prior written permission of the Apache Software Foundation.
   *
   * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
   * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
   * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
   * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
   * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
   * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
   * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
   * SUCH DAMAGE.
   * ====================================================================
   *
   * This software consists of voluntary contributions made by many
   * individuals on behalf of the Apache Software Foundation.  For more
   * information on the Apache Software Foundation, please see
   * <http://www.apache.org/>.
   */
  
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  
  import java.io.File;
  import java.util.Date;
  
  class IndexFiles {
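    public static void main(String[] args) {
      if (args.length != 1) {			  // expect exactly one root directory
        System.err.println("Usage: java org.apache.lucene.demo.IndexFiles <root_directory>");
        return;
      }
      try {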
        Date start = new Date();
  
        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
        indexDocs(writer, new File(args[0]));
  
        writer.optimize();
        writer.close();
  
        Date end = new Date();
  
        System.out.print(end.getTime() - start.getTime());
        System.out.println(" total milliseconds");
  
      } catch (Exception e) {
        System.out.println(" caught a " + e.getClass() +
  			 "\n with message: " + e.getMessage());
      }
    }
  
    public static void indexDocs(IndexWriter writer, File file)
         throws Exception {
      if (file.isDirectory()) {
        String[] files = file.list();
        for (int i = 0; i < files.length; i++)
  	indexDocs(writer, new File(file, files[i]));
      } else {
        System.out.println("adding " + file);
        writer.addDocument(FileDocument.Document(file));
      }
    }
  }
  
  
  
  1.1                  jakarta-lucene/src/demo/org/apache/lucene/demo/IndexHTML.java
  
  Index: IndexHTML.java
  ===================================================================
  package org.apache.lucene.demo;
  
  /* ====================================================================
   * The Apache Software License, Version 1.1
   *
   * Copyright (c) 2001 The Apache Software Foundation.  All rights
   * reserved.
   *
   * Redistribution and use in source and binary forms, with or without
   * modification, are permitted provided that the following conditions
   * are met:
   *
   * 1. Redistributions of source code must retain the above copyright
   *    notice, this list of conditions and the following disclaimer.
   *
   * 2. Redistributions in binary form must reproduce the above copyright
   *    notice, this list of conditions and the following disclaimer in
   *    the documentation and/or other materials provided with the
   *    distribution.
   *
   * 3. The end-user documentation included with the redistribution,
   *    if any, must include the following acknowledgment:
   *       "This product includes software developed by the
   *        Apache Software Foundation (http://www.apache.org/)."
   *    Alternately, this acknowledgment may appear in the software itself,
   *    if and wherever such third-party acknowledgments normally appear.
   *
   * 4. The names "Apache" and "Apache Software Foundation" and
   *    "Apache Lucene" must not be used to endorse or promote products
   *    derived from this software without prior written permission. For
   *    written permission, please contact apache@apache.org.
   *
   * 5. Products derived from this software may not be called "Apache",
   *    "Apache Lucene", nor may "Apache" appear in their name, without
   *    prior written permission of the Apache Software Foundation.
   *
   * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
   * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
   * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
   * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
   * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
   * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
   * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
   * SUCH DAMAGE.
   * ====================================================================
   *
   * This software consists of voluntary contributions made by many
   * individuals on behalf of the Apache Software Foundation.  For more
   * information on the Apache Software Foundation, please see
   * <http://www.apache.org/>.
   */
  
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.*;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.util.Arrays;
  import org.apache.lucene.demo.html.HTMLParser;
  
  import java.io.File;
  import java.util.Date;
  
  class IndexHTML {
    private static boolean deleting = false;	  // true during deletion pass
    private static IndexReader reader;		  // existing index
    private static IndexWriter writer;		  // new index being built
    private static TermEnum uidIter;		  // document id iterator
  
    public static void main(String[] argv) {
      try {
        String index = "index";
        boolean create = false;
        File root = null;
        
        String usage = "IndexHTML [-create] [-index <index>] <root_directory>";
  
        if (argv.length == 0) {
  	System.err.println("Usage: " + usage);
  	return;
        }
  
        for (int i = 0; i < argv.length; i++) {
  	if (argv[i].equals("-index")) {		  // parse -index option
  	  index = argv[++i];
  	} else if (argv[i].equals("-create")) {	  // parse -create option
  	  create = true;
  	} else if (i != argv.length-1) {
  	  System.err.println("Usage: " + usage);
  	  return;
  	} else
  	  root = new File(argv[i]);
        }
  
        Date start = new Date();
  
        if (!create) {				  // delete stale docs
  	deleting = true;
  	indexDocs(root, index, create);
        }
  
        writer = new IndexWriter(index, new StandardAnalyzer(), create);
        writer.maxFieldLength = 1000000;
  
        indexDocs(root, index, create);		  // add new docs
  
        System.out.println("Optimizing index...");
        writer.optimize();
        writer.close();
  
        Date end = new Date();
  
        System.out.print(end.getTime() - start.getTime());
        System.out.println(" total milliseconds");
  
      } catch (Exception e) {
        System.out.println(" caught a " + e.getClass() +
  			 "\n with message: " + e.getMessage());
      }
    }
  
    /* Walk directory hierarchy in uid order, while keeping uid iterator from
     * existing index in sync.  Mismatches indicate one of: (a) old documents to
     * be deleted; (b) unchanged documents, to be left alone; or (c) new
     * documents, to be indexed.
     */
  
    private static void indexDocs(File file, String index, boolean create)
         throws Exception {
      if (!create) {				  // incrementally update
        
        reader = IndexReader.open(index);		  // open existing index
        uidIter = reader.terms(new Term("uid", "")); // init uid iterator
      
        indexDocs(file);
  
        if (deleting) {				  // delete rest of stale docs
  	while (uidIter.term() != null && uidIter.term().field() == "uid") {
  	  System.out.println("deleting " +
  			     HTMLDocument.uid2url(uidIter.term().text()));
  	  reader.delete(uidIter.term());
  	  uidIter.next();
  	}
  	deleting = false;
        }
  
        uidIter.close();				  // close uid iterator
        reader.close();				  // close existing index
  
      } else					  // no existing index
        indexDocs(file);
    }
  
    private static void indexDocs(File file) throws Exception {
      if (file.isDirectory()) {			  // if a directory
        String[] files = file.list();		  // list its files
        Arrays.sort(files);			  // sort the files
        for (int i = 0; i < files.length; i++)	  // recursively index them
  	indexDocs(new File(file, files[i]));
  
      } else if (file.getPath().endsWith(".html") || // index .html files
  	       file.getPath().endsWith(".htm") || // index .htm files
  	       file.getPath().endsWith(".txt")) { // index .txt files
        
        if (uidIter != null) {
  	String uid = HTMLDocument.uid(file);	  // construct uid for doc
  
  	while (uidIter.term() != null && uidIter.term().field() == "uid" &&
  	       uidIter.term().text().compareTo(uid) < 0) {
  	  if (deleting) {			  // delete stale docs
  	    System.out.println("deleting " +
  			       HTMLDocument.uid2url(uidIter.term().text()));
  	    reader.delete(uidIter.term());
  	  }
  	  uidIter.next();
  	}
  	if (uidIter.term() != null && uidIter.term().field() == "uid" &&
  	    uidIter.term().text().compareTo(uid) == 0) {
  	  uidIter.next();			  // keep matching docs
  	} else if (!deleting) {			  // add new docs
  	  Document doc = HTMLDocument.Document(file);
  	  System.out.println("adding " + doc.get("url"));
  	  writer.addDocument(doc);
  	}
        } else {					  // creating a new index
  	Document doc = HTMLDocument.Document(file);
  	System.out.println("adding " + doc.get("url"));
  	writer.addDocument(doc);		  // add docs unconditionally
        }
      }
    }
  }
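The comment on `indexDocs` describes a merge of two sorted sequences: the uid terms already in the index and the uids of the files found on disk. The sketch below is a hypothetical, self-contained illustration of that walk, with plain sorted `List<String>`s standing in for the `TermEnum` and the sorted directory listing; the class name `UidMergeSketch` is invented here. It classifies everything in a single pass, whereas `IndexHTML` makes a deletion pass followed by an addition pass. If `HTMLDocument.uid` folds something like a modification time into the uid (an assumption; that method is not shown in this excerpt), an edited file shows up as one stale uid plus one new uid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of IndexHTML's uid walk: both inputs must be
// sorted; plain strings stand in for Lucene "uid" terms and file uids.
public class UidMergeSketch {
    final List<String> stale = new ArrayList<>();     // in index, no longer on disk
    final List<String> unchanged = new ArrayList<>(); // in both: leave alone
    final List<String> added = new ArrayList<>();     // on disk only: index it

    UidMergeSketch(List<String> indexUids, List<String> fileUids) {
        int i = 0;
        for (String uid : fileUids) {
            // Index uids that sort before the current file have no file: stale.
            while (i < indexUids.size() && indexUids.get(i).compareTo(uid) < 0) {
                stale.add(indexUids.get(i++));
            }
            if (i < indexUids.size() && indexUids.get(i).equals(uid)) {
                unchanged.add(uid);   // same uid on both sides
                i++;
            } else {
                added.add(uid);       // no matching uid in the index
            }
        }
        // Index uids left over after the last file are stale too.
        while (i < indexUids.size()) {
            stale.add(indexUids.get(i++));
        }
    }

    public static void main(String[] args) {
        UidMergeSketch m = new UidMergeSketch(
            Arrays.asList("a", "b", "d"),
            Arrays.asList("b", "c", "d", "e"));
        System.out.println("stale=" + m.stale + " unchanged=" + m.unchanged
                           + " added=" + m.added);
    }
}
```

Running `main` prints `stale=[a] unchanged=[b, d] added=[c, e]`.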
  
  
  
  1.1                  jakarta-lucene/src/demo/org/apache/lucene/demo/SearchFiles.java
  
  Index: SearchFiles.java
  ===================================================================
  package org.apache.lucene.demo;
  
  /* ====================================================================
   * The Apache Software License, Version 1.1
   *
   * Copyright (c) 2001 The Apache Software Foundation.  All rights
   * reserved.
   *
   * Redistribution and use in source and binary forms, with or without
   * modification, are permitted provided that the following conditions
   * are met:
   *
   * 1. Redistributions of source code must retain the above copyright
   *    notice, this list of conditions and the following disclaimer.
   *
   * 2. Redistributions in binary form must reproduce the above copyright
   *    notice, this list of conditions and the following disclaimer in
   *    the documentation and/or other materials provided with the
   *    distribution.
   *
   * 3. The end-user documentation included with the redistribution,
   *    if any, must include the following acknowledgment:
   *       "This product includes software developed by the
   *        Apache Software Foundation (http://www.apache.org/)."
   *    Alternately, this acknowledgment may appear in the software itself,
   *    if and wherever such third-party acknowledgments normally appear.
   *
   * 4. The names "Apache" and "Apache Software Foundation" and
   *    "Apache Lucene" must not be used to endorse or promote products
   *    derived from this software without prior written permission. For
   *    written permission, please contact apache@apache.org.
   *
   * 5. Products derived from this software may not be called "Apache",
   *    "Apache Lucene", nor may "Apache" appear in their name, without
   *    prior written permission of the Apache Software Foundation.
   *
   * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
   * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
   * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
   * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
   * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
   * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
   * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
   * SUCH DAMAGE.
   * ====================================================================
   *
   * This software consists of voluntary contributions made by many
   * individuals on behalf of the Apache Software Foundation.  For more
   * information on the Apache Software Foundation, please see
   * <http://www.apache.org/>.
   */
  
  import java.io.IOException;
  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.search.Searcher;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.Hits;
  import org.apache.lucene.queryParser.QueryParser;
  
  class SearchFiles {
    public static void main(String[] args) {
      try {
        Searcher searcher = new IndexSearcher("index");
        Analyzer analyzer = new StandardAnalyzer();
  
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        while (true) {
  	System.out.print("Query: ");
  	String line = in.readLine();
  
  	if (line == null || line.length() == 0)  // stop at EOF or on an empty line
  	  break;
  
  	Query query = QueryParser.parse(line, "contents", analyzer);
  	System.out.println("Searching for: " + query.toString("contents"));
  
  	Hits hits = searcher.search(query);
  	System.out.println(hits.length() + " total matching documents");
  
  	final int HITS_PER_PAGE = 10;
  	for (int start = 0; start < hits.length(); start += HITS_PER_PAGE) {
  	  int end = Math.min(hits.length(), start + HITS_PER_PAGE);
  	  for (int i = start; i < end; i++) {
  	    Document doc = hits.doc(i);
  	    String path = doc.get("path");
  	    if (path != null) {
                System.out.println(i + ". " + path);
  	    } else {
                String url = doc.get("url");
  	      if (url != null) {
  		System.out.println(i + ". " + url);
  		System.out.println("   - " + doc.get("title"));
  	      } else {
  		System.out.println(i + ". " + "No path nor URL for this document");
  	      }
  	    }
  	  }
  
  	  if (hits.length() > end) {
  	    System.out.print("more (y/n) ? ");
  	    line = in.readLine();
  	    if (line == null || line.length() == 0 || line.charAt(0) == 'n')
  	      break;
  	  }
  	}
        }
        searcher.close();
  
      } catch (Exception e) {
        System.out.println(" caught a " + e.getClass() +
  			 "\n with message: " + e.getMessage());
      }
    }
  }
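`SearchFiles` shows hits in pages of `HITS_PER_PAGE`, clamping each page's upper bound with `Math.min` so that the final page may be short, and it asks "more (y/n) ?" only while hits remain beyond `end`. The hypothetical `PageBounds` helper below isolates that arithmetic so it can be checked without an index. It is an illustration, not part of the demo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper mirroring the paging loop in SearchFiles:
// each page covers hits [start, end) with end = min(total, start + pageSize).
public class PageBounds {
    static List<int[]> pages(int total, int pageSize) {
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start < total; start += pageSize) {
            int end = Math.min(total, start + pageSize); // last page may be short
            result.add(new int[] {start, end});
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] p : pages(23, 10)) {
            System.out.println(p[0] + ".." + (p[1] - 1)); // inclusive hit indexes
        }
    }
}
```

For 23 hits and a page size of 10, `main` prints `0..9`, `10..19` and `20..22`, one per line.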
  
  
  
  1.1                  jakarta-lucene/src/demo/org/apache/lucene/demo/html/Entities.java
  
  Index: Entities.java
  ===================================================================
  package org.apache.lucene.demo.html;
  
  /* ====================================================================
   * The Apache Software License, Version 1.1
   *
   * Copyright (c) 2001 The Apache Software Foundation.  All rights
   * reserved.
   *
   * Redistribution and use in source and binary forms, with or without
   * modification, are permitted provided that the following conditions
   * are met:
   *
   * 1. Redistributions of source code must retain the above copyright
   *    notice, this list of conditions and the following disclaimer.
   *
   * 2. Redistributions in binary form must reproduce the above copyright
   *    notice, this list of conditions and the following disclaimer in
   *    the documentation and/or other materials provided with the
   *    distribution.
   *
   * 3. The end-user documentation included with the redistribution,
   *    if any, must include the following acknowledgment:
   *       "This product includes software developed by the
   *        Apache Software Foundation (http://www.apache.org/)."
   *    Alternately, this acknowledgment may appear in the software itself,
   *    if and wherever such third-party acknowledgments normally appear.
   *
   * 4. The names "Apache" and "Apache Software Foundation" and
   *    "Apache Lucene" must not be used to endorse or promote products
   *    derived from this software without prior written permission. For
   *    written permission, please contact apache@apache.org.
   *
   * 5. Products derived from this software may not be called "Apache",
   *    "Apache Lucene", nor may "Apache" appear in their name, without
   *    prior written permission of the Apache Software Foundation.
   *
   * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
   * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
   * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
   * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
   * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
   * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
   * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
   * SUCH DAMAGE.
   * ====================================================================
   *
   * This software consists of voluntary contributions made by many
   * individuals on behalf of the Apache Software Foundation.  For more
   * information on the Apache Software Foundation, please see
   * <http://www.apache.org/>.
   */
  
  import java.util.*;
  
  public class Entities {
    static final Hashtable decoder = new Hashtable(300);
    static final String[]  encoder = new String[0x100];
  
    static final String decode(String entity) {
      if (entity.charAt(entity.length()-1) == ';')  // remove trailing semicolon
        entity = entity.substring(0, entity.length()-1);
      if (entity.charAt(1) == '#') {
        int start = 2;
        int radix = 10;
        if (entity.charAt(2) == 'X' || entity.charAt(2) == 'x') {
  	start++;
  	radix = 16;
        }
        Character c =
  	new Character((char)Integer.parseInt(entity.substring(start), radix));
        return c.toString();
      } else {
        String s = (String)decoder.get(entity);
        if (s != null)
  	return s;
        else return "";
      }
    }
  
    static final public String encode(String s) {
      int length = s.length();
      StringBuffer buffer = new StringBuffer(length * 2);
      for (int i = 0; i < length; i++) {
        char c = s.charAt(i);
        int j = (int)c;
        if (j < 0x100 && encoder[j] != null) {
  	buffer.append(encoder[j]);		  // have a named encoding
  	buffer.append(';');
        } else if (j < 0x80) {
  	buffer.append(c);			  // use ASCII value
        } else {
  	buffer.append("&#");			  // use numeric encoding
  	buffer.append((int)c);
  	buffer.append(';');
        }
      }
      return buffer.toString();
    }
  
    static final void add(String entity, int value) {
      decoder.put(entity, (new Character((char)value)).toString());
      if (value < 0x100)
        encoder[value] = entity;
    }
  
    static {
      add("&nbsp",   160);
      add("&iexcl",  161);
      add("&cent",   162);
      add("&pound",  163);
      add("&curren", 164);
      add("&yen",    165);
      add("&brvbar", 166);
      add("&sect",   167);
      add("&uml",    168);
      add("&copy",   169);
      add("&ordf",   170);
      add("&laquo",  171);
      add("&not",    172);
      add("&shy",    173);
      add("&reg",    174);
      add("&macr",   175);
      add("&deg",    176);
      add("&plusmn", 177);
      add("&sup2",   178);
      add("&sup3",   179);
      add("&acute",  180);
      add("&micro",  181);
      add("&para",   182);
      add("&middot", 183);
      add("&cedil",  184);
      add("&sup1",   185);
      add("&ordm",   186);
      add("&raquo",  187);
      add("&frac14", 188);
      add("&frac12", 189);
      add("&frac34", 190);
      add("&iquest", 191);
      add("&Agrave", 192);
      add("&Aacute", 193);
      add("&Acirc",  194);
      add("&Atilde", 195);
      add("&Auml",   196);
      add("&Aring",  197);
      add("&AElig",  198);
      add("&Ccedil", 199);
      add("&Egrave", 200);
      add("&Eacute", 201);
      add("&Ecirc",  202);
      add("&Euml",   203);
      add("&Igrave", 204);
      add("&Iacute", 205);
      add("&Icirc",  206);
      add("&Iuml",   207);
      add("&ETH",    208);
      add("&Ntilde", 209);
      add("&Ograve", 210);
      add("&Oacute", 211);
      add("&Ocirc",  212);
      add("&Otilde", 213);
      add("&Ouml",   214);
      add("&times",  215);
      add("&Oslash", 216);
      add("&Ugrave", 217);
      add("&Uacute", 218);
      add("&Ucirc",  219);
      add("&Uuml",   220);
      add("&Yacute", 221);
      add("&THORN",  222);
      add("&szlig",  223);
      add("&agrave", 224);
      add("&aacute", 225);
      add("&acirc",  226);
      add("&atilde", 227);
      add("&auml",   228);
      add("&aring",  229);
      add("&aelig",  230);
      add("&ccedil", 231);
      add("&egrave", 232);
      add("&eacute", 233);
      add("&ecirc",  234);
      add("&euml",   235);
      add("&igrave", 236);
      add("&iacute", 237);
      add("&icirc",  238);
      add("&iuml",   239);
      add("&eth",    240);
      add("&ntilde", 241);
      add("&ograve", 242);
      add("&oacute", 243);
      add("&ocirc",  244);
      add("&otilde", 245);
      add("&ouml",   246);
      add("&divide", 247);
      add("&oslash", 248);
      add("&ugrave", 249);
      add("&uacute", 250);
      add("&ucirc",  251);
      add("&uuml",   252);
      add("&yacute", 253);
      add("&thorn",  254);
      add("&yuml",   255);
      add("&fnof",   402);
      add("&Alpha",  913);
      add("&Beta",   914);
      add("&Gamma",  915);
      add("&Delta",  916);
      add("&Epsilon",917);
      add("&Zeta",   918);
      add("&Eta",    919);
      add("&Theta",  920);
      add("&Iota",   921);
      add("&Kappa",  922);
      add("&Lambda", 923);
      add("&Mu",     924);
      add("&Nu",     925);
      add("&Xi",     926);
      add("&Omicron",927);
      add("&Pi",     928);
      add("&Rho",    929);
      add("&Sigma",  931);
      add("&Tau",    932);
      add("&Upsilon",933);
      add("&Phi",    934);
      add("&Chi",    935);
      add("&Psi",    936);
      add("&Omega",  937);
      add("&alpha",  945);
      add("&beta",   946);
      add("&gamma",  947);
      add("&delta",  948);
      add("&epsilon",949);
      add("&zeta",   950);
      add("&eta",    951);
      add("&theta",  952);
      add("&iota",   953);
      add("&kappa",  954);
      add("&lambda", 955);
      add("&mu",     956);
      add("&nu",     957);
      add("&xi",     958);
      add("&omicron",959);
      add("&pi",     960);
      add("&rho",    961);
      add("&sigmaf", 962);
      add("&sigma",  963);
      add("&tau",    964);
      add("&upsilon",965);
      add("&phi",    966);
      add("&chi",    967);
      add("&psi",    968);
      add("&omega",  969);
      add("&thetasym",977);
      add("&upsih",  978);
      add("&piv",    982);
      add("&bull",   8226);
      add("&hellip", 8230);
      add("&prime",  8242);
      add("&Prime",  8243);
      add("&oline",  8254);
      add("&frasl",  8260);
      add("&weierp", 8472);
      add("&image",  8465);
      add("&real",   8476);
      add("&trade",  8482);
      add("&alefsym",8501);
      add("&larr",   8592);
      add("&uarr",   8593);
      add("&rarr",   8594);
      add("&darr",   8595);
      add("&harr",   8596);
      add("&crarr",  8629);
      add("&lArr",   8656);
      add("&uArr",   8657);
      add("&rArr",   8658);
      add("&dArr",   8659);
      add("&hArr",   8660);
      add("&forall", 8704);
      add("&part",   8706);
      add("&exist",  8707);
      add("&empty",  8709);
      add("&nabla",  8711);
      add("&isin",   8712);
      add("&notin",  8713);
      add("&ni",     8715);
      add("&prod",   8719);
      add("&sum",    8721);
      add("&minus",  8722);
      add("&lowast", 8727);
      add("&radic",  8730);
      add("&prop",   8733);
      add("&infin",  8734);
      add("&ang",    8736);
      add("&and",    8743);
      add("&or",     8744);
      add("&cap",    8745);
      add("&cup",    8746);
      add("&int",    8747);
      add("&there4", 8756);
      add("&sim",    8764);
      add("&cong",   8773);
      add("&asymp",  8776);
      add("&ne",     8800);
      add("&equiv",  8801);
      add("&le",     8804);
      add("&ge",     8805);
      add("&sub",    8834);
      add("&sup",    8835);
      add("&nsub",   8836);
      add("&sube",   8838);
      add("&supe",   8839);
      add("&oplus",  8853);
      add("&otimes", 8855);
      add("&perp",   8869);
      add("&sdot",   8901);
      add("&lceil",  8968);
      add("&rceil",  8969);
      add("&lfloor", 8970);
      add("&rfloor", 8971);
      add("&lang",   9001);
      add("&rang",   9002);
      add("&loz",    9674);
      add("&spades", 9824);
      add("&clubs",  9827);
      add("&hearts", 9829);
      add("&diams",  9830);
      add("&quot",   34);
      add("&amp",    38);
      add("&lt",     60);
      add("&gt",     62);
      add("&OElig",  338);
      add("&oelig",  339);
      add("&Scaron", 352);
      add("&scaron", 353);
      add("&Yuml",   376);
      add("&circ",   710);
      add("&tilde",  732);
      add("&ensp",   8194);
      add("&emsp",   8195);
      add("&thinsp", 8201);
      add("&zwnj",   8204);
      add("&zwj",    8205);
      add("&lrm",    8206);
      add("&rlm",    8207);
      add("&ndash",  8211);
      add("&mdash",  8212);
      add("&lsquo",  8216);
      add("&rsquo",  8217);
      add("&sbquo",  8218);
      add("&ldquo",  8220);
      add("&rdquo",  8221);
      add("&bdquo",  8222);
      add("&dagger", 8224);
      add("&Dagger", 8225);
      add("&permil", 8240);
      add("&lsaquo", 8249);
      add("&rsaquo", 8250);
      add("&euro",   8364);
  
    }
  }
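`Entities.decode` handles two shapes of reference: numeric ones (`&#NNN`, or `&#xHH` in hex) are parsed directly into a `char`, and named ones are looked up in the `decoder` table, with unknown names decoding to the empty string. The standalone sketch below restates that logic with a three-entry table (a hypothetical subset of the full one) and a `HashMap` in place of the `Hashtable`; the class name `EntityDecodeSketch` is invented. Judging by the `Entity` token in `HTMLParser.jj`, the parser only emits decimal numeric references, so the hex branch is reachable only when `decode` is called directly. Like the original, the `(char)` cast silently truncates code points above U+FFFF.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the lookup strategy in Entities.decode.
// NAMED is a hypothetical three-entry subset of the full entity table.
public class EntityDecodeSketch {
    static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("&amp", "&");
        NAMED.put("&lt", "<");
        NAMED.put("&eacute", "\u00e9");
    }

    static String decode(String entity) {
        if (entity.endsWith(";"))                       // drop trailing semicolon
            entity = entity.substring(0, entity.length() - 1);
        if (entity.length() > 1 && entity.charAt(1) == '#') {
            int start = 2, radix = 10;                  // "&#65" is decimal
            if (entity.length() > 2
                && (entity.charAt(2) == 'x' || entity.charAt(2) == 'X')) {
                start = 3;                              // "&#x41" is hex
                radix = 16;
            }
            return String.valueOf((char) Integer.parseInt(entity.substring(start), radix));
        }
        String s = NAMED.get(entity);
        return s != null ? s : "";                      // unknown names decode to ""
    }

    public static void main(String[] args) {
        System.out.println(decode("&#65;") + decode("&#x42;")
                           + decode("&amp;") + decode("&bogus;"));
    }
}
```

`main` prints `AB&`: `&#65;` and `&#x42;` decode to `A` and `B`, `&amp;` to `&`, and the unknown `&bogus;` to nothing.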
  
  
  
  1.1                  jakarta-lucene/src/demo/org/apache/lucene/demo/html/HTMLParser.jj
  
  Index: HTMLParser.jj
  ===================================================================
  /* ====================================================================
   * The Apache Software License, Version 1.1
   *
   * Copyright (c) 2001 The Apache Software Foundation.  All rights
   * reserved.
   *
   * Redistribution and use in source and binary forms, with or without
   * modification, are permitted provided that the following conditions
   * are met:
   *
   * 1. Redistributions of source code must retain the above copyright
   *    notice, this list of conditions and the following disclaimer.
   *
   * 2. Redistributions in binary form must reproduce the above copyright
   *    notice, this list of conditions and the following disclaimer in
   *    the documentation and/or other materials provided with the
   *    distribution.
   *
   * 3. The end-user documentation included with the redistribution,
   *    if any, must include the following acknowledgment:
   *       "This product includes software developed by the
   *        Apache Software Foundation (http://www.apache.org/)."
   *    Alternately, this acknowledgment may appear in the software itself,
   *    if and wherever such third-party acknowledgments normally appear.
   *
   * 4. The names "Apache" and "Apache Software Foundation" and
   *    "Apache Lucene" must not be used to endorse or promote products
   *    derived from this software without prior written permission. For
   *    written permission, please contact apache@apache.org.
   *
   * 5. Products derived from this software may not be called "Apache",
   *    "Apache Lucene", nor may "Apache" appear in their name, without
   *    prior written permission of the Apache Software Foundation.
   *
   * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
   * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
   * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
   * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
   * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
   * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
   * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
   * SUCH DAMAGE.
   * ====================================================================
   *
   * This software consists of voluntary contributions made by many
   * individuals on behalf of the Apache Software Foundation.  For more
   * information on the Apache Software Foundation, please see
   * <http://www.apache.org/>.
   */
  
  // HTMLParser.jj
  
  options {
    STATIC = false;
    OPTIMIZE_TOKEN_MANAGER = true;
    //DEBUG_LOOKAHEAD = true;
    //DEBUG_TOKEN_MANAGER = true;
  }
  
  PARSER_BEGIN(HTMLParser)
  
  package org.apache.lucene.demo.html;
  
  import java.io.*;
  
  public class HTMLParser {
    public static int SUMMARY_LENGTH = 200;
    
    StringBuffer title = new StringBuffer(SUMMARY_LENGTH);
    StringBuffer summary = new StringBuffer(SUMMARY_LENGTH * 2);
    int length = 0;
    boolean titleComplete = false;
    boolean inTitle = false;
    boolean inScript = false;
    boolean afterTag = false;
    boolean afterSpace = false;
    String eol = System.getProperty("line.separator");
    PipedReader pipeIn = null;
    PipedWriter pipeOut;
  
    public HTMLParser(File file) throws FileNotFoundException {
      this(new FileInputStream(file));
    }
  
    public String getTitle() throws IOException, InterruptedException {
      if (pipeIn == null)
        getReader();				  // spawn parsing thread
      while (true) {
        synchronized(this) {
  	if (titleComplete || (length > SUMMARY_LENGTH))
  	  break;
  	wait(10);
        }
      }
      return title.toString().trim();
    }
  
    public String getSummary() throws IOException, InterruptedException {
      if (pipeIn == null)
        getReader();				  // spawn parsing thread
      while (true) {
        synchronized(this) {
  	if (summary.length() >= SUMMARY_LENGTH)
  	  break;
  	wait(10);
        }
      }
      if (summary.length() > SUMMARY_LENGTH)
        summary.setLength(SUMMARY_LENGTH);
  
      String sum = summary.toString().trim();
      String tit = getTitle();
      if (sum.startsWith(tit))
        return sum.substring(tit.length());
      else
        return sum;
    }
  
    public Reader getReader() throws IOException {
      if (pipeIn == null) {
        pipeIn = new PipedReader();
        pipeOut = new PipedWriter(pipeIn);
        
        Thread thread = new ParserThread(this);
        thread.start();				  // start parsing
      }
  
      return pipeIn;
    }
  
    void addToSummary(String text) {
      if (summary.length() < SUMMARY_LENGTH) {
        summary.append(text);
        if (summary.length() >= SUMMARY_LENGTH) {
  	synchronized(this) {
  	  notifyAll();
  	}
        }
      }
    }
  
    void addText(String text) throws IOException {
      if (inScript)
        return;
      if (inTitle)
        title.append(text);
      else {
        addToSummary(text);
      if (!titleComplete && title.length() > 0) {  // finished title
  	synchronized(this) {
  	  titleComplete = true;			  // tell waiting threads
  	  notifyAll();
  	}
        }
      }
  
      length += text.length();
      pipeOut.write(text);
  
      afterSpace = false;
    }
    
    void addSpace() throws IOException {
      if (inScript)
        return;
      if (!afterSpace) {
        if (inTitle)
  	title.append(" ");
        else
  	addToSummary(" ");
        
        String space = afterTag ? eol : " ";
        length += space.length();
        pipeOut.write(space);
        afterSpace = true;
      }
    }
  
  //    void handleException(Exception e) {
  //      System.out.println(e.toString());  // print the error message
  //      System.out.println("Skipping...");
  //      Token t;
  //      do {
  //        t = getNextToken();
  //      } while (t.kind != TagEnd);
  //    }
  }
  
  PARSER_END(HTMLParser)
  
  
  void HTMLDocument() throws IOException :
  {
    Token t;
  }
  {
  //  try {
      ( Tag()         { afterTag = true; }
      | t=Decl()      { afterTag = true; }
      | CommentTag()  { afterTag = true; }
      | t=<Word>      { addText(t.image); afterTag = false; }
      | t=<Entity>    { addText(Entities.decode(t.image)); afterTag = false; }
      | t=<Punct>     { addText(t.image); afterTag = false; }
      | <Space>       { addSpace(); afterTag = false; }
      )* <EOF>
  //  } catch (ParseException e) {
  //    handleException(e);
  //  }
  }
  
  void Tag() throws IOException :
  {
    Token t1, t2;
    boolean inImg = false;
  }
  {
    t1=<TagName> {
      inTitle = t1.image.equalsIgnoreCase("<title"); // keep track if in <TITLE>
      inImg = t1.image.equalsIgnoreCase("<img");	  // keep track if in <IMG>
      if (inScript) {				  // keep track if in <SCRIPT>
        inScript = !t1.image.equalsIgnoreCase("</script");
      } else {
        inScript = t1.image.equalsIgnoreCase("<script");
      }
    }
    (t1=<ArgName>
     (<ArgEquals>
      (t2=ArgValue()				  // save ALT text in IMG tag
       {
         if (inImg && t1.image.equalsIgnoreCase("alt") && t2 != null)
           addText("[" + t2.image + "]");
       }
      )?
     )?
    )*
    <TagEnd>
  }
  
  Token ArgValue() :
  {
    Token t = null;
  }
  {
    t=<ArgValue>                              { return t; }
  | LOOKAHEAD(2)
    <ArgQuote1> <CloseQuote1>                 { return t; }
  | <ArgQuote1> t=<Quote1Text> <CloseQuote1>  { return t; }
  | LOOKAHEAD(2)
    <ArgQuote2> <CloseQuote2>                 { return t; }
  | <ArgQuote2> t=<Quote2Text> <CloseQuote2>  { return t; }
  }
  
  
  Token Decl() :
  {
    Token t;
  }
  {
    t=<DeclName> ( <ArgName> | ArgValue() | <ArgEquals> )* <TagEnd>
    { return t; }
  }
  
  
  void CommentTag() :
  {}
  {
    (<Comment1> ( <CommentText1> )* <CommentEnd1>)
   |
    (<Comment2> ( <CommentText2> )* <CommentEnd2>)
  }
    
  
  TOKEN :
  {
    < TagName:  "<" ("/")? ["A"-"Z","a"-"z"] (<ArgName>)? > : WithinTag
  | < DeclName: "<"  "!"   ["A"-"Z","a"-"z"] (<ArgName>)? > : WithinTag
  
  | < Comment1:  "<!--" > : WithinComment1
  | < Comment2:  "<!" >   : WithinComment2
  
  | < Word:     ( <LET> | <LET> (["+","/"])+ | <NUM> ["\""] |
                  <LET> ["-","'"] <LET> | ("$")? <NUM> [",","."] <NUM> )+ >
  | < #LET:     ["A"-"Z","a"-"z","0"-"9"] >
  | < #NUM:     ["0"-"9"] >
  
  | < Entity:   ( "&" (["A"-"Z","a"-"z"])+ (";")? | "&" "#" (<NUM>)+ (";")? ) >
  
  | < Space:    (<SP>)+ >
  | < #SP:      [" ","\t","\r","\n"] >
  
  | < Punct:    ~[] > // Keep this last.  It is a catch-all.
  }
  
  
  <WithinTag> TOKEN:
  {
    < ArgName:   (~[" ","\t","\r","\n","=",">","'","\""])
                 (~[" ","\t","\r","\n","=",">"])* >
  | < ArgEquals: "=" >  : AfterEquals
  | < TagEnd:    ">" | "=>" >  : DEFAULT
  }
  
  <AfterEquals> TOKEN:
  {
    < ArgValue:  (~[" ","\t","\r","\n","=",">","'","\""])
  	       (~[" ","\t","\r","\n",">"])* > : WithinTag
  }
  
  <WithinTag, AfterEquals> TOKEN:
  {
    < ArgQuote1: "'"  > : WithinQuote1
  | < ArgQuote2: "\"" > : WithinQuote2
  }
  
  <WithinTag, AfterEquals> SKIP:
  {
    < <Space> >
  }
  
  <WithinQuote1> TOKEN:
  {
    < Quote1Text:  (~["'"])+ >
  | < CloseQuote1: <ArgQuote1> > : WithinTag
  }
  
  <WithinQuote2> TOKEN:
  {
    < Quote2Text:  (~["\""])+ >
  | < CloseQuote2: <ArgQuote2> > : WithinTag
  }
  
  
  <WithinComment1> TOKEN :
  {
    < CommentText1:  (~["-"])+ | "-" >
  | < CommentEnd1:   "-->" > : DEFAULT
  }
  
  <WithinComment2> TOKEN :
  {
    < CommentText2:  (~[">"])+ >
  | < CommentEnd2:   ">" > : DEFAULT
  }
  
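The actions attached to HTMLDocument() and the addText()/addSpace() methods above decide how whitespace reaches pipeOut. A Space token writes a separator only when the last thing written was not already one; only addText() clears afterSpace, so tags do not. That separator is eol instead of a blank when the space directly follows a tag, and a tag on its own writes nothing. Below is a minimal Java sketch of just that rule. It assumes a simplified token list: strings starting with "<" stand for tags, all-whitespace strings stand for Space tokens, and everything else behaves like a Word. The class name and this encoding are hypothetical, and script, title and summary handling are left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the afterSpace/afterTag bookkeeping in addText()/addSpace().
// Tokens beginning with "<" play the role of tags; all-whitespace tokens are Space tokens;
// anything else is treated like a Word.
class SpaceCollapseSketch {
  static String render(List<String> tokens, String eol) {
    StringBuilder out = new StringBuilder();
    boolean afterSpace = false;   // last thing written was a separator
    boolean afterTag = false;     // last token was a tag
    for (String t : tokens) {
      if (t.startsWith("<")) {
        afterTag = true;                       // tags emit no text of their own
      } else if (t.trim().length() == 0) {
        if (!afterSpace) {                     // collapse runs of whitespace
          out.append(afterTag ? eol : " ");    // a space right after a tag becomes a line break
          afterSpace = true;
        }
        afterTag = false;
      } else {
        out.append(t);                         // word text passes through unchanged
        afterSpace = false;
        afterTag = false;
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("<p>", "\n", "Hello", "  ", "world", "<br>", " ", "<i>", " ", "again");
    System.out.print(render(tokens, "\n"));   // a blank line, then "Hello world", then "again"
  }
}
```

The model shares one consequence with the grammar actions. A tag with no surrounding whitespace contributes no separator, so input such as foo<b>bar</b> comes out as the single word foobar.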
  
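The Entity token above accepts two forms: a named reference such as &amp; or a decimal numeric one such as &#169;. In both cases the trailing semicolon is optional, and HTMLDocument() passes the matched text to Entities.decode(). The Entities class itself is not part of this section, so the Java sketch below is not its implementation. It is a hypothetical decoder with a deliberately tiny four-entry table. It returns unrecognised names unchanged (an assumption), and it assumes numeric codes fit in a single UTF-16 char.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: decode the text of one Entity token.
// The four-entry table is an illustrative assumption, not the demo's Entities table.
class EntityDecodeSketch {
  private static final Map<String, Character> NAMED = new HashMap<String, Character>();
  static {
    NAMED.put("amp", '&');
    NAMED.put("lt", '<');
    NAMED.put("gt", '>');
    NAMED.put("nbsp", '\u00A0');
  }

  static String decode(String entity) {
    // Strip the leading '&' and the optional trailing ';' allowed by the token.
    String body = entity.substring(1);
    if (body.endsWith(";")) {
      body = body.substring(0, body.length() - 1);
    }
    if (body.startsWith("#")) {
      // Numeric reference: the grammar only admits decimal digits after "&#".
      int code = Integer.parseInt(body.substring(1));
      return String.valueOf((char) code);       // assumes the code fits in one char
    }
    Character c = NAMED.get(body);
    return c != null ? String.valueOf(c) : entity;   // unknown name: keep the original text
  }

  public static void main(String[] args) {
    System.out.println(decode("&amp;"));   // &
    System.out.println(decode("&#65"));    // A
  }
}
```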
  
  1.1                  jakarta-lucene/src/demo/org/apache/lucene/demo/html/ParserThread.java
  
  Index: ParserThread.java
  ===================================================================
  package org.apache.lucene.demo.html;
  
  /* ====================================================================
   * The Apache Software License, Version 1.1
   *
   * Copyright (c) 2001 The Apache Software Foundation.  All rights
   * reserved.
   *
   * Redistribution and use in source and binary forms, with or without
   * modification, are permitted provided that the following conditions
   * are met:
   *
   * 1. Redistributions of source code must retain the above copyright
   *    notice, this list of conditions and the following disclaimer.
   *
   * 2. Redistributions in binary form must reproduce the above copyright
   *    notice, this list of conditions and the following disclaimer in
   *    the documentation and/or other materials provided with the
   *    distribution.
   *
   * 3. The end-user documentation included with the redistribution,
   *    if any, must include the following acknowledgment:
   *       "This product includes software developed by the
   *        Apache Software Foundation (http://www.apache.org/)."
   *    Alternately, this acknowledgment may appear in the software itself,
   *    if and wherever such third-party acknowledgments normally appear.
   *
   * 4. The names "Apache" and "Apache Software Foundation" and
   *    "Apache Lucene" must not be used to endorse or promote products
   *    derived from this software without prior written permission. For
   *    written permission, please contact apache@apache.org.
   *
   * 5. Products derived from this software may not be called "Apache",
   *    "Apache Lucene", nor may "Apache" appear in their name, without
   *    prior written permission of the Apache Software Foundation.
   *
   * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
   * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
   * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
   * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
   * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
   * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
   * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
   * SUCH DAMAGE.
   * ====================================================================
   *
   * This software consists of voluntary contributions made by many
   * individuals on behalf of the Apache Software Foundation.  For more
   * information on the Apache Software Foundation, please see
   * <http://www.apache.org/>.
   */
  
  import java.io.*;
  
  class ParserThread extends Thread {		  
    HTMLParser parser;
  
    ParserThread(HTMLParser p) {
      parser = p;
    }
  
    public void run() {				  // convert pipeOut to pipeIn
      try {
        try {					  // parse document to pipeOut
  	parser.HTMLDocument(); 
        } catch (ParseException e) {
  	System.out.println("Parse Aborted: " + e.getMessage());
        } catch (TokenMgrError e) {
  	System.out.println("Parse Aborted: " + e.getMessage());
        } finally {
  	parser.pipeOut.close();
  	synchronized (parser) {
  	  parser.summary.setLength(parser.SUMMARY_LENGTH);
  	  parser.titleComplete = true;
  	  parser.notifyAll();
  	}
        }
      } catch (IOException e) {
  	e.printStackTrace();
      }
    }
  }
  
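ParserThread exists so that parsing can run concurrently with the consumer. HTMLDocument() writes body text into parser.pipeOut while the caller reads the other end of the pipe; Test.java below reads it through parser.getReader(). Anyone asking for the title must wait until it is complete. The title flag is raised either by addText() when the first non-title text arrives or by the finally block above. That finally block also guarantees that waiters are released if parsing aborts. The following is a self-contained sketch of that handoff using wait/notifyAll. It assumes, as the name pipeOut suggests, a java.io.PipedWriter/PipedReader pair. HTMLParser's own getTitle() and getReader() are not shown in this section, so the class and method names here are hypothetical and this is not their actual code.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

// Hypothetical sketch of the handoff ParserThread relies on: a producer thread
// writes body text into a pipe and signals when the title is known.
class TitleHandoffSketch {
  private final StringBuffer title = new StringBuffer();
  private boolean titleComplete = false;          // guarded by "this"
  private final PipedReader pipeIn = new PipedReader();
  private final PipedWriter pipeOut;

  TitleHandoffSketch() throws IOException {
    pipeOut = new PipedWriter(pipeIn);            // connect the two ends
  }

  // Producer side: stands in for parser.HTMLDocument() running on ParserThread.
  void startProducer(final String docTitle, final String body) {
    new Thread() {
      public void run() {
        try {
          title.append(docTitle);
          synchronized (TitleHandoffSketch.this) { // title finished: wake waiters
            titleComplete = true;
            TitleHandoffSketch.this.notifyAll();
          }
          pipeOut.write(body);
        } catch (IOException e) {
          e.printStackTrace();
        } finally {
          try { pipeOut.close(); } catch (IOException ignored) { }
          synchronized (TitleHandoffSketch.this) { // always release waiters, even on error
            titleComplete = true;
            TitleHandoffSketch.this.notifyAll();
          }
        }
      }
    }.start();
  }

  // Consumer side: blocks until the producer has signalled the title.
  synchronized String getTitle() throws InterruptedException {
    while (!titleComplete) {
      wait();
    }
    return title.toString();
  }

  // Reads everything the producer wrote, until the pipe is closed.
  String readBody() throws IOException {
    StringBuilder sb = new StringBuilder();
    for (int c = pipeIn.read(); c != -1; c = pipeIn.read()) {
      sb.append((char) c);
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    TitleHandoffSketch s = new TitleHandoffSketch();
    s.startProducer("Hello", "body text");
    System.out.println(s.getTitle());   // Hello
    System.out.println(s.readBody());   // body text
  }
}
```

The separate thread matters because a pipe has a small fixed buffer. If the same thread both wrote a long document and later tried to read it, the writer would block once the buffer filled and nothing would ever drain it.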
  
  
  1.1                  jakarta-lucene/src/demo/org/apache/lucene/demo/html/Test.java
  
  Index: Test.java
  ===================================================================
  package org.apache.lucene.demo.html;
  
  /* ====================================================================
   * The Apache Software License, Version 1.1
   *
   * Copyright (c) 2001 The Apache Software Foundation.  All rights
   * reserved.
   *
   * Redistribution and use in source and binary forms, with or without
   * modification, are permitted provided that the following conditions
   * are met:
   *
   * 1. Redistributions of source code must retain the above copyright
   *    notice, this list of conditions and the following disclaimer.
   *
   * 2. Redistributions in binary form must reproduce the above copyright
   *    notice, this list of conditions and the following disclaimer in
   *    the documentation and/or other materials provided with the
   *    distribution.
   *
   * 3. The end-user documentation included with the redistribution,
   *    if any, must include the following acknowledgment:
   *       "This product includes software developed by the
   *        Apache Software Foundation (http://www.apache.org/)."
   *    Alternately, this acknowledgment may appear in the software itself,
   *    if and wherever such third-party acknowledgments normally appear.
   *
   * 4. The names "Apache" and "Apache Software Foundation" and
   *    "Apache Lucene" must not be used to endorse or promote products
   *    derived from this software without prior written permission. For
   *    written permission, please contact apache@apache.org.
   *
   * 5. Products derived from this software may not be called "Apache",
   *    "Apache Lucene", nor may "Apache" appear in their name, without
   *    prior written permission of the Apache Software Foundation.
   *
   * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
   * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
   * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
   * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
   * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
   * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
   * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
   * SUCH DAMAGE.
   * ====================================================================
   *
   * This software consists of voluntary contributions made by many
   * individuals on behalf of the Apache Software Foundation.  For more
   * information on the Apache Software Foundation, please see
   * <http://www.apache.org/>.
   */
  
  import java.io.*;
  
  class Test {
    public static void main(String[] argv) throws Exception {
      if ("-dir".equals(argv[0])) {
        String[] files = new File(argv[1]).list();
        java.util.Arrays.sort(files);
        for (int i = 0; i < files.length; i++) {
  	System.err.println(files[i]);
  	File file = new File(argv[1], files[i]);
  	parse(file);
        }
      } else
        parse(new File(argv[0]));
    }
  
    public static void parse(File file) throws Exception {
      HTMLParser parser = new HTMLParser(file);
      System.out.println("Title: " + Entities.encode(parser.getTitle()));
      System.out.println("Summary: " + Entities.encode(parser.getSummary()));
      LineNumberReader reader = new LineNumberReader(parser.getReader());
      for (String l = reader.readLine(); l != null; l = reader.readLine())
        System.out.println(l);
    }
  }
  
  
  
  1.1                  jakarta-lucene/src/jsp/README.txt
  
  Index: README.txt
  ===================================================================
  To build the Jakarta Lucene web app demo just run 
  "ant wardemo" from the Jakarta Lucene Installation
  directory (follow the master instructions in 
  BUILD.txt).  If you have questions please post 
  them to the Jakarta Lucene mailing lists.  To 
  get the demo working you really need to 
  read the Lucene "Getting Started" guide provided
  with the doc build ("ant docs").
  
  
  
  1.1                  jakarta-lucene/src/jsp/configuration.jsp
  
  Index: configuration.jsp
  ===================================================================
  <%
  /* Author: Andrew C. Oliver (acoliver2@users.sourceforge.net) */
  String appTitle = "Jakarta Lucene Example - Intranet Server Search Application";
  /* make sure you point the below string to the index you created with IndexHTML */
  String indexLocation = "/opt/lucene/index";
  String appfooter = "Jakarta Lucene Template WebApp 1.0";
  %>
  
  
  
  1.1                  jakarta-lucene/src/jsp/footer.jsp
  
  Index: footer.jsp
  ===================================================================
  <% /* Author Andrew C. Oliver (acoliver2@users.sourceforge.net) */ %>
  <p>
  	<center>
  	<%=appfooter%>
  	</center>
  </p>
  </body>
  </html>
  
  
  
  1.1                  jakarta-lucene/src/jsp/header.jsp
  
  Index: header.jsp
  ===================================================================
  <%@include file="configuration.jsp"%>
  <% /* Author: Andrew C. Oliver (acoliver2@users.sourceforge.net) */ %>
  <html>
  <head>
  	<title><%=appTitle%></title>
  </head>
  <body>
  <center>
  	<p>
  	Welcome to the Lucene Template application. (This is the header)
  	</p>
  </center>
  
  
  
  1.1                  jakarta-lucene/src/jsp/index.jsp
  
  Index: index.jsp
  ===================================================================
  <%@include file="header.jsp"%>
  <% /* Author: Andrew C. Oliver (acoliver2@users.sourceforge.net) */ %>
  <center> 
  	<form name="search" action="results.jsp" method="get">
  		<p>
  			<input name="query" size="44"/>&nbsp;Search Criteria
  		</p>
  		<p>
  			<input name="maxresults" size="4" value="100"/>&nbsp;Results Per Page&nbsp;
  			<input type="submit" value="Search"/>
  		</p>
          </form>
  </center>
  <%@include file="footer.jsp"%>
  
  
  
  1.1                  jakarta-lucene/src/jsp/results.jsp
  
  Index: results.jsp
  ===================================================================
  <%@ page import = "  javax.servlet.*, javax.servlet.http.*, java.io.*, org.apache.lucene.analysis.*, org.apache.lucene.document.*, org.apache.lucene.index.*, org.apache.lucene.search.*, org.apache.lucene.queryParser.*, org.apache.lucene.demo.*, org.apache.lucene.demo.html.Entities" %>
  
  <%
  /*
          Author: Andrew C. Oliver, SuperLink Software, Inc. (acoliver2@users.sourceforge.net)
  
          This jsp page is deliberately written in the horrible "java directly embedded
          in the page" style for an easy and concise demonstration of Lucene.
          Do note...if you write pages that look like this...sooner or later
          you'll have a maintenance nightmare.  If you use jsps...use taglibs
          and beans!  That being said, this should be acceptable for a small
          page demonstrating how one uses Lucene in a web app. 
  
          This is also deliberately overcommented. ;-)
  
  */
  %>
  <%@include file="header.jsp"%>
  <%
          boolean error = false;                  //used to control flow for error messages
          String indexName = indexLocation;       //local copy of the configuration variable
          IndexSearcher searcher = null;          //the searcher used to open/search the index
          Query query = null;                     //the Query created by the QueryParser
          Hits hits = null;                       //the search results
          int startindex = 0;                     //the first index displayed on this page
          int maxpage    = 50;                    //the maximum items displayed on this page
          String queryString = null;              //the query entered in the previous page
          String startVal    = null;              //string version of startindex
          String maxresults  = null;              //string version of maxpage
          int thispage = 0;                       //used for the for/next either maxpage or
                                                  //hits.length() - startindex - whichever is
                                                  //less
  
          try {
          searcher = new IndexSearcher(
                          IndexReader.open(indexName)     //create an indexSearcher for our page
                  );
          } catch (Exception e) {                         //any error that happens is probably due
                                                          //to a permission problem or nonexistent
                                                          //or otherwise corrupt index
  %>
                  <p>ERROR opening the Index - contact sysadmin!</p>
                  <p>While opening the index: <%=e.getMessage()%></p>   
  <%                error = true;                                  //don't do anything up to the footer
          }
  %>
  <%
         if (error == false) {                                           //did we open the index?
                  queryString = request.getParameter("query");           //get the search criteria
                  startVal    = request.getParameter("startat");         //get the start index
                  maxresults  = request.getParameter("maxresults");      //get max results per page
                  try {
                          maxpage    = Integer.parseInt(maxresults);    //parse the max results first
                          startindex = Integer.parseInt(startVal);      //then the start index  
                  } catch (Exception e) { } //we don't care if something happens we'll just start at 0
                                            //or end at 50
  
                  
  
                  if (queryString == null)
                          throw new ServletException("no query "+       //if you don't have a query then
                                                     "specified");      //you probably played on the 
                                                                        //query string so you get the 
                                                                        //treatment
  
                  Analyzer analyzer = new StopAnalyzer();               //construct our usual analyzer
                  try {
                          query = QueryParser.parse(queryString, "contents", analyzer); //parse the 
                  } catch (ParseException e) {                          //query and construct the Query
                                                                        //object
                                                                        //if it's just "operator error"
                                                                        //send them a nice error HTML
                                                                        
  %>
                          <p>Error While parsing query: <%=e.getMessage()%></p>
  <%
                          error = true;                                 //don't bother with the rest of
                                                                        //the page
                  }
          }
  %>
  <%
          if (error == false && searcher != null) {                     // if we've had no errors
                                                                        // searcher != null was to handle
                                                                        // a weird compilation bug 
                  thispage = maxpage;                                   // default last element to maxpage
                  hits = searcher.search(query);                        // run the query 
                  if (hits.length() == 0) {                             // if we got no results tell the user
  %>
                  <p> I'm sorry I couldn't find what you were looking for. </p>
  <%
                  error = true;                                        // don't bother with the rest of the
                                                                       // page
                  }
          }
  
          if (error == false && searcher != null) {                   
  %>
                  <table>
                  <tr>
                          <td>Document</td>
                          <td>Summary</td>
                  </tr>
  <%
                  if ((startindex + maxpage) > hits.length()) {
                          thispage = hits.length() - startindex;      // set the max index to maxpage or last
                  }                                                   // actual search result whichever is less
  
                  for (int i = startindex; i < (thispage + startindex); i++) {  // for each element
  %>
                  <tr>
  <%
                          Document doc = hits.doc(i);                  //get the next document 
                          String doctitle = doc.get("title");          //get its title
                          String url = doc.get("url");                 //get its url field
                          if ((doctitle == null) || doctitle.equals(""))   //use the url if it has no title
                                  doctitle = url;
                                                                       //then output!
  %>
                          <td><a href="<%=url%>"><%=doctitle%></a></td>
                          <td><%=doc.get("summary")%></td>
                  </tr>
  <%
                  }
  %>
  <%                if ( (startindex + maxpage) < hits.length()) {   //if there are more results...display 
                                                                     //the more link
  
                          String moreurl="results.jsp?query=" + queryString +  //construct the "more" link
                                         "&maxresults=" + maxpage + 
                                         "&startat=" + (startindex + maxpage);
  %>
                  <tr>
                          <td></td><td><a href="<%=moreurl%>">More Results>></a></td>
                  </tr>
  <%
                  }
  %>
                  </table>
  
  <%       }                                            //then include our footer.
  %>
  <%@include file="footer.jsp"%>        
  
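results.jsp shows min(maxpage, hits.length() - startindex) rows. It emits the "More Results>>" link only while startindex + maxpage < hits.length(). The Java sketch below restates that arithmetic as plain static methods so it can be checked in isolation. The class and method names are hypothetical. The clamp at zero is an addition, and so is the URL encoding of the query in the link. The JSP concatenates queryString as-is, so a query containing a character such as & would split into extra request parameters. java.net.URLEncoder.encode(String, String) with UTF-8 is one way to do the encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper mirroring the paging arithmetic in results.jsp.
class ResultsPagingSketch {
  // Rows to show: maxpage, or whatever remains after startindex if fewer (never negative).
  static int rowsOnPage(int totalHits, int startindex, int maxpage) {
    return Math.max(0, Math.min(maxpage, totalHits - startindex));
  }

  // results.jsp shows the "more" link only when another page remains.
  static boolean hasMore(int totalHits, int startindex, int maxpage) {
    return startindex + maxpage < totalHits;
  }

  // Builds the "more" link; unlike the JSP, the query text is URL-encoded.
  static String moreUrl(String queryString, int startindex, int maxpage) {
    try {
      return "results.jsp?query=" + URLEncoder.encode(queryString, "UTF-8")
          + "&maxresults=" + maxpage
          + "&startat=" + (startindex + maxpage);
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }

  public static void main(String[] args) {
    // 23 hits, 10 per page: pages start at 0, 10, 20 and the last shows 3 rows.
    System.out.println(rowsOnPage(23, 20, 10));       // 3
    System.out.println(hasMore(23, 10, 10));          // true
    System.out.println(moreUrl("lucene demo", 0, 10));
    // results.jsp?query=lucene+demo&maxresults=10&startat=10
  }
}
```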
  
  
  1.1                  jakarta-lucene/src/jsp/WEB-INF/web.xml
  
  Index: web.xml
  ===================================================================
  <?xml version="1.0" encoding="ISO-8859-1"?>
  
  <!DOCTYPE web-app
      PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
      "http://java.sun.com/dtd/web-app_2_3.dtd">
  
  <web-app>
  
  
  </web-app>
  
  
  
  1.3       +1 -1      jakarta-lucene/src/test/org/apache/lucene/IndexTest.java
  
  Index: IndexTest.java
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/src/test/org/apache/lucene/IndexTest.java,v
  retrieving revision 1.2
  retrieving revision 1.3
  diff -u -r1.2 -r1.3
  --- IndexTest.java	18 Sep 2001 17:35:57 -0000	1.2
  +++ IndexTest.java	26 Jan 2002 15:01:32 -0000	1.3
  @@ -58,7 +58,7 @@
   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.index.TermPositions;
   import org.apache.lucene.document.Document;
  -import org.apache.lucene.FileDocument;
  +import org.apache.lucene.demo.FileDocument;
   
   import java.io.File;
   import java.util.Date;
  
  
  
  1.3       +1 -1      jakarta-lucene/src/test/org/apache/lucene/index/DocTest.java
  
  Index: DocTest.java
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/src/test/org/apache/lucene/index/DocTest.java,v
  retrieving revision 1.2
  retrieving revision 1.3
  diff -u -r1.2 -r1.3
  --- DocTest.java	18 Sep 2001 17:35:57 -0000	1.2
  +++ DocTest.java	26 Jan 2002 15:01:32 -0000	1.3
  @@ -59,7 +59,7 @@
   import org.apache.lucene.store.FSDirectory;
   import org.apache.lucene.store.Directory;
   import org.apache.lucene.document.Document;
  -import org.apache.lucene.FileDocument;
  +import org.apache.lucene.demo.FileDocument;
   
   import java.io.File;
   import java.util.Date;
  
  
  
  1.1                  jakarta-lucene/xdocs/demo.xml
  
  Index: demo.xml
  ===================================================================
  <?xml version="1.0"?>
  <document>
  <properties>
  <author email="acoliver@apache.org">Andrew C. Oliver</author>
  <title>Jakarta Lucene - Building and Installing the Basic Demo</title>
  </properties>
  <body>
  
  <section name="About this Document">
  <p>
  This document is intended as a "getting started" guide to using and running the
  Jakarta Lucene demos.  It walks you through some basic installation and configuration.
  </p>
  </section>
  
  
  <section name="About the Demos">
  <p>
  The Lucene Demo code is a set of command line example applications that demonstrate various 
  features of Lucene and show how you might go about adding Lucene to your own
  applications.
  </p>
  </section>
  
  <section name="Setting your classpath">
  <p>
  First, extract the latest Lucene distribution.  
  </p>
  <p>
  You should see the Jakarta Lucene jar file in the directory you created 
  when you extracted the archive.  It should be named something like
  <b>lucene-{version}.jar</b>.  
  </p>
  <p>
  You should also see a file called <b>lucene-demos-{version}.jar</b>.  
  Put both of these files in your Java CLASSPATH.
  </p>
  </section>
  
  <section name="Indexing Files">
  <p>
  Once you've gotten this far you're probably itching to go.  Let's <b> build an index!</b>
  Assuming you've set your classpath correctly, just type 
  "java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src". This will produce 
  a subdirectory called "index" which will contain an index of all of the Lucene 
  source code. 
  </p>
  <p>
  <b> To search the index </b> type "java org.apache.lucene.demo.SearchFiles".  You'll be prompted
  for a query.  Type in a swear word and press the enter key.  You'll get no results, because the Lucene 
  developers are very well mannered.  Now try entering the word "vector".
  That should return a whole bunch of documents.  The results will page at every tenth
  result and ask you whether you want more results.
  </p>
  </section>
  
  <section name="About the code...">
  <p>
  <a href="demo2.html">read on&gt;&gt;&gt;</a>
  </p>
  </section>
  
  </body>
  </document>
  
  
  
  
  1.1                  jakarta-lucene/xdocs/demo2.xml
  
  Index: demo2.xml
  ===================================================================
  <?xml version="1.0"?>
  <document>
  <properties>
  <author email="acoliver@apache.org">Andrew C. Oliver</author>
  <title>Jakarta Lucene - Basic Demo Sources Walkthrough</title>
  </properties>
  <body>
  
  <section name="About the Code">
  <p>
  In this section we walk through the sources behind the basic Lucene demo: where to 
  find them, what the parts are, and what each one does.  This section is intended for Java developers
  wishing to understand how to use Jakarta Lucene in their applications.
  </p>
  </section>
  
  
  <section name="Location of the source">
  <p>
  Relative to the directory created when you extracted Lucene or retrieved it from CVS, you 
  should see a directory called "src" which in turn contains a directory called "demo".
  This is the root for all of the Lucene demos.  Under this directory is org/apache/lucene/demo,
  which is where all the Java sources live.  
  </p>
  <p>
  Within this directory you should see the IndexFiles class we executed earlier.  Bring that
  up in vi or your preferred text editor and let's take a look at it.
  </p>
  </section>
  
  <section name="IndexFiles">
  <p>
  As we discussed in the previous walkthrough, the IndexFiles class creates a Lucene index.
  Let's take a look at how it does this.  
  </p>
  <p>
  The first substantial thing the main function does is create an instance
  of IndexWriter.  It passes a string called "index" and a new instance of a class called
  "StandardAnalyzer".  The "index" string is the name of the directory that all index information
  should be stored in.  Because we're not passing any path information, one should assume this
  will be created as a subdirectory of the current directory (if it does not already exist). On
  some platforms this may actually result in it being created in other directories (such as 
  the user's home directory). 
  </p>
  <p>
  The <b>IndexWriter</b> is the main class responsible for creating indices.  To use it you
  must instantiate it with a path that it can write the index into.  If this path does not 
  exist it will be created; otherwise the index living at that path will be refreshed.  You 
  must also pass an instance of <b>org.apache.lucene.analysis.Analyzer</b>. 
  </p>
  <p>
  The <b>Analyzer</b>, in this case the <b>StandardAnalyzer</b>, is little more than a standard Java
  Tokenizer, converting all strings to lowercase and filtering out useless words from the index.
  By useless words I mean common language words such as articles (a, an, the) and other words that
  would be useless for searching.  It should be noted that there are different rules for every 
  language, and you should use the proper analyzer for each.  Lucene currently provides Analyzers
  for English and German.
  </p>
  <p>
  Looking further down in the file, you should see the indexDocs() code.  This recursive function 
  simply crawls the directories and uses FileDocument to create Document objects.  The Document
  is simply a data object to represent the content in the file as well as its last-modified time and 
  location.  These instances are added to the IndexWriter.  Take a look inside FileDocument.  It's
  not particularly complicated; it just adds fields to the Document.
  </p>
  <p>
  As you can see there isn't much to creating an index.  The devil is in the details.  You may also
  wish to examine the other samples in this directory, particularly the IndexHTML class.  It is 
  a bit more complex but builds upon this example.
  </p>
  </section>
  
  <section name="Searching Files">
  <p>
  The SearchFiles class is quite simple.  It primarily collaborates with an IndexSearcher, StandardAnalyzer
  (which is used in the IndexFiles class as well) and a QueryParser.  The query parser is constructed
  with an analyzer used to interpret your query in the same way the index was interpreted: finding 
  the end of words and removing useless words like 'a', 'an' and 'the'.  The QueryParser produces a
  Query object, which is passed to the searcher.  The searcher returns its results as a collection of
  Documents called "Hits", which is then iterated through and displayed to the user.
  </p>
  </section>
  
  <section name="The Web example...">
  <p>
  <a href="demo3.html">read on&gt;&gt;&gt;</a>
  </p>
  </section>
  
  </body>
  </document>
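demo2.xml above makes two points that are easy to see in miniature. First, the analyzer lowercases text and drops stop words such as "a", "an" and "the". Second, queries must be analyzed the same way as the indexed text, or the terms will not line up. The toy Java sketch below is not Lucene's StandardAnalyzer or its index format. It assumes a hypothetical five-word stop list and splits on anything that is not an ASCII letter or digit. It keeps a plain term-to-documents map and, as a simplification, only returns documents containing every query term.

```java
import java.util.*;

// Toy illustration only: not Lucene's StandardAnalyzer or its index format.
class AnalyzerSketch {
  // Hypothetical stop list; Lucene's own list is longer.
  private static final Set<String> STOP_WORDS =
      new HashSet<String>(Arrays.asList("a", "an", "the", "and", "of"));

  // Split on anything that is not a letter or digit, lowercase, drop stop words.
  static List<String> analyze(String text) {
    List<String> terms = new ArrayList<String>();
    for (String token : text.split("[^A-Za-z0-9]+")) {
      String term = token.toLowerCase(Locale.ROOT);
      if (term.length() > 0 && !STOP_WORDS.contains(term)) {
        terms.add(term);
      }
    }
    return terms;
  }

  // Term -> ids of the documents containing it (a tiny inverted index).
  private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();

  void add(int docId, String text) {
    for (String term : analyze(text)) {
      Set<Integer> docs = postings.get(term);
      if (docs == null) {
        docs = new TreeSet<Integer>();
        postings.put(term, docs);
      }
      docs.add(docId);
    }
  }

  // Documents containing every query term; the query goes through the same analyze().
  Set<Integer> search(String query) {
    Set<Integer> result = null;
    for (String term : analyze(query)) {
      Set<Integer> docs = postings.get(term);
      if (docs == null) {
        return new TreeSet<Integer>();
      }
      if (result == null) {
        result = new TreeSet<Integer>(docs);
      } else {
        result.retainAll(docs);
      }
    }
    return result == null ? new TreeSet<Integer>() : result;
  }

  public static void main(String[] args) {
    AnalyzerSketch index = new AnalyzerSketch();
    index.add(1, "The Vector class");
    index.add(2, "A hash table of entries");
    System.out.println(analyze("The Vector class"));   // [vector, class]
    System.out.println(index.search("VECTOR"));         // [1]
    System.out.println(index.search("the"));            // []
  }
}
```

Because both add() and search() call the same analyze(), the query "VECTOR" still finds the document that said "Vector". A query made only of stop words finds nothing.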
  
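demo2.xml describes indexDocs() as a recursive function that crawls the directories and turns each file into a Document through FileDocument. The traversal part can be sketched with java.io.File alone. In the hypothetical sketch below, each regular file is simply collected into a list at the point where the demo would hand it to FileDocument and the IndexWriter. Two things are additions for predictable output and are not claimed to be in the demo: sorting the directory listing, and skipping unreadable directories (File.list() returns null for them).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of a recursive directory walk like the one demo2 describes for indexDocs().
// Instead of building Lucene Documents, each regular file is collected into a list.
class CrawlSketch {
  static void collect(File file, List<File> out) {
    if (file.isDirectory()) {
      String[] names = file.list();           // null if the directory cannot be read
      if (names == null) {
        return;
      }
      Arrays.sort(names);                      // deterministic order (an addition)
      for (int i = 0; i < names.length; i++) {
        collect(new File(file, names[i]), out);
      }
    } else if (file.isFile()) {
      out.add(file);                           // the demo would index the file here
    }
  }

  public static void main(String[] args) {
    List<File> files = new ArrayList<File>();
    collect(new File(args.length > 0 ? args[0] : "."), files);
    for (File f : files) {
      System.out.println("adding " + f);
    }
  }
}
```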
  
  
  
  1.1                  jakarta-lucene/xdocs/demo3.xml
  
  Index: demo3.xml
  ===================================================================
  <?xml version="1.0"?>
  
  <document>
  <properties>
  <author email="acoliver@apache.org">Andrew C. Oliver</author>
  <title>Jakarta Lucene - Building and Installing the Basic Demo</title>
  </properties>
  <body>
  
  <section name="About this Document">
  <p>
  This document is intended as a "getting started" guide to installing and running the
  Jakarta Lucene web application demo.  This guide assumes that you have read the
  information in the previous two examples or already know it anyhow.  We'll use 
  Tomcat 4.0.1 as our reference web container.  These demos should work with nearly
  any container, but it is up to you to adapt them appropriately.
  </p>
  </section>
  
  
  <section name="About the Demos">
  <p>
  The Lucene Web Application demo is a template web application intended for deployment
  on Tomcat or a similar web container.  It's NOT designed as a "best practices"
  implementation by ANY means.  It's more of a "hello world" type Lucene Web App.  
  The purpose of this application is to demonstrate Lucene.  With that being said, 
  it should be relatively simple to create a small searchable website in Tomcat or 
  a similar application server.
  </p>
  </section>
  
  <section name="Indexing Files">
  <p> 
  Once you've gotten this far you're probably itching to go.  
  Let's start by creating the index you'll need for the web examples.  
  Since you've already set your classpath in the previous examples, 
  all you need to do is type 
  <b> "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} .."</b>.
  You'll need to do this from your {tomcat}/webapps/luceneweb directory.  {index-dir}
  should be a directory that Tomcat has permission to read and write, but is
  outside of a web accessible context.  By default the webapp is configured
  to look in <b>/opt/lucene/index</b> for this index.  
  </p>
  </section>
  
  <section name="Deploying the Demos">
  <p>Located in your distribution directory you should see
  a war file called luceneweb.war.  Copy this to your 
  {tomcat-home}/webapps directory.  You may need to restart 
  Tomcat.  </p>
  </section>
  
  <section name="Configuration">
  <p> 
  From your Tomcat directory, look in the webapps/luceneweb subdirectory.  If it's not 
  present, try browsing to "http://localhost:8080/luceneweb", then look again.  
  Edit the file called configuration.jsp.  Ensure that indexLocation is set to the 
  {index-dir} you used when creating your index.  You may also customize the appTitle and appFooter 
  strings as you see fit.  Once you have finished altering the configuration, you should 
  restart Tomcat.  You may also wish to update the war file by typing 
  <b>jar -uf ../luceneweb.war configuration.jsp</b> from the luceneweb subdirectory.  
  (The u option is not available in all versions of jar; if yours lacks it, recreate the war file.)
  </p>
  </section>
  
  <section name="Running the Demos">
  <p>Now you're ready to roll.  In your browser, set the url to "http://localhost:8080/luceneweb",
  enter "test" and the number of items per page, and press search.</p>
  <p>You should now be looking either at a number of results (provided you didn't erase the 
  Tomcat examples) or nothing.  Try other search terms.  Depending on the number of items 
  per page you set and results returned, there may be a link at the bottom that says "more results>>";
  clicking it goes to subsequent pages.  If you get an error regarding opening the index, then you
  probably set indexLocation in configuration.jsp incorrectly, Tomcat doesn't have permission to the 
  index, or you skipped the step of creating it.</p>
  </section>
  
  <section name="About the code...">
  <p>
  If you want to know more about how this web app works or how to customize it then 
  <a href="demo4.html">read on&gt;&gt;&gt;</a>.
  </p>
  </section>
  
  </body>
  </document>
  
  
  
  
  1.1                  jakarta-lucene/xdocs/demo4.xml
  
  Index: demo4.xml
  ===================================================================
  <?xml version="1.0"?>
  <document>
  <properties>
  <author email="acoliver@apache.org">Andrew C. Oliver</author>
  <title>Jakarta Lucene - Basic Demo Sources Walkthrough</title>
  </properties>
  <body>
  
  <section name="About the Code">
  <p>
  In this section we walk through the sources behind the basic Lucene Web Application demo:
  where to find them, their parts, and their functions.  This section is intended for Java developers
  wishing to understand how to use Jakarta Lucene in their applications, or for those involved
  in deploying web applications based on Lucene.
  </p>
  </section>
  
  
  <section name="Location of the source (developers/deployers)">
  <p>
  Relative to the directory created when you extracted Lucene or retrieved it from CVS, you 
  should see a directory called "src" which in turn contains a directory called "jsp".
  This is the root of the Lucene web demo. 
  </p>
  <p>
  Within this directory you should see the index.jsp page.  Bring this up in vi or your 
  editor of choice.
  </p>
  </section>
  
  <section name="index.jsp (developers/deployers)">
  <p>
  This jsp page is pretty boring by itself.  All it does is include a header, display a form and 
  include a footer.  If you look at the form, it has two fields: query (where you enter your 
  search criteria) and maxresults (where you specify the number of results per page).  If you look
  at the form tag, you'll notice it uses the get method as opposed to post.  The HTML 4.01
  specification recommends get for idempotent requests such as searches, and it has the practical
  benefit that a search can be bookmarked.  Because of the way this JSP is structured, it should
  be easy to customize without editing this particular file at all; you could simply change the 
  header and footer.  Let's look at header.jsp (located in the same directory) next.
  </p>
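  <p>
  Because the form uses get, the browser places the form fields in the URL, and that URL is what gets
  bookmarked.  The sketch below rebuilds the URL a browser would produce from the query and maxresults
  fields.  It assumes UTF-8 form encoding and an action URL of results.jsp (the form's target is not spelled
  out above); SearchUrl and buildSearchUrl are illustrative names, not part of the demo.
  </p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: rebuilds the URL that submitting the index.jsp form
// with method="get" would produce. Field names come from the form; the
// action URL and UTF-8 encoding are assumptions.
class SearchUrl {

    // Separator between GET parameters (an ampersand, written as a Unicode
    // escape).
    private static final char PARAM_SEPARATOR = '\u0026';

    static String buildSearchUrl(String action, String query, int maxResults)
            throws UnsupportedEncodingException {
        // URLEncoder applies application/x-www-form-urlencoded encoding, the
        // same encoding browsers use for GET form fields (spaces become '+').
        return action
                + "?query=" + URLEncoder.encode(query, "UTF-8")
                + PARAM_SEPARATOR + "maxresults=" + maxResults;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumed action URL for a local Tomcat install.
        String url = buildSearchUrl("http://localhost:8080/luceneweb/results.jsp", "lucene demo", 10);
        System.out.println(url);
    }
}
```

  <p>
  Pasting the printed URL back into a browser repeats the same search, which is exactly what a bookmark does.
  </p>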
  </section>
  
  <section name="header.jsp (developers/deployers)">
  <p>
  The header is also very simple by itself.  The only thing it does is include configuration.jsp
  (which you edited while configuring the demo) and set the title and a brief header.  This
  would be a good place to put your own custom HTML to "pretty" things up a bit.  We won't cover the 
  footer because all it does is display the footer and close the open tags.  Next, let's look at results.jsp,
  the meat of this application.
  </p>
  </section>
  
  <section name="results.jsp (developers)">
  <p>
  The results.jsp page has a lot more functionality.  Much of it handles paging the search results; we won't
  cover that here, as it is commented well enough.  It does not perform any optimizations such as caching results, 
  as that would make this a more complex example.  The first thing in this page is the set of imports
  for the Lucene classes and Lucene demo classes.  These classes are loaded from the jars included in the 
  WEB-INF/lib directory in the final war file.  
  </p>
  <p>
  You'll notice that this file includes the same header and footer as index.jsp.  From there the jsp
  constructs an IndexSearcher with the indexLocation that was specified in configuration.jsp.  If there
  is an error of any kind in opening the index, it is displayed to the user and a boolean flag is set to tell 
  the rest of the jsp not to continue.
  </p>
  <p>
  From there, this jsp attempts to get the search criteria, the start index (used for paging) and the maximum
  number of results per page.  If the maximum results per page is not set or not valid then it and the 
  start index are set to default values.  If only the start index is invalid it is set to a default value.  If 
  no criteria are provided then a servlet error is thrown (it is assumed that this is the result of url tampering
  or some form of browser malfunction).
  </p>
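  <p>
  The defaulting rules above can be sketched in plain Java as follows.  This illustrates the logic under
  stated assumptions rather than reproducing results.jsp: the default values (start 0, 10 results per page),
  the rule that a valid maxresults must be positive, the use of method arguments in place of request
  parameters, and IllegalArgumentException standing in for the servlet error are all assumptions.
  </p>

```java
// Hypothetical sketch of the parameter handling described above; the
// defaults and validity rules are assumptions, not taken from results.jsp.
class SearchParams {

    static final int DEFAULT_START = 0;         // assumed default start index
    static final int DEFAULT_MAX_RESULTS = 10;  // assumed default page size

    final String query;
    final int start;
    final int maxResults;

    private SearchParams(String query, int start, int maxResults) {
        this.query = query;
        this.start = start;
        this.maxResults = maxResults;
    }

    // Parses a request parameter; returns -1 if it is missing or not a number.
    // (Negative values are invalid for both parameters anyway.)
    private static int parseOrInvalid(String value) {
        if (value == null) {
            return -1;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // The arguments play the role of the raw request parameters.
    static SearchParams fromRequest(String query, String startIndex, String maxResults) {
        if (query == null) {
            // No criteria: assumed to be URL tampering or a browser fault.
            // A servlet would throw a ServletException here instead.
            throw new IllegalArgumentException("missing query parameter");
        }
        int max = parseOrInvalid(maxResults);
        int start = parseOrInvalid(startIndex);
        if (max > 0) {
            // Page size is valid: only an invalid start index is replaced.
            start = start >= 0 ? start : DEFAULT_START;
        } else {
            // Page size missing or invalid: reset both values.
            max = DEFAULT_MAX_RESULTS;
            start = DEFAULT_START;
        }
        return new SearchParams(query, start, max);
    }

    public static void main(String[] args) {
        SearchParams p = fromRequest("test", "abc", "25");
        System.out.println(p.query + " start=" + p.start + " max=" + p.maxResults); // prints "test start=0 max=25"
    }
}
```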
  <p>
  The jsp then constructs a StandardAnalyzer, just as in the simple demo, to analyze the search criteria; the
  analyzer is passed to the QueryParser along with the criteria to construct a Query object.  You'll also notice the 
  string literal "contents" passed in.  This specifies the default field to search: the contents, rather than 
  the title, url or some other field in the indexed documents.  If there is any error in constructing a Query 
  object, an error is displayed to the user.
  </p>
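  <p>
  To make the role of "contents" concrete, here is a toy that mimics only the default-field behavior: a word
  written as field:text searches that field, and any other word searches the default field.  It is deliberately
  not Lucene's QueryParser, which also understands boolean operators, phrases and other syntax and runs the
  text through the Analyzer.  The class DefaultFieldDemo is hypothetical, and its lowercasing only approximates
  what StandardAnalyzer does.
  </p>

```java
import java.util.Arrays;
import java.util.Locale;

// Toy illustration of a default search field. NOT Lucene's QueryParser:
// no boolean operators, phrases, or Analyzer, just whitespace splitting.
class DefaultFieldDemo {

    // Maps each word to "field:text". A word written as field:text keeps its
    // field; any other word is assigned to defaultField.
    static String[] toFieldTerms(String criteria, String defaultField) {
        String trimmed = criteria.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        String[] words = trimmed.split("\\s+");
        String[] terms = new String[words.length];
        int i = 0;
        for (String word : words) {
            int colon = word.indexOf(':');
            String field = colon > 0 ? word.substring(0, colon) : defaultField;
            String text = colon > 0 ? word.substring(colon + 1) : word;
            // Lowercase the text, loosely mirroring StandardAnalyzer's
            // lowercasing (its stop-word removal is skipped here).
            terms[i++] = field + ":" + text.toLowerCase(Locale.ROOT);
        }
        return terms;
    }

    public static void main(String[] args) {
        String[] terms = toFieldTerms("title:Lucene Demo", "contents");
        System.out.println(Arrays.toString(terms)); // prints [title:lucene, contents:demo]
    }
}
```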
  <p>
  In the next section of the jsp the IndexSearcher is asked to search given the Query object.  The results are
  returned in a collection called "hits".  If the length of the hits collection is 0, then an error 
  is displayed to the user and the error flag is set.
  </p>
  <p>
  Finally the jsp iterates through the hits collection and displays properties of the "Document" objects we talked
  about in the first walkthrough.  These objects contain "known" fields specific to their indexer (in this case 
  "IndexHTML" constructs a document with "url", "title" and "contents").  You'll notice that these results are paged,
  but the search is repeated for every page.  This is an area where optimization could improve performance for large 
  result sets.
  </p>
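  <p>
  The paging described above amounts to a little arithmetic over the total hit count, redone on every request
  because the search is re-run for each page.  A minimal sketch, assuming a hypothetical PageWindow class and a
  totalHits value standing in for the length of the hits collection:
  </p>

```java
// Hypothetical sketch of the paging arithmetic in results.jsp. totalHits
// stands in for the length of the hits collection; the search itself is
// assumed to have been re-run for this request.
class PageWindow {

    final int start;        // index of the first hit shown on this page
    final int end;          // one past the index of the last hit shown
    final boolean hasMore;  // whether to render a "more results" link

    private PageWindow(int start, int end, boolean hasMore) {
        this.start = start;
        this.end = end;
        this.hasMore = hasMore;
    }

    static PageWindow of(int totalHits, int start, int maxPerPage) {
        // Clamp to the hit count; with zero hits the window is empty
        // (results.jsp shows an error in that case instead).
        int first = Math.min(start, totalHits);
        int end = Math.min(first + maxPerPage, totalHits);
        return new PageWindow(first, end, totalHits > end);
    }

    // Start index that the "more results" link would carry.
    int nextStart() {
        return end;
    }

    public static void main(String[] args) {
        int totalHits = 25;
        int maxPerPage = 10;
        PageWindow page = of(totalHits, 0, maxPerPage);
        while (true) {
            // Prints "hits 0 to 9", "hits 10 to 19", "hits 20 to 24".
            System.out.println("hits " + page.start + " to " + (page.end - 1));
            if (!page.hasMore) {
                break;
            }
            // Each click on "more results" is a new request, so in the demo
            // the search runs again before the next window is computed.
            page = of(totalHits, page.nextStart(), maxPerPage);
        }
    }
}
```

  <p>
  Keeping the hits between requests (for instance, per user session) would avoid the repeated search, at the
  cost of holding results in memory; the demo leaves this out to stay simple.
  </p>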
  </section>
  
  <section name="More sources (developers)">
  <p>
  There are additional sources used by the web app that were not specifically covered by either walkthrough, for 
  example the HTML parser, the IndexHTML class and the HTMLDocument class.  These are very similar to the classes 
  covered in the first example, but they have properties specific to parsing and indexing HTML.  Covering them is 
  beyond our scope; however, by now you should feel like you're "getting started" with Lucene.
  </p>
  </section>
  
  <section name="Where to go from here? (Everyone!)">
  <p>
  There are a number of things this demo doesn't do or doesn't do quite right.  For instance, you may 
  have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
  support that context or redirect to it).  Likewise, anywhere the directory doesn't quite match the context mapping,
  you'll get a broken link in your results.  If you want to index non-local files or have other 
  needs, the demo doesn't support them, and there may be security issues with running the indexing application from
  your webapps directory.  There are a number of things left for you, the implementor or developer, to do.
  </p>
  <p>
  In time some of these things may be added to Lucene as features (if you've got a good idea we'd love to hear it!),
  but for now: this is where you begin and the search engine/indexer ends.  Lastly, one would assume you'd
  want to follow the above advice and customize the application to look a little more fancy than black on 
  white with "Lucene Template" at the top.  We'll see you on the Lucene Users' or Developers' mailing lists!
  </p>
  </section>
  
  <section name="When to contact the Author">
  <p>
  Please resist the urge to contact the authors of this document (without bribes of fame and fortune attached).  First
  contact the <a href="http://jakarta.apache.org/site/mail.html">mailing lists</a>.  That being said, feedback 
  and modifications to this document and samples are ever so greatly appreciated.  They are just best sent to the 
  lists so that everyone can share in them.  Certainly you'll get the most help there as well.  
  Thanks for understanding.  
  </p>
  </section>
  
  </body>
  </document>
  
  
  
  
  1.5       +1 -0      jakarta-lucene/xdocs/stylesheets/project.xml
  
  Index: project.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/xdocs/stylesheets/project.xml,v
  retrieving revision 1.4
  retrieving revision 1.5
  diff -u -r1.4 -r1.5
  --- project.xml	2 Oct 2001 15:54:16 -0000	1.4
  +++ project.xml	26 Jan 2002 15:01:32 -0000	1.5
  @@ -15,6 +15,7 @@
   
       <menu name="Documentation">
           <item name="FAQ"               href="http://www.lucene.com/cgi-bin/faq/faqmanager.cgi" target="_blank"/>
  +	<item name="Getting Started"   href="/gettingstarted.html"/>
           <item name="Articles"          href="/resources.html"/>
           <item name="Javadoc"            href="/api/index.html"/>
       </menu>
  
  
  

--
To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>

