lucene-dev mailing list archives

From: acoli...@apache.org
Subject: cvs commit: jakarta-lucene/xdocs luceneplan.xml
Date: Sun, 24 Feb 2002 15:58:42 GMT
acoliver    02/02/24 07:58:41

  Modified:    docs     luceneplan.html
               xdocs    luceneplan.xml
  Log:
  implemented suggestions by Marc Tucker
  
  Revision  Changes    Path
  1.2       +39 -10    jakarta-lucene/docs/luceneplan.html
  
  Index: luceneplan.html
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/docs/luceneplan.html,v
  retrieving revision 1.1
  retrieving revision 1.2
  diff -u -r1.1 -r1.2
  --- luceneplan.html	23 Feb 2002 22:03:39 -0000	1.1
  +++ luceneplan.html	24 Feb 2002 15:58:41 -0000	1.2
  @@ -199,24 +199,24 @@
                                                   <table border="0" cellspacing="0" cellpadding="2"
width="100%">
         <tr><td bgcolor="#525D76">
           <font color="#ffffff" face="arial,helvetica,sanserif">
  -          <a name="Indexers"><strong>Indexers</strong></a>
  +          <a name="Crawlers"><strong>Crawlers</strong></a>
           </font>
         </td></tr>
         <tr><td>
           <blockquote>
                                       <p>
  -                        Indexers are standard crawlers.  They go crawl a file
  +                        Crawlers are executable code tied to a data source.  They crawl a file
                           system, ftp site, web site, etc. to create the index.
  -                        These standard indexers may not make ALL of Lucene's
  +                        These standard crawlers may not make ALL of Lucene's
                           functionality available, though they should be able to
                           make most of it available through configuration.
                   </p>
                                                   <p>
  -                        <b> Abstract Indexer </b>
  +                        <b> Abstract Crawler </b>
                   </p>
                                                   <p>
  -                                The Abstract indexer is basically the parent for all
  -                                Indexer classes.  It provides implementation for the
  +                                The AbstractCrawler is the parent of all
  +                                Crawler classes.  It provides implementations of the
                                   following functions/properties:
                           </p>
                                                   <ul>
  @@ -264,6 +264,35 @@
                                           0 - Long.MAX_VALUE.
                                   </li>
                                   <li>
  +                                        SleeptimeBetweenCalls - can be used to 
  +                                        avoid flooding a machine with too many 
  +                                        requests
  +                                </li>
  +                                <li>
  +                                        RequestTimeout - kill the crawler
  +                                        request after the specified period of
  +                                        inactivity.
  +                                </li>
  +                                <li>
  +                                        IncludeFilter - include only items 
  +                                        matching filter.  (can occur multiple
  +                                        times)
  +                                </li>
  +                                <li>
  +                                        ExcludeFilter - exclude only items 
  +                                        matching filter.  (can occur multiple
  +                                        times)
  +                                </li>
  +                                <li>
  +                                        MaxItems - stops indexing after x
  +                                        documents have been indexed.
  +                                </li>
  +                                <li>
  +                                        MaxMegs - stops indexing after x megs
  +                                        have been indexed.  (should this be in
  +                                        specific crawlers?)
  +                                </li>
  +                                <li>
                                           properties - in addition to the settings
                                           (probably from the command line) read
                                           this properties file and get them from
  @@ -275,18 +304,18 @@
                                   </li>
                           </ul>
                                                   <p>
  -                              <b>FileSystemIndexer</b>
  +                              <b>FileSystemCrawler</b>
                           </p>
                                                   <p>
  -                                This should extend the AbstractIndexer and
  +                                This should extend the AbstractCrawler and
                                 support any additional options required for a
                                   filesystem index.
                           </p>
                                                   <p>
  -			      <b>HTTP Indexer </b>
  +			      <b>HTTP Crawler </b>
                           </p>
                                                   <p>
  -                                Supports the AbstractIndexer options as well as:      
                         
  +                                Supports the AbstractCrawler options as well as:      
                         
                           </p>
                                                   <ul>
                                   <li>
  
  
  
  1.2       +39 -10    jakarta-lucene/xdocs/luceneplan.xml
  
  Index: luceneplan.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/xdocs/luceneplan.xml,v
  retrieving revision 1.1
  retrieving revision 1.2
  diff -u -r1.1 -r1.2
  --- luceneplan.xml	23 Feb 2002 22:02:55 -0000	1.1
  +++ luceneplan.xml	24 Feb 2002 15:58:41 -0000	1.2
  @@ -91,21 +91,21 @@
                           </li>
                   </ul>
           </section>
  -        <section name="Indexers">
  +        <section name="Crawlers">
                   <p>
  -                        Indexers are standard crawlers.  They go crawl a file
  +                        Crawlers are executable code tied to a data source.  They crawl a file
                           system, ftp site, web site, etc. to create the index.
  -                        These standard indexers may not make ALL of Lucene's
  +                        These standard crawlers may not make ALL of Lucene's
                           functionality available, though they should be able to
                           make most of it available through configuration.
                   </p>
                   <!--<section name="AbstractIndexer">-->
                   <p>
  -                        <b> Abstract Indexer </b>
  +                        <b> Abstract Crawler </b>
                   </p>
                           <p>
  -                                The Abstract indexer is basically the parent for all
  -                                Indexer classes.  It provides implementation for the
  +                                The AbstractCrawler is the parent of all
  +                                Crawler classes.  It provides implementations of the
                                   following functions/properties:
                           </p>
                           <ul>
  @@ -153,6 +153,35 @@
                                           0 - Long.MAX_VALUE.
                                   </li>
                                   <li>
  +                                        SleeptimeBetweenCalls - can be used to 
  +                                        avoid flooding a machine with too many 
  +                                        requests
  +                                </li>
  +                                <li>
  +                                        RequestTimeout - kill the crawler
  +                                        request after the specified period of
  +                                        inactivity.
  +                                </li>
  +                                <li>
  +                                        IncludeFilter - include only items 
  +                                        matching filter.  (can occur multiple
  +                                        times)
  +                                </li>
  +                                <li>
  +                                        ExcludeFilter - exclude only items 
  +                                        matching filter.  (can occur multiple
  +                                        times)
  +                                </li>
  +                                <li>
  +                                        MaxItems - stops indexing after x
  +                                        documents have been indexed.
  +                                </li>
  +                                <li>
  +                                        MaxMegs - stops indexing after x megs
  +                                        have been indexed.  (should this be in
  +                                        specific crawlers?)
  +                                </li>
  +                                <li>
                                           properties - in addition to the settings
                                           (probably from the command line) read
                                           this properties file and get them from
  @@ -166,20 +195,20 @@
                   <!--</section>-->
                   <!--<s2 title="FileSystemIndexer">-->
                           <p>
  -                              <b>FileSystemIndexer</b>
  +                              <b>FileSystemCrawler</b>
                           </p>
                           <p>
  -                                This should extend the AbstractIndexer and
  +                                This should extend the AbstractCrawler and
                                 support any additional options required for a
                                   filesystem index.
                           </p>
                   <!--</s2>-->
                   <!--<s2 title="HTTPIndexer">-->
                           <p>
  -			      <b>HTTP Indexer </b>
  +			      <b>HTTP Crawler </b>
                           </p>
                           <p>
  -                                Supports the AbstractIndexer options as well as:      
                         
  +                                Supports the AbstractCrawler options as well as:      
                         
                           </p>
                           <ul>
                                   <li>
  
  
  

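The AbstractCrawler properties listed in the diff above (SleeptimeBetweenCalls, RequestTimeout, Include/ExcludeFilter, MaxItems, MaxMegs) and the FileSystemCrawler subclassing could be sketched in Java roughly as follows. All class, field, and method names here are assumptions drawn from the diff text, not actual Lucene code:

```java
import java.util.ArrayList;
import java.util.List;

// Rough sketch of the AbstractCrawler base class described in the plan.
// Names, types, and defaults are assumptions, not Lucene API.
abstract class AbstractCrawler {
    // SleeptimeBetweenCalls: pause between requests so a host is not flooded.
    long sleeptimeBetweenCallsMillis = 0;
    // RequestTimeout: abort a request after this period of inactivity.
    long requestTimeoutMillis = Long.MAX_VALUE;
    // IncludeFilter / ExcludeFilter: regex patterns, may occur multiple times.
    final List<String> includeFilters = new ArrayList<>();
    final List<String> excludeFilters = new ArrayList<>();
    // MaxItems / MaxMegs: stop indexing once either limit is reached.
    long maxItems = Long.MAX_VALUE;
    long maxMegs = Long.MAX_VALUE;

    void addIncludeFilter(String pattern) { includeFilters.add(pattern); }
    void addExcludeFilter(String pattern) { excludeFilters.add(pattern); }

    // An item is crawled when no exclude filter matches it and either no
    // include filters are set or at least one include filter matches.
    boolean accepts(String item) {
        for (String ex : excludeFilters) {
            if (item.matches(ex)) return false;
        }
        if (includeFilters.isEmpty()) return true;
        for (String in : includeFilters) {
            if (item.matches(in)) return true;
        }
        return false;
    }

    // Each concrete crawler (filesystem, HTTP, ftp, ...) supplies the walk.
    abstract void crawl(String root);
}

// The plan says FileSystemCrawler should extend AbstractCrawler and add
// any filesystem-specific options.
class FileSystemCrawler extends AbstractCrawler {
    @Override
    void crawl(String root) {
        // Walk the filesystem from root, indexing items accepts() allows.
    }
}
```

One design choice worth noting: in this sketch the exclude filters take precedence over the include filters, which matches the usual crawler convention, though the plan itself does not say which wins when both match.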

