nutch-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Update of "Features" by MarkyGoldstein
Date Wed, 16 Aug 2006 14:09:45 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by MarkyGoldstein:
http://wiki.apache.org/nutch/Features

------------------------------------------------------------------------------
  
  (Please reformat this text and divide it into feature lists, questions, and questions & answers.)
  
- ==Features==
+ == Features ==
  
- ==Questions and Answers==
+ == Questions and Answers ==
- 
- ==Questions==
- 
  
   * What kind of searches does Nutch support? (quoted, nested, truncation, wildcarding [and where], Boolean)
      * "...." (phrase search?), + (what is this for?), - (negation), and fieldname:term. There is no "AND" or "OR"; AND logic is implied.
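The query syntax in the answer above can be illustrated with a small Python sketch. It is not Nutch's query parser: Clause, parse_query, matches, and the "content" default field are hypothetical names. It assumes that a leading + marks a required term (the same as no prefix, since AND is implied), that - excludes a term, and that fieldname:term restricts a term to one field; a document is modelled as a dict from field name to text.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Clause:
    text: str             # a single term, or the words of a "quoted phrase"
    field: Optional[str]  # e.g. "site" in site:apache.org; None = default field
    prohibited: bool      # True when the clause has a leading "-"
    phrase: bool          # True when the clause was quoted

# Optional +/- prefix, optional "fieldname:", then a "quoted phrase" or a bare term.
_TOKEN = re.compile(r'([+-]?)(?:(\w+):)?(?:"([^"]*)"|(\S+))')

def parse_query(query: str) -> List[Clause]:
    """Split a query string into clauses; "+" is treated like no prefix (assumption)."""
    clauses = []
    for sign, field, phrase, term in _TOKEN.findall(query):
        text = phrase or term
        if not text.strip():
            continue  # skip empty phrases such as ""
        clauses.append(Clause(text, field or None, sign == "-", bool(phrase)))
    return clauses

def matches(clauses: List[Clause], doc: Dict[str, str]) -> bool:
    """Implied AND: every unprefixed or + clause must occur, and no - clause may."""
    for c in clauses:
        # Unqualified terms search the hypothetical "content" field.
        words = doc.get(c.field or "content", "").lower().split()
        needle = c.text.lower().split()
        # A phrase matches only as a contiguous run of words.
        found = any(words[i:i + len(needle)] == needle
                    for i in range(len(words) - len(needle) + 1))
        if found == c.prohibited:
            return False
    return True
```

Because AND is implied, a document must contain every unprefixed or + clause and none of the - clauses; this model has no way to express OR.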
+ 
   * Is stemming an option?
-     * According to the [http://www.lucenebook.com/ Lucene in Action] book: "Nutch does not use stemming or term aliasing of any kind.  Search engines have not historically done much stemming, but it is a question that comes up regularly." -- page 329
+     * According to the [http://www.lucenebook.com/ Lucene in Action] book: "Nutch does not use stemming or term aliasing of any kind.  Search engines have not historically done much
+ stemming, but it is a question that comes up regularly." -- page 329
+ 
   * What kind of stemming does Nutch use? (and can you add exceptions/changes?)
      * See previous answer :)
+ 
   * Does Nutch support Boolean operators? (can you use Google-like plus or minus or are you stuck with 1990s terms?)
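      * Not as words: "AND" and "OR" are not supported, but the + and - (negation) prefixes are, as noted above.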
-  *Does Nutch support weighted field searching, synonym support?
-  *What kinds of indexes does Nutch build? (multi-format indexing, incremental indexing, spell-check support, thesauri support, fielded searching,  rank-by-reputation?)
  
   * How does the search engine handle punctuation and special characters? (and what's configurable?)
      * They are treated like a space.
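To show what treating punctuation as a space means for matching, here is a rough Python sketch. The normalize function is hypothetical and only approximates the behavior stated above; Nutch's own tokenizer may handle some tokens (URLs, e-mail addresses, acronyms) differently.

```python
import re
from typing import List

# Any run of non-alphanumeric characters (underscore included) acts as a separator.
_SEPARATORS = re.compile(r"[\W_]+")

def normalize(text: str) -> List[str]:
    """Replace punctuation and special characters with spaces, then split into terms."""
    return _SEPARATORS.sub(" ", text).split()

print(normalize("C++/Java, e-mail: nutch@apache.org"))
# ['C', 'Java', 'e', 'mail', 'nutch', 'apache', 'org']
```

Under this assumption a query for e-mail and a query for "e mail" reduce to the same terms, and a query made only of punctuation reduces to nothing.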
+ 
   * Which document formats are supported?
    * Judging from the names of the available parser plugins, the list below is probably complete. However, only plain text and HTML are enabled by default. Edit conf/nutch-site.xml and change the value of the plugin.includes property to include the plugins for the document types that you want Nutch to handle:
     * Plain Text (plugin: parse-text)
@@ -38, +38 @@

       title, artist, album, comments, etc. The useful information needed to search mp3s)
     * ZIP (?): this appears to expand a ZIP archive of plain-text files and return their concatenated text. (parse-zip)
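A small Python helper for the plugin.includes edit described above, offered as a sketch under stated assumptions. The example value EXAMPLE_VALUE is hypothetical and is not Nutch's default (start from the default value in your version's conf/nutch-default.xml and extend its parse-(...) group rather than copying this one), and enabled_plugins assumes the property is matched as a regular expression against the whole plugin id.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Hypothetical value: NOT Nutch's default. Start from the default in
# conf/nutch-default.xml and add the parse-* plugins you need.
EXAMPLE_VALUE = ("protocol-http|urlfilter-regex|parse-(text|html|pdf|msword)"
                 "|index-basic|query-(basic|site|url)")

def plugin_includes_property(value: str) -> str:
    """Render a <property> element for plugin.includes as an XML string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "plugin.includes"
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")

def enabled_plugins(value: str, plugin_ids: Iterable[str]) -> List[str]:
    """Return the plugin ids the regex accepts, assuming a whole-id match."""
    pattern = re.compile(value)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]

if __name__ == "__main__":
    print(plugin_includes_property(EXAMPLE_VALUE))
    parsers = ["parse-text", "parse-html", "parse-pdf", "parse-msword", "parse-zip"]
    print(enabled_plugins(EXAMPLE_VALUE, parsers))  # parse-zip is not matched
```

The printed <property> element goes inside the <configuration> element of conf/nutch-site.xml. Checking candidate ids with a whole-string match makes it easy to spot a plugin (here parse-zip) that the value silently leaves out.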
  
+ 
+ == Questions without Answers ==
+ 
+  * Does Nutch support weighted field searching and synonyms?
+ 
+  * What kinds of indexes does Nutch build? (multi-format indexing, incremental indexing, spell-check support, thesauri support, fielded searching, rank-by-reputation?)
+ 
   * What post-coordination options are available? (hey Karen, what does this mean?)
  
   * How easy is Nutch to configure?
+ 
   * How transparent is its configuration to a working organization: does it require geeky command-line stuff, or can a knowledgeable manager use a web or software interface to view or modify settings?
  
   * How are results sorted?
+ 
   * Does Nutch support deduping?
+ 
   * Can one tinker with relevance algorithms?
+ 
   * Are there ranking overrides?
  
