From: Apache Wiki
To: nutch-commits@incubator.apache.org
Reply-To: nutch-dev@lucene.apache.org
Date: Wed, 16 Aug 2006 14:09:45 -0000
Message-ID: <20060816140945.3668.31733@ajax.apache.org>
Subject: [Nutch Wiki] Update of "Features" by MarkyGoldstein

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by MarkyGoldstein:
http://wiki.apache.org/nutch/Features

------------------------------------------------------------------------------
  (Please reformat this text and divide into feature lists, questions and questions & answers).
- ==Features==
+ == Features ==
- ==Questions and Answers==
+ == Questions and Answers ==
-
- ==Questions==
-
  *What kind of searches does Nutch support? (quoted, nested, truncation, wildcarding [and where], Boolean),
   * "...." (phrase search?), + (what is this for?), - (negation) and fieldname:term. No "AND" or "OR". The and-logic is implied.
+
  *Is stemming an option?
-  * According to the [http://www.lucenebook.com/ Lucene in Action] book: "Nutch does not use stemming or term aliasing of any kind. Search engines have not historically done much stemming, but it is a question that comes up regularly." -- page 329
+  * According to the [http://www.lucenebook.com/ Lucene in Action] book: "Nutch does not use stemming or term aliasing of any kind. Search engines have not historically done much
+ stemming, but it is a question that comes up regularly." -- page 329
+
  *What kind of stemming does Nutch use? (and can you add exceptions/changes?)
   * See previous answer :)
+
  *Does Nutch support Boolean operators? (can you use Google-like plus or minus or are you stuck with 1990s terms?)
   * No
- *Does Nutch support weighted field searching, synonym support?
- *What kinds of indexes does Nutch build? (multi-format indexing, incremental indexing, spell-check support, thesauri support, fielded searching, rank-by-reputation?)
  *How does the search engine handle punctuation and special characters? (and what's configurable?)
   * They are treated like a space.
+
  *Which document formats are supported?
   * Guessing from the names of the available parser plugins, this is probably it. However, only the plain text and HTML are enabled by default.
  Edit conf/nutch-site.xml and change the value of plugin.includes property to include the plugins for the document types that you want Nutch to handle:
   * Plain Text (plugin: parse-text)
@@ -38, +38 @@
  title, artist, album, comments, etc. The useful information needed to search mp3s)
   * ZIP (?) This seems to expand the zip of plain text files and return the concatenated text. (parse-zip)
+
+ == Questions without Answers ==
+
+ *Does Nutch support weighted field searching, synonym support?
+
+ *What kinds of indexes does Nutch build? (multi-format indexing, incremental indexing, spell-check support, thesauri support, fielded searching, rank-by-reputation?)
+
  *What post-coordination options are available? (hey Karen, what does this mean?)
  *How easy is Nutch to configure?
+
  *How transparent is its configuration to a working organization: does it require geeky command line stuff, or can a knowledgeable manager enter a web or software interface to view or modify settings?
   * How are results sorted?
+
   * Does Nutch support deduping?
+
   * Can one tinker with relevance algorithms?
+
   * Are there ranking overrides?
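
The plugin.includes step described in the answer on supported document formats can be sketched as a property entry in conf/nutch-site.xml. This is a minimal sketch, not the page author's configuration. parse-text and parse-zip are the plugin names given in the page, parse-html is assumed from the statement that HTML is enabled by default, and the remaining entries in the value are assumptions that should be checked against the plugin.includes default in the installation's conf/nutch-default.xml. The value is a regular expression matched against plugin names, so parser plugins can be grouped with alternation.

```xml
<!-- Goes inside the root element of conf/nutch-site.xml. -->
<property>
  <name>plugin.includes</name>
  <!-- Assumed value for illustration: only parse-text and parse-zip are named
       in the page. Copy the non-parser entries from your own
       conf/nutch-default.xml rather than from this sketch. -->
  <value>protocol-http|urlfilter-regex|parse-(text|html|zip)|index-basic|query-(basic|site|url)</value>
</property>
```

To handle another document type, add its parser plugin name to the parse-(...) group, using the names listed in the page.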