nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Trivial Update of "NutchTutorial" by LewisJohnMcgibbney
Date Thu, 08 Dec 2011 13:26:25 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "NutchTutorial" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=55&rev2=56

  == Introduction ==
  Apache Nutch is an open source Web crawler written in Java. With it, we can discover Web
page hyperlinks automatically, reduce maintenance work such as checking for broken links,
and create a copy of all the visited pages to search over. That’s where Apache Solr comes in.
Solr is an open source full-text search framework; with Solr we can search the pages that
Nutch has visited. Luckily, integration between Nutch and Solr is pretty straightforward,
as explained below.
  
- Apache Nutch release 1.3 has Solr integration embedded, greatly simplifying Nutch-Solr integration.
It also removes the legacy dependence upon both Apache Tomcat for running the old Nutch Web
Application and upon Apache Lucene for indexing. Just download a 1.3 binary release from [[http://www.apache.org/dyn/closer.cgi/nutch/|here]].
+ Apache Nutch supports Solr out-the-box, greatly simplifying Nutch-Solr integration. It also
removes the legacy dependence upon both Apache Tomcat for running the old Nutch Web Application
and upon Apache Lucene for indexing. Just download a binary release from [[http://www.apache.org/dyn/closer.cgi/nutch/|here]].
  
  == Table of Contents ==
  <<TableOfContents(3)>>
  
  == Steps ==
  == 1 Setup Nutch from binary distribution ==
-  * Unzip your binary Nutch package to `$HOME/nutch-1.3`
+  * Unzip your binary Nutch package to `$HOME/nutch-1.X`
-  * `cd $HOME/nutch-1.3/runtime/local`
+  * `cd $HOME/nutch-1.X/runtime/local`
  
  From now on, we are going to use `${NUTCH_RUNTIME_HOME}` to refer to the current directory.
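
  The setup step above can be sketched as a small shell helper. This is only an illustration: the function name `nutch_runtime_home` is hypothetical, `nutch-1.X` is the placeholder from the tutorial rather than a real version, and exporting `NUTCH_RUNTIME_HOME` as an environment variable is an assumption (the tutorial uses the name only as shorthand for the directory).

```shell
# Hypothetical helper (not part of Nutch): resolve the runtime directory
# of an unpacked binary release.
# $1 = directory the package was unzipped to, e.g. "$HOME/nutch-1.X"
nutch_runtime_home() {
  install_dir="$1"
  runtime_dir="$install_dir/runtime/local"

  # Fail early if the expected layout is not there.
  if [ ! -d "$runtime_dir" ]; then
    echo "not found: $runtime_dir" >&2
    return 1
  fi

  # Print the absolute path without changing the caller's directory.
  (cd "$runtime_dir" && pwd)
}

# Usage (the path is a placeholder; substitute your actual release directory):
#   NUTCH_RUNTIME_HOME="$(nutch_runtime_home "$HOME/nutch-1.X")" && export NUTCH_RUNTIME_HOME
#   cd "$NUTCH_RUNTIME_HOME"
```

  Checking that `runtime/local` exists before using it catches the common mistake of pointing at the wrong unzip location.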
  
