lucene-general mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Lucene or Nutch???
Date Fri, 22 Feb 2008 00:58:53 GMT

"Lucene Java" and "Nutch" are both subprojects of "Lucene"

from http://lucene.apache.org/

  Lucene Java, our flagship sub-project, provides Java-based indexing and 
  search technology. 

  Nutch builds on Lucene Java to provide web search application software.

if you want to develop your own java application that has integrated 
indexing logic, using the "Lucene Java" *library* is probably right for 
you.  if you want an *application* which can crawl various types of 
documents and build a Lucene index, then Nutch is probably right for you.  
even if you use Nutch, you can also use the underlying Lucene Java library 
to directly access the indexes it builds.
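
To make the library/application split concrete, here is a minimal sketch in plain Java. It is a toy, not Lucene's or Nutch's actual API: ToyIndex, ToyCrawler, ToyDemo, the example.org URLs and the file path are all hypothetical names made up for illustration. ToyIndex plays the role of the indexing library, ToyCrawler plays the role of the crawling application that feeds it, and the demo shows your own code adding a document to, and querying, the same index the crawler filled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for an indexing *library* (the role Lucene Java plays).
// Hypothetical names throughout; this is not Lucene's API.
class ToyIndex {
    // term -> ids of the documents that contain the term (kept sorted)
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Split the text into lowercase terms and record the document under each.
    void add(String docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return the ids of all documents containing the single given term.
    List<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }
}

// Toy stand-in for a crawling *application* (the role Nutch plays):
// it gathers documents and hands them to the library.
class ToyCrawler {
    private final ToyIndex index;

    ToyCrawler(ToyIndex index) {
        this.index = index;
    }

    // Fetching is faked with an in-memory map of URL -> page text.
    void crawl(Map<String, String> pages) {
        pages.forEach(index::add);
    }
}

class ToyDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();

        // The "application" fills the index from crawled pages.
        new ToyCrawler(index).crawl(Map.of(
                "http://example.org/a", "Nutch builds on Lucene Java",
                "http://example.org/b", "RSS feed archive"));

        // Your own code can add to the same index directly...
        index.add("file:/docs/notes.txt", "notes about Lucene indexing");

        // ...and query it without going through the crawler at all.
        System.out.println(index.search("lucene"));
    }
}
```

The point of the sketch is only the shape: the crawler depends on the index, never the other way around, so anything the crawler builds stays reachable through the library on its own.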

: I am new to lucene and nutch.
: I am doing a project on an archiving web portal which allows individual users
: to index documents (from the file system) and to crawl websites and RSS feeds
: for indexing. 
: 
: Looking at the above requirement, I thought Lucene would be able to achieve it,
: however I found out that Lucene does not have a crawler to crawl URLs. 
: 
: Then I came across Nutch, which is perfect for my latter requirement to fetch
: websites and RSS feeds. I also realised that Nutch allows me to
: crawl my file system as well... 
: 
: Well then in my case, I guess I should be using the API from Nutch instead of
: Lucene? 
: From another discussion on Nabble:
: http://www.nabble.com/Integration-of-Nutch-td12016441.html#a12040333
: 
: there is this advice to use Lucene to index the same index file that Nutch
: has created. But I thought that Nutch is using a webdb to store the returned
: crawl results? Anyway, from the thread mentioned above... why would one use
: Lucene if Nutch can perform all the local file system and web index and
: search functions?



-Hoss

