lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki>
Subject Re: Distributed Lucene..
Date Tue, 07 Mar 2006 09:41:08 GMT
Prasenjit Mukherjee wrote:
> I think nutch has a distributed lucene implementation. I could have 
> used nutch straightaway, but I have a different crawler, and also dont 
> want to use NDFS(which is used by nutch) . What I have proposed 
> earlier is basically based on mapReduce paradigm, which is used by 
> nutch as well.
> It would be nice to get some articles specifically detailing out  the 
> distributed architecture used in nutch.

A few comments:

* you can use your own crawler, and then only write some glue code to 
convert the output of that crawler to the format that Nutch uses.

* Nutch can be run in a so called "local" mode, without using NDFS

* the core map-reduce and I/O functionality has been split to its own 
project, Hadoop, where the development is taking place at a furious rate 
;-) This code is completely independent of Nutch or Lucene. You can 
implement your own data processing using this framework.

Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message