nutch-user mailing list archives

From: Andrzej Bialecki ...@getopt.org>
Subject: Re: Nutch Control via Java with no Command Line?
Date: Thu, 12 May 2005 17:13:00 GMT

Joe Reger, Jr. wrote:

> In other words, I'd like to avoid using the command line and instead call
> the java classes directly on a scheduled or user-controlled basis from
> Tomcat.  From what I see in bin/nutch I should be able to replace the
> command:
>  
> bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log
>  
> with something like:
>  
> net.nutch.tools.CrawlTool crawlTool = new net.nutch.tools.CrawlTool();
> String[] args = new String[7];
> args[0] = "urls";
> args[1] = "-dir";
> args[2] = "crawl.test";
> args[3] = "-depth";
> args[4] = "3";
> args[5] = ">&";
> args[6] = "crawl.log";
> crawlTool.main(args);
>  
> Is this possible?  Is this smart?  What sort of issues will arise if I try
> to run everything from Tomcat/Java?

First of all, it's not only perfectly possible, it's actually how the 
CrawlTool itself is implemented: it drives the individual crawl steps 
from Java rather than through the shell. Please take a look at 
CrawlTool.main ...
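
One detail in the quoted snippet is worth illustrating: ">&" and 
"crawl.log" are shell redirection, which the shell handles itself and 
never passes to the program, so they should not appear in the argument 
array. Below is a minimal sketch of the call with only the five real 
arguments. The class name CrawlInvoker and its two helper methods are 
made up for this illustration; the reflective lookup is used only so the 
sketch compiles without Nutch on the classpath, and a direct call to 
net.nutch.tools.CrawlTool.main(args) would do the same thing.

```java
import java.lang.reflect.Method;

// Hypothetical helper, not part of Nutch.
class CrawlInvoker {

    // Builds the argument array for: crawl <urlFile> -dir <crawlDir> -depth <depth>
    // The shell redirection (">& crawl.log") is deliberately left out.
    static String[] crawlArgs(String urlFile, String crawlDir, int depth) {
        return new String[] {
            urlFile, "-dir", crawlDir, "-depth", Integer.toString(depth)
        };
    }

    // Looks up the public static main(String[]) of the named class and calls it.
    // For the crawl in the question, className would be "net.nutch.tools.CrawlTool".
    static void invokeMain(String className, String[] args) throws Exception {
        Method main = Class.forName(className).getMethod("main", String[].class);
        // Cast to Object so the array is passed as one argument, not as varargs.
        main.invoke(null, (Object) args);
    }
}
```

Usage would look like 
CrawlInvoker.invokeMain("net.nutch.tools.CrawlTool", 
CrawlInvoker.crawlArgs("urls", "crawl.test", 3)). Since main is static, 
there is also no need to create a CrawlTool instance first.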

As for the issues: you need to keep in mind that most Nutch processing 
tasks consume a lot of resources, so if you run a task in the same JVM 
instance as the whole app server, then you can exhaust some resource 
(file handles, heap space, CPU/IO, etc.) and starve other applications 
that run on the same JVM.
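
One way to limit that risk, which goes beyond what is described above, 
is to start the crawl in a child JVM with its own heap limit, so that a 
runaway crawl cannot take the app server's heap with it. The sketch 
below assumes a reasonably modern JDK (ProcessBuilder.redirectOutput 
needs Java 7 or later). CrawlLauncher, its method names, and the 
classpath and heap parameters are all placeholders for illustration; 
only the tool class name and the crawl arguments come from the question.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch.
class CrawlLauncher {

    // Builds the command line for a child JVM that runs the crawl tool.
    // classpath and maxHeap (e.g. "512m") are supplied by the caller.
    static List<String> buildCommand(String classpath, String maxHeap,
                                     String urlFile, String crawlDir, int depth) {
        List<String> cmd = new ArrayList<>();
        // Reuse the java binary of the JVM we are currently running in.
        cmd.add(System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java");
        cmd.add("-Xmx" + maxHeap);   // heap cap for the child only
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("net.nutch.tools.CrawlTool");
        cmd.add(urlFile);
        cmd.add("-dir");
        cmd.add(crawlDir);
        cmd.add("-depth");
        cmd.add(Integer.toString(depth));
        return cmd;
    }

    // Starts the child process and sends stdout and stderr to logFile,
    // which is the Java-side equivalent of the shell's ">& crawl.log".
    static Process start(List<String> cmd, File logFile) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectErrorStream(true);   // merge stderr into stdout
        pb.redirectOutput(logFile);     // write the merged stream to the log file
        return pb.start();
    }
}
```

The caller can keep the returned Process and call waitFor() from a 
background thread to learn when the crawl has finished. A child JVM 
still shares CPU, disk and file handles with the app server at the OS 
level, so this isolates heap usage rather than every resource.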


-- 
Best regards,
Andrzej Bialecki
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

