nutch-user mailing list archives

From EM <emili...@cpuedge.com>
Subject Re: [Nutch-general] using nutch just for crawling, not indexing?
Date Tue, 10 May 2005 05:56:39 GMT
I don't recall the exact command, but you can use the 'inject' command
to inject one or more URLs as starting points for the crawl.
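A minimal sketch of preparing the seed list, assuming the inject step reads a plain text file with one URL per line. The file name `urls.txt` and the helper `write_seed_file` are hypothetical. The exact inject invocation differs between Nutch versions, so check the usage message that `bin/nutch` prints before running it.

```python
from urllib.parse import urlparse


def write_seed_file(urls, path="urls.txt"):
    """Write seed URLs one per line, keeping only unique http(s) URLs.

    Hypothetical helper: it only prepares the text file; injecting it
    into the Nutch db is still done with the nutch script.
    """
    kept = []
    seen = set()
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Skip blanks, typos and non-web schemes such as ftp://
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        # Drop duplicates so each starting point appears once
        if url in seen:
            continue
        seen.add(url)
        kept.append(url)
    with open(path, "w", encoding="utf-8") as f:
        for url in kept:
            f.write(url + "\n")
    return kept


if __name__ == "__main__":
    seeds = write_seed_file([
        "http://lucene.apache.org/nutch/",
        "  http://www.example.com/  ",
        "ftp://example.com/file",
        "http://lucene.apache.org/nutch/",
    ])
    print(seeds)  # ['http://lucene.apache.org/nutch/', 'http://www.example.com/']
```

The resulting file replaces the DMOZ dump as the crawl's starting point; the crawl then expands outward from these pages only.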

Zhou LiBing wrote:

>Hi,
>  I have a question about the Nutch crawler. How can I crawl the web
>starting from one or several specified URLs? I don't want to use the
>DMOZ data.
>    
>
> On 5/3/05, Jason Manfield <rarish911@yahoo.com> wrote: 
>  
>
>>We would like to use nutch just for crawling, and then index the crawled 
>>database into our proprietary datastore/index. How do we go about this? I 
>>see that nutch is a shell script, so it is possible to just crawl. Once it 
>>crawls, I suppose the crawled data is dumped into webdb. Are there exposed 
>>APIs to extract the data from webdb?
>>
>>One more catch -- our company is a .NET shop :((, so we would like to use 
>>C# to read the data of the fetched/crawled pages for further indexing.
>>
>>Ideas/suggestions?
>>
>>Any plans to have nutch for .NET (like dotLucene)?
>>
