nutch-user mailing list archives

From Zhou LiBing <zhoulib...@gmail.com>
Subject Re: [Nutch-general] using nutch just for crawling, not indexing?
Date Tue, 10 May 2005 01:44:50 GMT
hi
  I have a problem with the Nutch crawler. How can I crawl the web
starting from one or several specified URLs? I ask because I don't want
to use the DMOZ data.
    

 On 5/3/05, Jason Manfield <rarish911@yahoo.com> wrote: 
> 
> We would like to use nutch just for crawling, and then index the crawled 
> database into our proprietary datastore/index. How do we go about this? I 
> see that nutch is a shell script, so it is possible to just crawl. Once it 
> crawls, I suppose the crawled data is dumped into webdb. Are there exposed 
> APIs to extract the data from webdb?
> 
> One more catch -- our company is a .NET shop :((, so we would like to use 
> C# to read the data of the fetched/crawled pages for further indexing.
> 
> Ideas/suggestions?
> 
> Any plans to have nutch for .NET (like dotLucene)?
> 



-- 
---Letter From your friend Blue at HUST CGCL---
