nutch-user mailing list archives

From <ogjunk-nu...@yahoo.com>
Subject Re: [Nutch-general] using nutch just for crawling, not indexing?
Date Mon, 02 May 2005 21:28:31 GMT
Jason, this is perfectly doable. I do this for my social bookmarking
project, Simpy.com.

I think most people run Nutch through the nutch shell script that
comes with it, but you can also call the Fetcher Java class directly
from your own code, since it has a main method.  The same goes for the
SegmentMergeTool.  So, if you can write a Java app, just call Nutch's
Java classes the same way the shell script does.
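
As a rough sketch of what "calling the class the same way the shell
script does" could look like: the launcher below looks a class up by
name and invokes its static main(String[]) with the arguments you give
it. RunNutchTool, invokeMain and EchoTool are made-up names for
illustration and are not part of Nutch. The fully qualified name of the
Fetcher and the arguments it expects depend on your Nutch version, so
take them from the nutch script rather than from this example.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical launcher, not part of Nutch.
public class RunNutchTool {

    // Calls the public static main(String[]) of the named class,
    // which is what `java <class> <args>` in a shell script does.
    static void invokeMain(String className, String... args) throws Exception {
        Class<?> tool = Class.forName(className);
        Method main = tool.getMethod("main", String[].class);
        // Cast so the array is passed as one argument, not spread as varargs.
        main.invoke(null, (Object) args);
    }

    // Stand-in tool so the sketch runs without Nutch on the classpath.
    // It only records the arguments it was started with.
    public static class EchoTool {
        static String[] lastArgs = new String[0];

        public static void main(String[] args) {
            lastArgs = args;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: RunNutchTool <class-name> [tool args...]");
            return;
        }
        // First argument is the class to run, the rest go to that class.
        invokeMain(args[0], Arrays.copyOfRange(args, 1, args.length));
    }
}
```

With the Nutch jars and conf directory on the classpath, you would pass
the Fetcher class name and the same arguments you would otherwise give
the script. If you compile against Nutch directly, a plain static call
to the class's main method does the same thing without reflection.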

I can't help you with reading Nutch's files with C#, but the source is
there, so you should be able to write file readers in C#.

Otis
____________________________________________________________________
Simpy -- simpy.com -- tags, social bookmarks, personal search engine



--- Jason Manfield <rarish911@yahoo.com> wrote:
> We would like to use nutch just for crawling, and then index the
> crawled database into our proprietary datastore/index. How do we go
> about this? I see that nutch is a shell script, so it is possible to
> just crawl. Once it crawls, I suppose the crawled data is dumped into
> webdb. Are there exposed APIs to extract the data from webdb? 
>  
> One more catch -- our company is a .NET shop :((, so we would like to
> use C# to read the data of the fetched/crawled pages for further
> indexing.
>  
> Ideas/suggestions?
>  
> Any plans to have nutch for .NET (like dotLucene)?
> 
