nutch-user mailing list archives

From Jason Manfield
Subject using nutch just for crawling, not indexing?
Date Mon, 02 May 2005 19:31:46 GMT
We would like to use Nutch only for crawling, and then index the crawled data in our
proprietary datastore/index. How do we go about this? I see that nutch is a shell script,
so it should be possible to run just the crawl. Once the crawl finishes, I assume the fetched
data is stored in the webdb. Are there exposed APIs for extracting the data from the webdb?
One more catch: our company is a .NET shop :((, so we would like to use C# to read the
fetched/crawled pages for further indexing.
Are there any plans for a .NET port of Nutch (like dotLucene)?
