nutch-user mailing list archives

From Jérôme Charron
Subject Re: ASP Parser
Date Tue, 10 May 2005 19:20:29 GMT
> I've recently just installed and configured Nutch from source. From
> what I've read by default, Nutch will parse text and html based
> documents only. I have a site I'm trying to crawl which is all asp
> pages. I put the asp mime type in the mime-type.xml document. What
> else do I need to do in order for Nutch to crawl asp pages?

Correct me if I'm wrong, but ASP is like JSP: the page is interpreted on
the server side and can generate any type of document (usually plain HTML).
So you don't need to add ASP support to Nutch, since your ASP pages
almost certainly generate HTML, which Nutch already parses.
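
To illustrate the point, here is a rough Python sketch (not Nutch code; the
function names and the set of HTML types are my own assumptions) of choosing
a parser from the Content-Type header the server sends rather than from the
.asp extension in the URL:

```python
# Hypothetical illustration: decide how to parse a fetched page from the
# Content-Type header of the response, not from the URL's file extension.

# Media types treated as HTML in this sketch (an assumption, not Nutch's list).
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type_header: str) -> str:
    """Strip parameters such as charset and normalize case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_html_response(content_type_header: str) -> bool:
    """True if the response body should go to an HTML parser."""
    return media_type(content_type_header) in HTML_TYPES


# An ASP page normally answers with an HTML content type, so the ".asp"
# in the URL plays no part in the decision.
url = "http://example.com/products.asp?id=7"
header = "text/html; charset=iso-8859-1"
print(url, "->", "HTML parser" if is_html_response(header) else "other parser")
```

If your pages really do come back with some other content type, that type
(not "asp") is what would need a parser.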


