nutch-user mailing list archives

From David Spencer <dave-nu...@tropo.com>
Subject Re: [Nutch-general] ASP Parser
Date Tue, 10 May 2005 19:45:46 GMT
Seth Taylor wrote:

> I've recently just installed and configured Nutch from source.  From
> what I've read by default, Nutch will parse text and html based
> documents only.  I have a site I'm trying to crawl which is all asp
> pages.  I put the asp mime type in the mime-type.xml document.  What
> else do I need to do in order for Nutch to crawl asp pages?

You probably need to check the URL filter (conf/crawl-urlfilter.txt)
and make sure the pages are not being rejected. Note that there may be a
pattern that rejects URLs containing query arguments, so you might want
to disable it if the pages take args.
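
To illustrate how such a filter can silently drop ASP pages that take arguments, here is a minimal Python sketch that mimics the "+regex" / "-regex" rule style of conf/crawl-urlfilter.txt. It is not Nutch code: the rule set, the example.com host, and the helper names are assumptions made for illustration, and it assumes the first matching rule decides while a URL matching no rule is rejected.

```python
import re

# Hypothetical rules in the "+regex" / "-regex" style of crawl-urlfilter.txt.
# The patterns and the example.com host are illustrative assumptions.
RULES = r"""
# skip URLs that look like queries
-[?*!@=]
# accept everything on the (hypothetical) site being crawled
+^http://([a-z0-9]*\.)*example\.com/
"""


def parse_rules(text):
    """Turn rule text into a list of (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """Assumed semantics: first matching rule wins, no match means reject."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in ("http://www.example.com/products.asp",
                "http://www.example.com/item.asp?id=42"):
        print(url, "->", "accepted" if accepts(url, rules) else "rejected")
```

With these assumed rules, a plain .asp URL passes while the same page with a query string is rejected by the "-[?*!@=]" line; removing or commenting out that line lets the second URL through.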

I would think that there is no ASP MIME type per se -- surely the
average ASP page returns an HTML document?!
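
One way to check this assumption is to look at the Content-Type header the server actually sends for an .asp URL. The sketch below uses only the Python standard library; the function names are hypothetical, and any URL passed to fetch_media_type would be your own site's page, not something taken from this thread.

```python
from email.message import Message
from urllib.request import Request, urlopen


def media_type(content_type_header):
    """Return the bare media type from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()  # lowercased, parameters stripped


def fetch_media_type(url):
    """Send a HEAD request and report the media type the server declares.

    Needs network access; some servers answer HEAD differently from GET.
    """
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=10) as resp:
        return resp.headers.get_content_type()


if __name__ == "__main__":
    # A header of the kind an ASP page might send (assumed example value).
    print(media_type("text/html; charset=iso-8859-1"))
```

If the declared type is text/html, the page's .asp extension does not determine its MIME type, and the URL filter is the more likely place to look.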

> Thanks,
> Seth
> staylor@hhgregg.com

