nutch-dev mailing list archives

From "Jérôme Charron" <>
Subject Re: Content-Type inconsistency?
Date Thu, 27 Apr 2006 20:52:26 GMT
> I'm not sure if that is the right thing.
> If the site administrator did a poor job and a wrong media type is
> advertised, it's the site's problem and Nutch shouldn't be fixing it, in
> my opinion. Those sites would not work properly with browsers anyway, and
> Nutch doesn't need to work properly with them either, except that it
> should protect itself from crashing. I tried to visit your page with IE
> and Firefox, and both faithfully trusted the media type as advertised by
> the server, and asked me if I wanted to open it with WinZip or save it;
> there was no option to open it as HTML.
> Why should Nutch treat it as HTML?

Simply because it is an HTML file. It has a strange name, of course, but it
is an HTML file.
My example is a kind of "caricature". But a more realistic case could be an
HTML file served with a text/plain content-type, or with text/xml.
Finally, isn't it good news that Nutch seems to be more "intelligent" at
content-type guessing than Firefox or IE?
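
To make the idea concrete, here is a minimal sketch of content-based guessing for the cases above (HTML advertised as a zip, as text/plain, or as text/xml). It is only an illustration under stated assumptions: the function name guess_content_type, the 1024-byte sniff window, and the marker list are all hypothetical, and this is not Nutch's actual detection code.

```python
import re

# Hypothetical markers that suggest the payload is HTML.
# This list is an assumption for illustration, not Nutch's real rule set.
_HTML_HINT = re.compile(
    rb"<!doctype\s+html|<html[\s>]|<head[\s>]|<body[\s>]",
    re.IGNORECASE,
)


def guess_content_type(advertised: str, body: bytes, sniff_len: int = 1024) -> str:
    """Return 'text/html' if the first bytes look like HTML,
    otherwise fall back to the advertised media type."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = advertised.split(";", 1)[0].strip().lower()

    # Only inspect the beginning of the payload.
    head = body[:sniff_len]
    if _HTML_HINT.search(head):
        return "text/html"

    # Nothing HTML-like found: trust the server, with a generic default
    # when no type was advertised at all.
    return media_type or "application/octet-stream"
```

With this sketch, guess_content_type("application/zip", b"<html>...") yields "text/html", while a real zip payload keeps its advertised type. A real detector would need more care (for example, XML documents that merely embed HTML tags), so the marker list here should be read as a starting point only.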


