nutch-user mailing list archives

From k-team <>
Subject Re: Charset encoding
Date Wed, 18 May 2005 13:38:42 GMT
> Sometimes web pages do not identify the encoding the page is in.  In
> these cases, the client has to "guess" the encoding.  Nutch currently
> does not have a guessing algorithm, so if it encounters one of these
> pages, it just decodes the page using the
> parser.character.encoding.default parameter.

Hmm, we have checked that search.jsp has pageEncoding set to UTF-8, and
we have then set parser.character.encoding.default to UTF-8.

For example, when we search for the string 'perchè', we get this in the URL:


i.e. two URL-encoded characters, whereas we expected a single one, %E8, for the 'è'.
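
For reference, here is a small Java sketch of how the same search string is percent-encoded under two different charsets. This is not Nutch code: the class name UrlEncodingDemo and the helper encode() are made up for illustration, and it only relies on the standard java.net.URLEncoder. In UTF-8 the 'è' becomes two bytes and so two escapes, while in ISO-8859-1 it is the single byte %E8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlEncodingDemo {

    // Percent-encode a query string using the named charset.
    // (Hypothetical helper, only for this illustration.)
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // 'perchè', written with a Unicode escape so the source file
        // encoding does not affect the result.
        String query = "perch\u00e8";

        System.out.println(encode(query, "UTF-8"));      // perch%C3%A8
        System.out.println(encode(query, "ISO-8859-1")); // perch%E8
    }
}
```

So which form appears in the URL depends on the charset used to encode the form submission, and the side that decodes the query parameter has to assume the same charset.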

Thanks for your support.

