incubator-wink-dev mailing list archives

From "Baram, Eliezer" <eba...@hp.com>
Subject FW: Tolerance to malformed media types in Wink client
Date Mon, 13 Sep 2010 08:18:22 GMT
And here is the mail he tried to post

---------- Forwarded message ----------
From: Steve Miller <stvmllr78@gmail.com>
Date: Mon, Sep 12, 2010 at 2:15 AM
Subject: Tolerance to malformed media types in Wink client
To: wink-user@incubator.apache.org

Hi,
I created a crawler using the Apache Wink client, but I found that the Wink client does not
tolerate malformed media types, even when the malformed part is only a media type parameter.
Unfortunately, there are a lot of those on the internet.
When Wink receives such a media type, it throws an exception with the message: 'java.lang.IllegalArgumentException
... Verify that the format is like "type/subtype".'
I think it would be good if Wink could be more tolerant of such media types, especially since
they are common. It would surely save me time :-)

Here are examples of the media types that cause the problem, along with their sources. This is only a sample;
the list of sites is longer, but the same media type patterns keep recurring.

URL: http://www.aol.com/ (and all AOL sites around the globe)
Media Type: text/html;;charset=utf-8

URL: http://www.plugrush.com/
Media Type: text/html; charset: UTF-8

URL: http://www.torrentleech.org/
Media Type: text/html; charset=

URL: http://www.comingsoon.net/
Media Type: text/html; $str_charset; charset=ISO-8859-1

URL: http://www.globalsources.com/
Media Type: text/html; UTF-8;charset=ISO-8859-1

URL: http://dic.academic.ru/
Media Type: text/html; utf-8

URL: http://www.warnerbros.com/
Media Type: text/html; UTF-8;charset=UTF-8
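
To illustrate the kind of tolerance meant here, below is a rough sketch in plain Java. LenientMediaType is a hypothetical helper, not part of Wink or JAX-RS, and it is only one possible policy: it still insists on a valid "type/subtype" prefix, but it silently drops any parameter segment that is not a well-formed name=value pair instead of rejecting the whole header. It assumes parameter values do not contain a quoted ';'.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not a Wink class): lenient Content-Type parsing. */
final class LenientMediaType {
    final String type;
    final String subtype;
    final Map<String, String> parameters;

    private LenientMediaType(String type, String subtype, Map<String, String> parameters) {
        this.type = type;
        this.subtype = subtype;
        this.parameters = parameters;
    }

    static LenientMediaType parse(String header) {
        if (header == null) {
            throw new IllegalArgumentException("Media type must not be null");
        }
        // Limit -1 keeps empty segments, so parts[0] always exists.
        String[] parts = header.split(";", -1);
        String[] names = parts[0].trim().split("/", 2);
        if (names.length != 2 || names[0].trim().isEmpty() || names[1].trim().isEmpty()) {
            // The type/subtype part itself is still required to be valid.
            throw new IllegalArgumentException("Expected \"type/subtype\" but got: " + header);
        }

        Map<String, String> params = new LinkedHashMap<String, String>();
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            int eq = part.indexOf('=');
            if (eq <= 0) {
                continue; // empty segment, bare token ("UTF-8"), or "charset: UTF-8"
            }
            String name = part.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = part.substring(eq + 1).trim();
            if (name.isEmpty() || value.isEmpty()) {
                continue; // e.g. "charset=" with no value
            }
            params.put(name, value);
        }
        return new LenientMediaType(
                names[0].trim().toLowerCase(Locale.ROOT),
                names[1].trim().toLowerCase(Locale.ROOT),
                params);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(type).append('/').append(subtype);
        for (Map.Entry<String, String> e : parameters.entrySet()) {
            sb.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }
}
```

With this sketch, "text/html; UTF-8;charset=ISO-8859-1" is read as text/html;charset=ISO-8859-1, and "text/html; charset=" is read as plain text/html. Dropping the malformed segment rather than guessing at it (for example, not treating "charset: UTF-8" as a charset) is a deliberate choice to keep the behaviour predictable.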

Thanks,
Steve









