hc-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Patrick Moore (JIRA)" <j...@apache.org>
Subject [jira] Reopened: (HTTPCORE-195) ChunkDecoder is overly sensitive to truncated chucks
Date Fri, 08 May 2009 21:00:45 GMT

     [ https://issues.apache.org/jira/browse/HTTPCORE-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Patrick Moore reopened HTTPCORE-195:
------------------------------------


This is a extraordinary lazy answer because:
1) We are not talking about a separate workaround to each site.
2) There is no logging at all of the details on why the chunk is bad (i.e. how many bytes
read, how many bytes are missing, http header information)
3) It is unreasonable to expect the user of the nio library to be able to control and be able
to solve the problem with other servers.
4) there are viable straight forward solutions.

In our case we only crawled 20 websites before we started running into this problem.

Possible solutions:

1)  a quiet configuration option ( log at debug level)
2)  the ability to do error processing on the retrieved text.

> ChunkDecoder is overly sensitive to truncated chucks
> ----------------------------------------------------
>
>                 Key: HTTPCORE-195
>                 URL: https://issues.apache.org/jira/browse/HTTPCORE-195
>             Project: HttpComponents HttpCore
>          Issue Type: Bug
>          Components: HttpCore NIO
>    Affects Versions: 4.0
>            Reporter: Patrick Moore
>            Priority: Critical
>
> Our server is webcrawling.
> We are frequently encountering this issue. We think this might be related to something
on the server that we are scanning. But that doesn't matter. We need to handle such cases
without exceptions. (From my perspective, such things should generate a debug message -- certainly
not an exception that ends processing and throws away the retrieved content! )
> http://stuftpizza.com/ seems to reliably result in this problem
> May be TransferEncoding? http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6
> Either way we need to be able to deal with issues on the other servers.
> {{{
> Date	Mon, 20 Apr 2009 03:56:45 GMT
> Server	Apache/2.2.3 (Red Hat)
> Accept-Ranges	bytes
> Connection	close
> Transfer-Encoding	chunked
> Content-Type	text/html
> '''Request Headers'''
> Host	stuftpizza.com
> User-Agent	Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.8) Gecko/2009032608
Firefox/3.0.8
> Accept	text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language	en-us,en;q=0.5
> Accept-Encoding	gzip,deflate
> Accept-Charset	ISO-8859-1,utf-8;q=0.7,*;q=0.7
> Keep-Alive	300
> Connection	keep-alive
> Cookie	
> __utma=47358053.1237981682.1240199754.1240199754.1240199754.1; __utmb=47358053; __utmc=47358053;
__utmz
> =47358053.1240199754.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none)
> Cache-Control	max-age=0
> }}}
> {{{
> 20:51:08,768 INFO  [nioEventListener] Request http://stuftpizza.com/ failed with exception.
> org.apache.http.MalformedChunkCodingException: Truncated chunk
> 	at org.apache.http.impl.nio.codecs.ChunkDecoder.read(ChunkDecoder.java:203)
> 	at org.apache.http.nio.util.SimpleInputBuffer.consumeContent(SimpleInputBuffer.java:60)
> 	at org.apache.http.nio.entity.BufferingNHttpEntity.consumeContent(BufferingNHttpEntity.java:72)
> 	at org.apache.http.nio.protocol.AsyncNHttpClientHandler.inputReady(AsyncNHttpClientHandler.java:236)
> 	at org.apache.http.nio.protocol.BufferingHttpClientHandler.inputReady(BufferingHttpClientHandler.java:118)
> 	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:178)
> 	at org.apache.http.impl.nio.DefaultClientIOEventDispatch.inputReady(DefaultClientIOEventDispatch.java:146)
> 	at com.amplafi.iomanagement.http.UniversalIOEventDispatch.inputReady(UniversalIOEventDispatch.java:133)
> 	at $IOEventDispatch_120c19cd1c7.inputReady($IOEventDispatch_120c19cd1c7.java)
> 	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:153)
> 	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:314)
> 	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:294)
> 	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:256)
> 	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:96)
> 	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:556)
> 	at java.lang.Thread.run(Thread.java:637)
> }}}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org


Mime
View raw message