hc-httpclient-users mailing list archives

From "Joseph Naegele" <jnaeg...@grierforensics.com>
Subject HttpAsyncClient bounded download size
Date Tue, 06 Dec 2016 16:21:38 GMT
Hi folks,

How can I limit the amount of data downloaded for a request executed by the HttpAsyncClient
and still process the response as "completed" in the registered FutureCallback? The use case
is a large scale web crawler that truncates resources deemed too large.

I started by limiting the amount of data read from the response entity's InputStream, but
this doesn't work with the default BasicAsyncResponseConsumer, because it uses the dynamically
expanding SimpleInputBuffer to download the entire response entity.
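
For reference, a minimal sketch of what "limiting the amount of data read from the InputStream" can look like, using only JDK types. BoundedRead and readAtMost are hypothetical names for illustration, not part of HttpAsyncClient. As noted above, this only bounds what the caller consumes; it does nothing to stop a buffering consumer from downloading the whole entity first.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not an HttpAsyncClient class.
final class BoundedRead {
    private BoundedRead() {}

    /**
     * Reads at most maxBytes from the stream. Returns fewer bytes only
     * if the stream ends before the limit is reached.
     */
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int remaining = maxBytes;
        while (remaining > 0) {
            // Never request more than the remaining budget.
            int n = in.read(chunk, 0, Math.min(chunk.length, remaining));
            if (n == -1) {
                break; // end of stream before the limit
            }
            out.write(chunk, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }
}
```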

I implemented my own HttpAsyncResponseConsumer, similar to the BasicAsyncResponseConsumer,
and tried using IOControl to signal shutdown once I've read the maximum desired number of
bytes, but this triggers a ConnectionClosedException. This is undesirable because I can't
distinguish it from other causes of ConnectionClosedExceptions, and I want to treat "truncated"
responses as completed in the registered FutureCallback (where I post-process the response).
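
A rough sketch of the bookkeeping this implies, assuming the consumer is handed content in ByteBuffer-sized pieces: keep a fixed-capacity buffer plus an explicit "truncated" flag, so that a deliberate cut-off can be told apart from an unexpected connection closure. TruncatingBuffer and its methods are hypothetical names. The sketch uses only JDK types and does not model the HttpAsyncResponseConsumer contract or IOControl.

```java
import java.nio.ByteBuffer;

// Hypothetical class for illustration; it models only the size cap and
// the truncation flag, not any HttpAsyncClient interface.
final class TruncatingBuffer {
    private final ByteBuffer buf;
    private boolean truncated;

    TruncatingBuffer(int maxBytes) {
        this.buf = ByteBuffer.allocate(maxBytes);
    }

    /**
     * Copies as much of src as still fits under the cap.
     * Returns true once bytes have had to be discarded, i.e. the caller
     * may stop reading and treat the body as complete but truncated.
     */
    boolean accept(ByteBuffer src) {
        int n = Math.min(src.remaining(), buf.remaining());
        if (n > 0) {
            ByteBuffer slice = src.duplicate();
            slice.limit(slice.position() + n);
            buf.put(slice);
            // duplicate() has its own position, so advance src by hand.
            src.position(src.position() + n);
        }
        if (src.hasRemaining()) {
            truncated = true; // data beyond the cap was dropped on purpose
        }
        return truncated;
    }

    /** True only if content was deliberately cut off at the cap. */
    boolean isTruncated() {
        return truncated;
    }

    /** Returns a copy of the bytes kept so far. */
    byte[] toByteArray() {
        byte[] out = new byte[buf.position()];
        ByteBuffer view = buf.duplicate();
        view.flip();
        view.get(out);
        return out;
    }
}
```

With a flag like this, the code that builds the final result could check isTruncated() instead of inferring truncation from an exception type. Whether the connection can then be released without surfacing a ConnectionClosedException is exactly the open question here.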

Is there another method of implementing my desired functionality?

Joe Naegele
