hc-httpclient-users mailing list archives

From amoldavsky <assaf.moldav...@gmail.com>
Subject HttpClient 4.0 encoding madness
Date Thu, 28 Jan 2010 04:24:12 GMT

Hi

I have coded a simple file downloader using HttpClient 4.0.
It works fine, but there is something wrong with the String encoding or the
buffered stream. The problem is that there are long sequences of NUL
characters (byte value 0x00) throughout the final file, like this:
http://old.nabble.com/file/p27350930/httpclient_error01.jpg 
http://old.nabble.com/file/p27350930/httpclient_error02.jpg 

Here is the main code:

  public String getChunk(String url, int bufferSize) throws HTTPClientException
  {
    if(!chunkedStarted)
    {
      chunkedIns = getInputStream(url);
      chunkedStarted = true;
    }
    
    byte[] tmp = new byte[bufferSize];
    try
    {
      if(chunkedIns.read(tmp) != -1)
      {
        return new String(tmp);
      }
      else
      {
        finish();
        return null;
      }
    }
    catch(IOException e)
    {
      HTTPClientException e2 = new HTTPClientException(e.getMessage());
      e2.setStackTrace(e.getStackTrace());
      throw e2;
    }
  }
  
  public void finish()
  {
    // do some cleaning
  }

  private InputStream getInputStream(String url) throws HTTPClientException
  {
    InputStream instream = null;
    
    httpClient = new DefaultHttpClient();
    httpClient.getParams().setParameter("http.useragent", AGENT_NAME);
    
    HttpGet httpGet = new HttpGet(url);
    HttpResponse response = null;
    
    try
    {
      response = httpClient.execute(httpGet);
      HttpEntity entity = response.getEntity();
    
      if(entity != null) 
      {
        instream = entity.getContent();
      }
    }
    catch(ClientProtocolException e)
    {
      HTTPClientException e2 = new HTTPClientException(e.getMessage());
      e2.setStackTrace(e.getStackTrace());
      throw e2;
    }
    catch(IOException e)
    {
      HTTPClientException e2 = new HTTPClientException(e.getMessage());
      e2.setStackTrace(e.getStackTrace());
      throw e2;
    }
    
    return instream;
  }

getChunk and getInputStream could basically be one method, but I need to
keep them split for internal convenience; that does not change the
functionality as a whole.

It seems like either the conversion from bytes to string is a problem:
return new String(tmp);

or that the buffer is not getting filled to the end. I assumed the latter
could not happen, because the files are ~30 MB each and the buffer size is
only 2 KB.
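
One thing that may be worth checking for the second possibility:
InputStream.read(byte[]) returns the number of bytes it actually stored, and
that count can be smaller than the buffer length even in the middle of a
large stream. Any unfilled tail of the array keeps its initial zero bytes,
which new String(tmp) would turn into NUL characters. Below is a minimal
sketch that converts only the bytes that were read. The class name
ChunkReadSketch is hypothetical, and ISO-8859-1 is an assumed charset because
the feed's real encoding is not known here; a multi-byte charset could still
be split across chunk boundaries.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient: shows how the count returned
// by read() can be used to limit the bytes-to-String conversion.
class ChunkReadSketch {

    // Reads at most bufferSize bytes and returns them as a String,
    // or null once the end of the stream has been reached.
    static String readChunk(InputStream in, int bufferSize) throws IOException {
        byte[] tmp = new byte[bufferSize];
        int n = in.read(tmp); // may be less than bufferSize
        if (n == -1) {
            return null;
        }
        // Assumed charset: ISO-8859-1 maps every byte to exactly one char.
        return new String(tmp, 0, n, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // A 5-byte stream read with an 8-byte buffer forces a short read.
        InputStream in = new ByteArrayInputStream(
                "a,b,c".getBytes(StandardCharsets.ISO_8859_1));
        String chunk;
        while ((chunk = readChunk(in, 8)) != null) {
            System.out.println(chunk.length() + " chars: " + chunk);
        }
    }
}
```

If the data is only being saved to disk, writing tmp[0..n) straight to an
OutputStream would avoid the String conversion and the charset question
altogether.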

I have attached the file; it's a CSV (shortened to ~6 KB). Note the long
runs of whitespace inside some of the URLs: if you just remove them, the
URLs make sense.
http://old.nabble.com/file/p27350930/datafeed.csv datafeed.csv 

Where can this whitespace (the NUL characters) come from?

Thanks!
-- 
View this message in context: http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27350930.html
Sent from the HttpClient-User mailing list archive at Nabble.com.

