hc-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "William Porter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HTTPCLIENT-1486) Quirky Behavior in URIUtils leads to Improper Request Execution
Date Tue, 18 Mar 2014 17:35:44 GMT

    [ https://issues.apache.org/jira/browse/HTTPCLIENT-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939511#comment-13939511
] 

William Porter commented on HTTPCLIENT-1486:
--------------------------------------------

I am also of the opinion that implementing a more lenient URI parse is not what we want. 
I think the issue is that the current URI parsing is actually too lenient.  The internal java.net.URI
representation is unable to parse a host and port from that URI, but the org.apache.http.client.utils.URIUtils
extractHost() has custom parsing logic that leads to this scenario.  Rather than parsing a
host and port through custom means, maybe an exception should be thrown.

This method is already parsing the URI more than once in this code (in URIUtils.java extractHost(final
URI uri)):

{code}
298            String host = uri.getHost();
299            if (host == null) { // normal parse failed; let's do it ourselves
300                // authority does not seem to care about the valid character-set for host
names
301                host = uri.getAuthority();
302                if (host != null) {
303                    // Strip off any leading user credentials
304                    final int at = host.indexOf('@');
305                    if (at >= 0) {
306                        if (host.length() > at+1 ) {
307                            host = host.substring(at+1);
308                        } else {
309                            host = null; // @ on its own
310                        }
311                    }
312                    // Extract the port suffix, if present
313                    if (host != null) {
314                        final int colon = host.indexOf(':');
315                        if (colon >= 0) {
316                            final int pos = colon + 1;
317                            int len = 0;
318                            for (int i = pos; i < host.length(); i++) {
319                                if (Character.isDigit(host.charAt(i))) {
320                                    len++;
321                                } else {
322                                    break;
323                                }
324                            }
325                            if (len > 0) {
326                                try {
327                                    port = Integer.parseInt(host.substring(pos, pos + len));
328                                } catch (final NumberFormatException ex) {
329                                }
330                            }
331                            host = host.substring(0, colon);
332                        }
333                    }
334                }
335            }
...
{code}

> Quirky Behavior in URIUtils leads to Improper Request Execution
> ---------------------------------------------------------------
>
>                 Key: HTTPCLIENT-1486
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1486
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient
>    Affects Versions: 4.3.3
>            Reporter: William Porter
>            Priority: Minor
>
> While executing a HttpUriRequest with a ClosableHttpClient, malformed URIs can lead to
HTTP requests being executed for unexpected resources.  The root issue is in the extractHost()
method in URIUtils, and is demonstracted by the following example.
> {code:title=Main.java|borderStyle=solid}
> import java.io.IOException;
> import java.net.URI;
> import java.net.URISyntaxException;
> import org.apache.http.HttpHost;
> import org.apache.http.HttpResponse;
> import org.apache.http.client.ClientProtocolException;
> import org.apache.http.client.HttpClient;
> import org.apache.http.client.methods.HttpGet;
> import org.apache.http.client.methods.HttpUriRequest;
> import org.apache.http.client.utils.URIUtils;
> import org.apache.http.impl.client.HttpClientBuilder;
> import org.apache.log4j.BasicConfigurator;
> import org.junit.Assert;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> public class Main {
> 	
> 	private static final Logger LOG = LoggerFactory.getLogger(Main.class);
> 	public static void main(String [] args) {
> 		
> 		// Set up Log4J logging
> 		BasicConfigurator.configure();
> 		
> 		try {
> 			
> 			// The following is a strange URI string that is possibly a typo that
> 			// doesn't include the / between the authority and the 'intended' path
> 			final String strangeUriString = "http://www.example.com:80somepath/someresource.html";
> 			// Whereas it doesn't neccesarily seem like strange behavior to resolve the
> 			// host and port as www.example.com and 80 from the authority, it can have unintended
> 			// consequences at higher levels of indirection
> 			Assert.assertEquals(new HttpHost("www.example.com", 80), URIUtils.extractHost(new
URI(strangeUriString)));
> 			
> 			// Now we construct a request with the strange URI String
> 			HttpUriRequest request = new HttpGet(strangeUriString);
> 			
> 			// We create a CloseableHttpClient to execute the request
> 			final HttpClientBuilder builder = HttpClientBuilder.create();
> 			HttpClient client = builder.build();
> 			
> 			// Here, the request is executed, but is actually a GET /someresource.html
> 			// on www.example.com:80 since part of the intended path was considered part 
> 			// of the authority by the URI class, but disregarded by URIUtils
> 			final HttpResponse response = client.execute(request);
> 			LOG.info("Response: {}", response.getStatusLine().toString());
> 			
> 			
> 		} catch (final URISyntaxException e) {
> 			LOG.error("UriSyntaxException: {}", e.getMessage());
> 		} catch (final ClientProtocolException e) {
> 			LOG.error("ClientProtocolException: {}", e.getMessage());
> 		} catch (final IOException e) {
> 			LOG.error("IOException: {}", e.getMessage());
> 		}
> 		
> 	}
> }
> {code}
> This bug may be introduced by the fix for https://issues.apache.org/jira/browse/HTTPCLIENT-1166.
 It might be advantageous to throw an exception in this case rather than be lenient with the
host and port parsing, but further discussion might be merited based on the comments in the
aforementioned issue. 
> Here is some debug output to show the request is actually a GET /someresource.html
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "GET /someresource.html
HTTP/1.1[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "Host: www.example.com:80[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "Connection: Keep-Alive[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "User-Agent: Apache-HttpClient/4.3.3
(java 1.5)[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "Accept-Encoding: gzip,deflate[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire  - http-outgoing-0 >> "[\r][\n]"



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org


Mime
View raw message