hc-httpclient-users mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject Re: Parsing of Link header elements containing query parameters
Date Fri, 11 Mar 2016 09:16:07 GMT
On Thu, 2016-03-10 at 23:10 -0500, Brent Putman wrote:
> Hi,
> I'm working with a REST API which returns a Link entity header to
> indicate "rel" links (previous, next, etc.) for pagination over more
> results than are returned in a single call.  In their docs they
> specifically reference this very outdated (and non-standard) spec [1],
> but it seems to be quite similar to the more current RFC 5988 [2].
> 
> The individual URI values in the Link header value contain query
> parameters.  Here is the HC library wire trace of the entire header:
> 
> 2016-03-10 22:36:31.354 [DEBUG] : org.apache.http.wire: http-outgoing-0
> << "Link:
> <https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=1&per_page=10>;
> rel="current",<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=2&per_page=10>;
> rel="next",<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=1&per_page=10>;
> rel="first",<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=559&per_page=10>;
> rel="last"[\r][\n]"
> 

The wisdom of passing around multiple URIs in HTTP headers seems
questionable to me. 

Besides, the URI values are not enclosed in quote marks or otherwise
escaped, so it is no wonder that the standard header element tokenizer
fails to parse them.

Oleg

> 
> When I attempt to extract this header from the HttpResponse and display
> the individual element values using code similar to:
> 
> Header linkHeader = httpResponse.getFirstHeader("Link");
> // getFirstHeader returns null when the response carries no Link header
> if (linkHeader != null) {
>     for (HeaderElement element : linkHeader.getElements()) {
>         System.out.println("Saw HeaderElement: " + element);
>         System.out.println("HeaderElement name: " + element.getName());
>         System.out.println("HeaderElement value: " + element.getValue());
>     }
> }
> 
> 
> I'm seeing output for example:
> 
> Saw HeaderElement:
> <https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=1&per_page=10>;
> rel=current
> HeaderElement name:
> <https://georgetown.test.instructure.com/api/v1/accounts/self/users?page
> HeaderElement value: 1&per_page=10>
> 
> 
> So, it's splitting on the first '=' character to determine the element
> name vs value, which looks odd.  And there doesn't seem to be a way in
> the API to get the value of the HeaderElement minus the parameters.
> 
> Is this:
> 1) A bug in HttpClient's HeaderElement parsing?
> 2) A mistake on the part of the server sending these particular URL
> values (i.e. perhaps should be encoded in some way)?
> 3) Neither: Perhaps, given knowledge of the specific header syntax and
> semantics, the name/value API is not appropriate for it, and I need to
> handle these values manually by, for example:
>      A) Stitching the URI back together manually as the name + "=" + value
>      B) Splitting the HeaderElement#toString() on the semi-colon
> 
> #3 makes me nervous at the moment since I don't fully understand the
> issues at hand.
> 
> I'm trying to read through relevant HTTP specs to better understand the
> nuance of the header value syntax.  But I know there are people on the
> list who are knowledgeable on the specs and may have a quick answer, so
> wanted to pose the question in the meantime.
> 
> Thanks,
> Brent
> 
> [1] http://www.w3.org/Protocols/9707-link-header.html
> [2] https://tools.ietf.org/html/rfc5988
> 
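
For option 3 above, one possible approach is to skip HeaderElement
altogether and parse the raw header value (Header#getValue()) directly.
The following is a minimal sketch, assuming the RFC 5988 shape
<URI>; param=value, <URI>; param=value. The class name LinkHeaderParser
and its parse method are hypothetical and not part of HttpClient, and the
regular expressions cover only the common cases (no backslash escapes
inside quoted strings, no extended "param*" values).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an HttpClient class: maps each "rel" value to its URI.
final class LinkHeaderParser {

    // One link-value: <URI> followed by zero or more ;name[=value] parameters.
    private static final Pattern LINK = Pattern.compile(
        "<([^>]*)>((?:\\s*;\\s*[^=;,\\s]+(?:\\s*=\\s*(?:\"[^\"]*\"|[^;,\\s\"]*))?)*)");

    // One parameter: group 1 = name, group 2 = quoted value, group 3 = unquoted value.
    private static final Pattern PARAM = Pattern.compile(
        ";\\s*([^=;,\\s]+)(?:\\s*=\\s*(?:\"([^\"]*)\"|([^;,\\s\"]*)))?");

    private LinkHeaderParser() {
    }

    static Map<String, String> parse(String headerValue) {
        Map<String, String> byRel = new LinkedHashMap<>();
        if (headerValue == null) {
            return byRel;
        }
        Matcher link = LINK.matcher(headerValue);
        while (link.find()) {
            // The URI is taken verbatim, so '=' and '&' in the query string are untouched.
            String uri = link.group(1).trim();
            Matcher param = PARAM.matcher(link.group(2));
            while (param.find()) {
                if (!"rel".equalsIgnoreCase(param.group(1))) {
                    continue;
                }
                String value = param.group(2) != null ? param.group(2) : param.group(3);
                if (value == null) {
                    continue; // "rel" given without a value
                }
                // A rel value may list several space-separated relation types.
                for (String rel : value.trim().split("\\s+")) {
                    if (!rel.isEmpty()) {
                        byRel.putIfAbsent(rel, uri);
                    }
                }
            }
        }
        return byRel;
    }

    public static void main(String[] args) {
        String value =
            "<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=1&per_page=10>; rel=\"current\","
            + "<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=2&per_page=10>; rel=\"next\"";
        parse(value).forEach((rel, uri) -> System.out.println(rel + " -> " + uri));
    }
}
```

With HttpClient this would be called as
LinkHeaderParser.parse(linkHeader.getValue()), which avoids both
stitching name + "=" + value back together and splitting
HeaderElement#toString() on the semicolon.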


