hc-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Jakarta-httpclient Wiki] Update of "GuidedTourOfHttpCore" by RolandWeber
Date Sat, 14 Jul 2007 00:14:42 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jakarta-httpclient Wiki" for change
notification.

The following page has been changed by RolandWeber:
http://wiki.apache.org/jakarta-httpclient/GuidedTourOfHttpCore

The comment on the change is:
overview documentation of HttpCore-main

New page:
## pragma section-numbers 2
= A Guided Tour of HttpCore (module-main) =

----
[[TableOfContents]]
----

== Welcome ==

Welcome, visitors, to this guided tour of [wiki:Self:HttpComponents HttpCore].
I am your tour guide.
If you could please come a little closer and gather around me,
so I don't have to shout? Thank you, that's much better.
I'm about to give you a short introduction, and then we'll visit
some interesting places in !HttpCore so you can see how it works.
Whenever you've got a question, feel free to ask.
That's what I'm here for, to answer your questions.

As you probably know, HTTP is a protocol for exchanging messages
between a client and a server. It's in widespread use, and it
typically is running on top of plain TCP/IP or secure TLS/SSL sockets.
[[BR]]
Here at [http://www.apache.org/ Apache], there is an implementation
of the client side of that protocol called the Commons HttpClient.
Informally, we also call it "the 3.x codebase" or simply "the old code".
It proved quite useful to a lot of people, but the old code has severe
limitations in it's design.
For example, there is a class called {{{HttpMethodBase}}}.
It represents a request and a response at the same time,
and it also implements logic for processing both.
This kind of design where different things are crammed together
in a single place makes it really hard to maintain or extend the code.

Therefore we started a new successor project called HttpComponents.
Based on the experience gained with the old code, it implements the
HTTP protocol with a new approach. Above all, there are several modules
dealing with different aspects of the big problem.
As you can gather from it's name, the !HttpCore module is at the
very heart of this effort. It defines stuff on which all the other
modules depend and rely.
[[BR]]
!HttpCore deals with representation of HTTP messages, and
with transport logic for sending and receiving those messages.
It also defines a kind of framework infrastructure so other
modules can plug in functionality.
Unlike the old code, !HttpCore is not specific to the client side
of HTTP communication, it can also be used for the server side.
And because it is so fundamentally different from it's predecessor,
we put all the code into an all-new package hierarchy so you
don't confuse them.

 '''Q:''' I have a question.
 [[BR]]
 '''A:''' Yes, please? What would you like to ask?
 [[BR]]
 '''Q:'''
 If it is in a new package hierarchy, applications written
 for the old code will not be able to use the new one?
 [[BR]]
 '''A:'''
 Yes, that is correct. Because the old code was limited in it's design,
 we had to change the API. Applications have to be rewritten to
 make use of the new code. There was no way to avoid this.
 The all-new package names at least make sure that both old and new code
 can be used in the same environment, for example a Servlet engine,
 without interference.

Now, if you would like to follow me to the main hall...
it's called package {{{org.apache.http}}}.
You may want to keep the
[http://jakarta.apache.org/httpcomponents/httpcomponents-core/httpcore/apidocs/index.html
JavaDocs]
at hand, that will make it easier for you to follow my explanations.



== Messages ==

The first problem we had to deal with is the representation of messages.
If you don't know how to represent a message, you can't send or receive it,
right?
So here we have a set of interfaces for the building blocks of an HTTP message.
There's the {{{RequestLine}}} for a request and the
{{{StatusLine}}} for a response, both containing an {{{HttpVersion}}}.
The latter is so simple that we made it a class instead of an interface.
Then we have a {{{Header}}} with name and value, where the value
can have multiple {{{HeaderElement}}}s. And finally there is the
message body, the {{{HttpEntity}}}.

Out of these building blocks, we collect messages.
Every {{{HttpMessage}}} has headers, which can be added or deleted at will.
{{{HttpRequest}}} adds the request line,
{{{HttpEntityEnclosingRequest}}} an entity.
{{{HttpResponse}}} adds a status line and also an entity.
For convenient integration into frameworks that explore the Factory pattern,
there are factory interfaces for both requests and responses.

 '''Q:''' Aren't there responses without an entity?
 [[BR]]
 '''A:'''
 Yes. You are very attentive!
 You don't have to provide an entity to the response, you can leave it null.
 With requests, you usually know in advance whether you want to provide an
 entity or not. Also, many requests are GET requests and don't have an entity.
 That's why we created two different interfaces.
 Responses mostly have an entity, except in very special cases. Having to
 know in advance whether there will be an entity or not would have made the
 API cumbersome to use in some situations, so we went with a single interface.

 '''Q:'''
 You said headers can be added and deleted from an {{{HttpMessage}}}.
 What if I want a read-only message?
 [[BR]]
 '''A:'''
 We've given up the idea of distinguishing between modifiable and
 non-modifiable messages. It created an insane number of interfaces,
 and we had to spread instanceof checks and downcasts all over the place.
 !HttpCore after all is meant for people who know what they do.
 If you want a message to remain unchanged, simply don't change it ;-)
 If you really have to prevent modifications, you can implement the
 interface with a custom class that throws an exception whenever a
 modifying method is called.

Now, if you would come over here and have a look through this window
into the adjoining room? It's a bit too small for all of us to go into,
but you can see the important things from here.
It is called package {{{org.apache.http.message}}}.
Notice that there are basic implementations for all of the
message representation interfaces. You'll hardly need more than
those when writing an application that uses !HttpCore directly.

 '''Q:'''
 I can't see implementations for {{{GetRequest}}} and {{{PostRequest}}}
 and so on?
 [[BR]]
 '''A:'''
 You are right, we don't have those convenience classes in core.
 Core is hardcore. If you want a GET request, you just create a
 basic request and pass "GET" as the HTTP method name. Likewise for
 POST or PUT, except you'd create a basic entity enclosing request
 for those.
 You'll find convenience classes for the default HTTP methods in
 the client, but they are really superfluous in the core.

 '''Q:'''
 But then I could create an entity enclosing request with GET as the
 method name. That doesn't make sense, GET requests never have an entity!?
 [[BR]]
 '''A:'''
 Yes, you can do all sorts of stupid things with core.
 Core is hardcore, and meant for people who know what they're doing.
 And maybe you really want to create a GET request with an entity,
 for example to test how a server responds to invalid requests?
 [[BR]]
 '''Q:'''
 An interesting example. I hadn't thought about that.

All right, are there more questions about the basic implementations?
No? Good, then let's move on to the room over there.
It is called package {{{org.apache.http.entity}}}.
You can find a selection of message entities in there.
Message entities are really not that much different from the
request entities in !HttpClient 3.x, except they are no longer tied
to the client side. As in the old code, there are entities getting
their content from a string, byte array, file, or input stream.
The {{{BasicHttpEntity}}} is what we use when a message is received
over a connection. You'll see the connections later today.
We also have some advanced stuff for wrapping and buffering entities,
and an {{{EntityTemplate}}} that simplifies writing a new entity
if you have to.

 '''Q:'''
 There used to be a multipart entity in the old code?
 [[BR]]
 '''A:'''
 Indeed, there is. But that didn't make it into core.
 It was considered slightly out of scope even for the old code.
 Maybe we'll bring it into another module at some time,
 but surely not into core.

Any more questions about entities?
Fine, then let's pass through this door, back into the main hall.
We're going to have a look at connections next.



== Connections ==

Connections are needed to send HTTP message from client to server or
the other way 'round. On the interface level, we have the {{{HttpConnection}}}.
It allows for checking whether a connection is open,
for closing it or shutting it down,
and for getting statistical data if such has been gathered.
To actually send and receive messages, you have to use either
{{{HttpClientConnection}}} or {{{HttpServerConnection}}},
depending on what you implement.
Obviously, the client connection allows for sending requests and
receiving responses, whereas the server connections receives requests
and sends responses. Messages are passed to and from the connections
in terms of the interfaces we have just seen. We require two calls
for sending the message header and the message entity. That allows
for explicit handling of the expect-continue handshake, for example.


 '''Q:'''
 I don't see a method to open a connection?
 [[BR]]
 '''A:'''
 You have an eagle's eyes, don't you? That is absolutely correct,
 there is no method for opening a connection in the interface.
 It took a good deal of discussions until we got to that point.
 Opening a connection can be trickier than it might seem at first
 glance, so we left it out of the core API.

 '''Q:'''
 So how do I use a connection if I can't open it?
 [[BR]]
 '''A:'''
 Well, the API is different from the implementation. There is no
 {{{open()}}} in the API, but your implementation can offer that method.
 The default implementations we ship in core expect to be given
 an open socket, which you can create in any way you want.

 '''Q:'''
 Talking of sockets, I don't see a socket in the interfaces either.
 Wouldn't it be useful, for example to configure TCP/IP settings?
 [[BR]]
 '''A:'''
 Oh yes, it's definitely useful to know the socket. But, you see,
 it doesn't belong into the core API. Somebody might want to use the
 API with some native communication library instead of Java sockets.
 But we'll have a look at the default implementations right away,
 you'll see the socket there.

 '''Q:'''
 Does that mean one has to downcast to an implementation class in order
 to obtain the IP address and port number connected to?
 [[BR]]
 '''A:'''
 Oh no, it's not ''that'' bad. You see, here is one more interface
 {{{HttpInetConnection}}} that provides access to IP addresses and
 port numbers, both local and remote. It's an optional interface, but
 it's supported by all our default implementations. You only have to
 cast to the interface, not to an implementation class.

Before you ask any more questions, it's probably best we move on into
the next room, which is called package {{{org.apache.http.impl}}}.
As you can see, there is a whole bunch of connection implementation classes.
Don't let that confuse you, it's just for keeping the code maintainable.
All you really need to look at are the two classes
{{{DefaultHttpClientConnection}}} and {{{DefaultHttpServerConnection}}}.
You see, there are the {{{bind}}} operations I told you about,
where you pass in an open socket to have an open connection.
And inherited from a base class, there also is a {{{getSocket}}} method.

 '''Q:'''
 There are a lot of inherited methods. What's this serializer stuff?
 And the data transmitter?
 [[BR]]
 '''A:'''
 Don't worry about those. We provide reasonable defaults.
 It'll all just work by itself, you don't have to do anything.
 It's a kind of magic ;-)

 '''Q:'''
 There are connection re-use strategies here, and I've seen an interface
 in the main hall. That sounds like connection management?
 [[BR]]
 '''A:'''
 Yes, almost, but not quite. There is no connection management in core.
 But there are cases where core has to decide about closing a connection.
 Remember, {{{close()}}} is in the generic interface. The re-use strategies
 are used to decide about closing connections.

 '''Q:'''
 Can those re-use strategies query the statistical information
 you've mentioned before?
 [[BR]]
 '''A:'''
 Yes, that's the idea. A re-use strategy can look at the headers of
 request or response, but also at the statistical data of the connection
 if that is available.

 '''Q:''' Where do these doors lead to?
 [[BR]]
 '''A:'''
 Eh, please don't go there, thank you. The adjoining rooms
 {{{org.apache.http.impl.entity}}} and {{{...impl.io}}} are
 where the transport encodings are handled.
 You know, the magic stuff I mentioned.

Now please, visitors... I know that the connections look very interesting
and complicated, but you really don't want to miss the exciting things still
coming up. So, if you follow me back to the main hall, and then on to the
next room...



== Execution ==

Here we are in package {{{org.apache.http.protocol}}}.
This is the home of the framework for executing the higher levels of HTTP.
Remember that the lower levels, in particular transport encodings,
are dealt with automagically by the connections we have just left.
The protocol framework here is concerned with putting the appropriate headers
into messages, and with calling the connection methods at the right time
in the right sequence.

For example, the expect-continue handshake is dealt with here, both on the
client and server side. For those of you that are not familiar with the
details of that handshake, I'll explain it briefly.
When sending a message with a body that is large or tricky to generate,
clients don't want to risk sending the message data just to get a simple
error response from the server, for example because authentication is
required. In that case, the client will put a special Expect: header into
the request and send only the message headers. The server is expected
to check the message headers, and to respond with a status code of 100
if it finds everything ready for processing the request entity. Only
then will the client send the rest of the request. If the server detects
a problem, it responds with the appropriate error status code and the
request body is never sent.
[[BR]]
Here we have the {{{HttpRequestExecutor}}}, the client side implementation
for protocol execution. It handles the expect-continue handshake, and it
also checks whether an incoming response has an entity that needs to be read.
For the server side we have {{{HttpService}}}, which checks whether the
incoming request has an entity, and uses {{{HttpExpectationVerifier}}}
if the expect-continue handshake is employed.
Both use an {{{HttpProcessor}}} to modify and interpret headers.

The framework for setting and interpreting headers is based on '''''interceptors'''''.
Those are little classes which take care of one specific aspect, often just
a single header. These are collected into a list of interceptors that need
to be executed on a message before it is sent, or after it is received.
A range of typically needed interceptors is provided,
I'll just pick some examples.
[[BR]]
Here we have the {{{RequestUserAgent}}}. It is a request interceptor for
outgoing requests, so it is executed on requests on the client side before
they are sent. It's only task is to add a User-Agent header, if there is
none in the request. If you don't want a User-Agent header to be sent,
you just don't add this interceptor to your list.

 '''Q:'''
 How does {{{RequestUserAgent}}} know the value for the header?
 [[BR]]
 '''A:'''
 A very good question. We are keeping parameters with the request.
 Those will be the next station of this guided tour.

A trickier interceptor is {{{RequestContent}}}, also applied before a request
is sent on the client side. It checks whether there is an entity in the
request and sets up Content-Length and Transfer-Encoding headers if so.
This is a must-have interceptor if you want to send a request entity.
On the server side, {{{ResponseContent}}} does the same for the response.

 '''Q:'''
 Didn't you say that transfer encodings are handled automagically?
 [[BR]]
 '''A:'''
 Yes, I did. These interceptors are the wizards that make it all happen.


The already mentioned {{{HttpProcessor}}} holds lists of request and response
interceptors that should be applied. You set it up once when your application
initializes.

 '''Q:'''
 There are very many interceptors here. How do I know which ones I need?
 [[BR]]
 '''A:'''
 That is a tricky thing. You should stick to the interceptor lists used
 in the examples. If that doesn't do what you want, just ask by posting
 your question to the user
 [http://jakarta.apache.org/httpcomponents/mail-lists.html mailing list].


 '''Q:'''
 If I need to authenticate a request, I would use an interceptor that
 asks the user for the password?
 [[BR]]
 '''A:'''
 NO! Ahem, sorry. No. You should never execute a blocking operation of
 this kind in an interceptor, and in particular not user interactions.
 In general, you don't know what kind of background process will execute
 the interceptors. You could stall the whole application, or even others
 if it is running in a shared environment.

 '''Q:'''
 Then an interceptor that asks the user to confirm cookies is also
 not a good idea? But how should I do it?
 [[BR]]
 '''A:'''
 You should perform user interaction before or after the execution of the
 interceptors. For authentication, you would ask for the password before
 executing the request, and then give the password to the interceptor.
 For cookies, you take an interceptor that puts the cookies in a separate
 location, and ask for confirmation when the request execution is done.
 I was just about to explain how applications interact with interceptors.


You see this interface here, {{{HttpContext}}}. That is a collection of
named attributes, where names are strings and attributes can be any kind
of Java object. When a request is executed, it has one specific context.
Likewise when a request is being serviced on the server side, of course.
The interceptors, and many other parts of the framework, have access to
this context. So your application can put some data - like a password -
into the context, and an interceptor picks it up. On the other hand, an
interceptor can put data into the context - like incoming cookies - and your
application picks that up after the execution.
[[BR]]
The context is also the place to keep session information, like the cookies
that should be sent or passwords that have already been entered.
Mind you, core does not handle cookies or authentication.
Core is hardcore, it just provides the framework for doing that.
The examples show what attributes need to be present in the context for
the default interceptors to work. We have synchronized and unsynchronized
implementations of the {{{HttpContext}}} interface.

Now, if you would kindly follow me to the last stop on our little tour...


== Parameters ==

This is package {{{org.apache.http.params}}}, home of the parameter framework.
We've introduced the preferences framework with version 3.0 of the old code.
The 4.0 version is a natural evolution of that rather than a radical redesign.
We keep maps of named parameters in instances of {{{HttpParams}}}.
Parameters get attached to HTTP messages, so they are available to all
objects involved in processing a message: interceptors, connections,
and whatever else other modules are going to add on top of core.
[[BR]]
In the old code, parameters were hierarchical. This feature is still present,
we can link a set of parameters with another one providing defaults.
However, this feature should ''never'' be used by applications directly.
Parameters may and will be linked inside the framework, and having both
application and framework set up parameter hierarchies would wreak havoc
on both.

 '''Q:'''
 What's the difference between parameters and contexts?
 Both are maps of named attributes.
 [[BR]]
 '''A:'''
 From a framework perspective, parameters are read-only. The application
 prepares parameters in advance, then the frameworks reads them.
 The context is updated by the framework.
 Furthermore, parameters are meant to hold data, whereas the context can
 hold any kind of attribute. As a rule of thumb, if it is something you'd
 write into a properties file, that's a candidate for parameters. If you
 need to set up a callback at runtime, that does into the context.

Caution has to be used when updating parameters after they have been
passed to the framework. You should avoid to update a parameter set at
all while execution or servicing is in progress.
The default implementation of {{{HttpParams}}} is unsynchronized,
because the framework will use it read-only.
[[BR]]
The parameter ''values'' themselves should be read-only at all times.
So if for example you stored a modifiable map as a parameter value,
never modify that map again. If you have to update the parameter set
with a new map, then copy the old one, modify the copy, and replace
the old value with the modified copy.

 '''Q:'''
 What's with these {{{HttpProtocolParams}}} and {{{HttpConnectionParams}}}?
 Are these special implementations of the interface?
 [[BR]]
 '''A:'''
 Ah, no. We define constants for the various parameters we are using.
 Those classes contain the constants, and static helper methods for
 getting and setting the respective parameters.
 This encapsulates typecasts and provides at least some type safety.

 '''Q:'''
 Is there a helper that loads parameters from a properties file?
 [[BR]]
 '''A:'''
 No, core is hardcore. We don't deal with configuration via properties files
 in core. If we supported properties files today, we'd be asked to support
 XML configuration tomorrow, and Spring instantiation the day after.
 There would be no end to it.
 Besides, instantiating parameters with the correct type is not trivial.
 A string parameter needs to be stored as {{{String}}}, whereas an integer
 parameter needs to be stored as {{{Integer}}}. It becomes even worse for
 custom parameter types. We don't want this kind of type conversion logic
 in core. Maybe in an extra module, sometime.


== Farewell ==

I hope you enjoyed our tour of the
[wiki:Self:HttpComponents HttpCore module]
and found the experience enlightening.
If you have any more questions, do not hesitate to post them on the
user [http://jakarta.apache.org/httpcomponents/mail-lists.html mailing list].
Saying that, you might want to search the archives of the mailing lists
first, in case somebody else already got an answer to a similar question.
We are also considering to offer guided tours of other modules in the future.
We'd be happy if you join one of those when they become available.

Thank you all, and see you next time!

---------------------------------------------------------------------
To unsubscribe, e-mail: httpcomponents-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpcomponents-dev-help@jakarta.apache.org


Mime
View raw message