Subject: Re: buckets and connections (long post)
From: Graham Leggett <minfrin@sharp.fm>
Date: Wed, 21 Oct 2015 16:48:54 +0200
To: dev@httpd.apache.org

On 21 Oct 2015, at 4:18 PM, Stefan Eissing
<stefan.eissing@greenbytes.de> wrote:

> How good does this mechanism work for mod_http2? On the one side it's
> the same, on the other quite different.
>
> On the real, main connection, the master connection, where the h2
> session resides, things are pretty similar with some exceptions:
> - it is very bursty. requests continue to come in. There is no pause
>   between responses and the next request.
> - pauses, when they happen, will be longer. clients are expected to
>   keep open connections around for longer (if we let them).
> - When there is nothing to do, mod_http2 makes a blocking read on the
>   connection input. This currently does not lead to the state B) or C).
>   The worker for the http2 connection stays assigned. This needs to
>   improve.

The blocking read breaks the spirit of the event MPM.

In theory, as long as you enter the write completion state and then do
not leave it until your connection is done, this problem will go away.

If you want to read instead of write, make sure the CONN_SENSE_WANT_READ
option is set on the connection.

(You may find reasons that stop this from working; if so, these need to
be isolated and fixed.)
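
A rough sketch of that idea, written as a self-contained toy model and
not as httpd code. The toy_ types below only imitate the state and sense
values the event MPM keeps per connection; every name here is
hypothetical. The point it illustrates: instead of blocking in a read,
the handler parks the connection in write completion with the sense set
to "want read", and the MPM then polls for readability.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins that imitate httpd's connection state and sense values.
 * None of these are the real httpd definitions. */
typedef enum { TOY_STATE_PROCESSING, TOY_STATE_WRITE_COMPLETION } toy_state;
typedef enum { TOY_SENSE_DEFAULT, TOY_SENSE_WANT_READ } toy_sense;
typedef enum { TOY_POLL_NONE, TOY_POLL_IN, TOY_POLL_OUT } toy_poll_event;

typedef struct {
    toy_state state;
    toy_sense sense;
} toy_conn_state;

/* What a handler would do instead of a blocking read: hand the
 * connection back to the MPM and say which direction it waits on. */
void toy_park_for_read(toy_conn_state *cs)
{
    assert(cs != NULL);
    cs->state = TOY_STATE_WRITE_COMPLETION;
    cs->sense = TOY_SENSE_WANT_READ;
}

/* What the MPM side would do: choose the event to poll for. A
 * connection still in processing keeps its worker and is not polled. */
toy_poll_event toy_mpm_poll_event(const toy_conn_state *cs)
{
    assert(cs != NULL);
    if (cs->state != TOY_STATE_WRITE_COMPLETION) {
        return TOY_POLL_NONE;
    }
    return cs->sense == TOY_SENSE_WANT_READ ? TOY_POLL_IN : TOY_POLL_OUT;
}
```

With the default sense the same connection would be polled for
writability, which is the ordinary write completion case.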

> This is the way it is implemented now. There may be other ways, but
> this is the way we have. If we continue along this path, we have the
> following obstacles to overcome:
> 1. the master connection probably can play nicer with the MPM so that
>    an idle connection uses less resources
> 2. The transfer of buckets from the slave to the master connection is
>    a COPY except in case of file buckets (and there is a limit on that
>    as well to not run out of handles).
>    All other attempts at avoiding the copy, failed. This may be a
>    personal limitation of my APRbilities.

This is how the proxy does it.

Buckets owned by the backend conn_rec are copied and added to the
frontend conn_rec.

> 3. The amount of buffered bytes should be more flexible per stream and
>    redistribute a maximum for the whole session depending on load.
> 4. mod_http2 needs a process wide Resource Allocator for file handles.
>    A master connection should borrow n handles at start,
>    increase/decrease the amount based on load, to give best performance
> 5. similar optimizations should be possible for other bucket types
>    (mmap? immortal? heap?)

Right now this task is handled by the core network filter - it is very
likely this problem is already solved, and you don’t need to do
anything.

If the problem still needs solving, then the core filter is the place to
do it. What the core filter does is add up the resources taken up by
different buckets and if these resources breach limits, blocking writes
are done until we’re below the limit again. This provides the flow
control we need.
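
The accounting described above can be pictured with a small toy model.
This is a sketch under stated assumptions, not the core filter's code:
the flow_ names, the 64 KB limit and the rule that file-like buckets do
not count against memory are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit on buffered in-memory bytes; not httpd's value. */
#define FLOW_MAX_BUFFERED_BYTES 65536u

typedef struct {
    size_t length;   /* bytes the bucket represents */
    int in_memory;   /* 1 for heap-like data, 0 for file-like data */
} flow_bucket;

/* Add up the memory held by the pending buckets. File-like buckets are
 * skipped on the assumption that they do not pin data in memory. */
size_t flow_buffered_bytes(const flow_bucket *pending, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (pending[i].in_memory) {
            total += pending[i].length;
        }
    }
    return total;
}

/* Return 1 when the caller should write in blocking mode until the
 * backlog drops, 0 when it may keep buffering and return. */
int flow_must_block(const flow_bucket *pending, size_t count)
{
    assert(pending != NULL || count == 0);
    return flow_buffered_bytes(pending, count) > FLOW_MAX_BUFFERED_BYTES;
}

/* Simulate the blocking writes: write buckets out from the front until
 * the backlog is under the limit. Returns how many were written. */
size_t flow_drain_until_under_limit(const flow_bucket *pending, size_t count)
{
    size_t written = 0;
    while (written < count &&
           flow_must_block(pending + written, count - written)) {
        written++;
    }
    return written;
}
```

A producer that is slower than the limit never blocks; one that
outruns the client is held back only for as long as the backlog stays
over the limit.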

With the async filters this flow control is now made available to every
filter in the ap_filter_setaside_brigade() function. When mod_http2
handles async filters you’ll get this flow control for free.

> 6. pool buckets are very tricky to optimize, as pool creation/destroy
>    is not thread-safe in general and it depends on how the parent pools
>    and their allocators are set up.
>    Early hopes get easily crushed under load.

As soon as I see “buckets aren’t thread safe” I read it as “buckets are
being misused” or “pool lifetimes are being mixed up”.

Buckets arise from allocators, and you must never try to add a bucket
from one allocator into a brigade sourced from another allocator. For
example, if you have a bucket allocated from the slave connection, you
need to copy it into a different bucket allocated from the master
connection before trying to add it to a master brigade.
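
The copy-before-append rule can be sketched with a toy model. This is
not the APR bucket API: the demo_ types and functions are hypothetical,
and the runtime ownership check is added purely for illustration (as far
as I know APR performs no such check, which is why a mistake here tends
to show up as a crash and not as an error code).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for a bucket allocator, a bucket and a brigade. */
typedef struct { const char *name; } demo_allocator;

typedef struct demo_bucket {
    demo_allocator *owner;      /* allocator the bucket came from */
    char *data;
    size_t length;
    struct demo_bucket *next;
} demo_bucket;

typedef struct {
    demo_allocator *owner;      /* allocator the brigade draws from */
    demo_bucket *head;
    demo_bucket *tail;
} demo_brigade;

/* Create a bucket owned by `alloc`, holding its own copy of the bytes. */
demo_bucket *demo_bucket_create(demo_allocator *alloc,
                                const char *data, size_t length)
{
    assert(alloc != NULL && (data != NULL || length == 0));
    demo_bucket *b = malloc(sizeof(*b));
    if (b == NULL) {
        return NULL;
    }
    b->data = malloc(length > 0 ? length : 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    if (length > 0) {
        memcpy(b->data, data, length);
    }
    b->owner = alloc;
    b->length = length;
    b->next = NULL;
    return b;
}

void demo_bucket_destroy(demo_bucket *b)
{
    if (b != NULL) {
        free(b->data);
        free(b);
    }
}

/* Copy a bucket into another allocator; the source is left untouched. */
demo_bucket *demo_bucket_copy_to(const demo_bucket *src, demo_allocator *dest)
{
    assert(src != NULL);
    return demo_bucket_create(dest, src->data, src->length);
}

/* Append only if bucket and brigade share an allocator. Returns 0 on
 * success and -1 when the allocators differ. */
int demo_brigade_append(demo_brigade *bb, demo_bucket *b)
{
    assert(bb != NULL && b != NULL);
    if (b->owner != bb->owner) {
        return -1;
    }
    b->next = NULL;
    if (bb->tail != NULL) {
        bb->tail->next = b;
    } else {
        bb->head = b;
    }
    bb->tail = b;
    return 0;
}
```

Because the copy owns its own bytes, the slave-side bucket can be
destroyed as soon as the copy exists, without affecting the master
brigade.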

Buckets are also allocated from pools, and pools have different
lifetimes depending on what they were created for. If you allocate a
bucket from the request pool, that bucket will vanish when the request
pool is destroyed. Buckets can be passed from one pool to another; that
is what “setaside” means.
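
One simple way to picture setaside is as a copy into a pool that lives
longer. The sketch below is a toy model, not APR: the sketch_ pool is a
plain linked list of malloc'd blocks, and all names are hypothetical.
Real bucket types may implement setaside differently (some need no copy
at all), so treat this only as an illustration of the lifetime rule.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A toy pool: every allocation is freed together on destroy. */
typedef struct sketch_block {
    struct sketch_block *next;
} sketch_block;

typedef struct {
    sketch_block *blocks;
} sketch_pool;

/* Allocate `size` bytes that live until the pool is destroyed. The
 * returned memory is only suitable for byte data in this toy. */
void *sketch_palloc(sketch_pool *p, size_t size)
{
    assert(p != NULL);
    sketch_block *blk = malloc(sizeof(*blk) + size);
    if (blk == NULL) {
        return NULL;
    }
    blk->next = p->blocks;
    p->blocks = blk;
    return blk + 1;
}

void sketch_pool_destroy(sketch_pool *p)
{
    assert(p != NULL);
    while (p->blocks != NULL) {
        sketch_block *next = p->blocks->next;
        free(p->blocks);
        p->blocks = next;
    }
}

typedef struct {
    sketch_pool *pool;   /* pool that currently owns `data` */
    char *data;
    size_t length;
} sketch_bucket;

/* Move the bucket's data into `target` so that it survives the
 * destruction of its original pool. Returns 0 on success, -1 if the
 * allocation fails. */
int sketch_setaside(sketch_bucket *b, sketch_pool *target)
{
    assert(b != NULL && target != NULL);
    if (b->pool == target) {
        return 0;               /* already lives long enough */
    }
    char *copy = sketch_palloc(target, b->length > 0 ? b->length : 1);
    if (copy == NULL) {
        return -1;
    }
    if (b->length > 0) {
        memcpy(copy, b->data, b->length);
    }
    b->data = copy;
    b->pool = target;
    return 0;
}
```

Without the setaside step, the bucket would still point into the
request pool after that pool is destroyed, which is exactly the
"memory vanishes" failure.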

It is really important to get the pool lifetimes right. Allocate
something accidentally from the master connection pool on a slave
connection and it appears to work, because generally the master outlives
the slave. Until the master is cleaned up first, and suddenly memory
vanishes unexpectedly in the slave connections - and you crash.

There were a number of subtle bugs in the proxy where buckets had been
allocated from the wrong pool, and all sorts of weirdness ensued. Make
sure everything is allocated from a pool with the correct lifetime and
it will work.

> 7. The buckets passed down on the master connection are using another
>    buffer - when on https:// - to influence the SSL record sizes on
>    write. Another COPY is not nice, but write performance is better
>    this way. The ssl optimizations in place do not work for HTTP/2 as
>    it has other bucket patterns. We should look if we can combine this
>    into something without COPY, but with good sized SSL writes.

mod_ssl already worries about buffering on its own; there is no need to
recreate this functionality. Was this not working?

Regards,
Graham