tomcat-dev mailing list archives

From Rainer Jung <rainer.j...@kippdata.de>
Subject Re: Buffer overflow in jk_connect.c::nb_connect
Date Tue, 19 Jul 2016 13:11:41 GMT
On 19.07.2016 00:20, Christopher Schultz wrote:
> On 7/18/16 5:48 PM, Rainer Jung wrote:
>> On 18.07.2016 17:02, Christopher Schultz wrote:
>>> All,
>>>
>>> Michael Deiner found a buffer overflow in the call to FD_SET macro on
>>> line 291 of jk_connect.c:
>>>
>>> 280>    do {
>>> 281>        rc = connect(sd, (const struct sockaddr *)&addr->sa.sin,
>>>                          addr->salen);
>>> 282>    } while (rc == -1 && errno == EINTR);
>>> 283>
>>> 284>    if ((rc == -1) && (errno == EINPROGRESS || errno == EALREADY)
>>> 285>                   && (timeout > 0)) {
>>> 286>        fd_set wfdset;
>>> 287>        struct timeval tv;
>>> 288>        socklen_t rclen = (socklen_t)sizeof(rc);
>>> 289>
>>> 290>        FD_ZERO(&wfdset);
>>> *291>        FD_SET(sd, &wfdset);*
>>> 292>        tv.tv_sec = timeout / 1000;
>>> 293>        tv.tv_usec = (timeout % 1000) * 1000;
>>> 294>        rc = select(sd + 1, NULL, &wfdset, NULL, &tv);
>>>
>>> I'd like to fix this so it won't bring-down the server :)
>>>
>>> But it quickly gets complicated.
>>>
>>> The method itself takes "sd" (a jk_sock_t) as an argument to the
>>> function, and we can check immediately whether it will cause FD_SET to
>>> overflow -- easy: just check to see if the value is too large -- but
>>> what should we do in that case?
>>>
>>> This function should be connecting to a back-end Tomcat server, but if
>>> we have too many outbound connections, we'll fail.
>>>
>>> I'm not sure it makes any sense to let things get this far.
>>>
>>> The proposed solution[1] is to use poll() instead of select(), but that
>>> won't work on every platform, plus I'd like to be able to fix the buffer
>>> overflow right away while we work-out a solution for poll() that will
>>> work in most environments.
>>>
>>> I think if the connection_pool_size exceeds FD_SETSIZE we should refuse
>>> to start. Any other behavior will eventually cause errors.
>>
>> +1 in principle. Unfortunately, on Windows the default for
>> FD_SETSIZE seems to be only 64. That's probably too small, but
>> Windows apparently allows increasing this limit at compile time:
>>
>> <Quote>
>> The variable FD_SETSIZE determines the maximum number of descriptors in
>> a set. (The default value of FD_SETSIZE is 64, which can be modified by
>> defining FD_SETSIZE to another value before including Winsock2.h.)
>> Internally, socket handles in an fd_set structure are not represented as
>> bit flags as in Berkeley Unix. Their data representation is opaque.
>> </Quote>
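
For illustration, making use of that on Windows is just a matter of
defining the macro before the winsock header is pulled in (1024 here is
only an example value, matching the common Unix default):

    #define FD_SETSIZE 1024     /* must precede the include */
    #include <winsock2.h>
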
>
> That's ... weird. Okay.
>
>> So we should IMHO aim for
>>
>> a) check connection pool size against FD_SETSIZE and fail during startup
>> if too big - or we decrease it to the max value and log a warning?
>
> On *NIX, that value cannot reasonably be changed. I think we need to
> make all our decisions at compile-time and fail-fast at runtime.

Yes, with "decrease it" I meant decreasing the configured pool size, 
just as you assumed below.

> Lowering to a reasonable maximum value is probably okay. I'm not sure
> which would be worse: requiring the administrator to fix a configuration
> problem before the server can even start (imagine a server that's been
> working for years without this config, now it requires some change) or
> auto-reconfiguring based upon a value the admin hasn't set.
>
> Actually... in cases where this would have affected users, the result
> would have been that everything is fine until there is a buffer
> overflow. Hopefully, the buffer overflow is fatal, but it might not be.
>
> So, lowering to a smaller value if connection_pool_size is too big
> sounds good to me. Logging with severity=WARN is a good option for
> notification.
>
>> b) define 1024 as the compile time FD_SETSIZE on Windows (same value as
>> the default e.g. on Linux and on 32 Bit Solaris). We already use 250 as
>> the default connection pool size.
>
> +1
>
>> c) allow increasing FD_SETSIZE when building on Windows, since it is
>> supported there.
>
> +1
>
> We probably want something like JK_FD_SETSIZE defaults to 1024 and then
> FD_SETSIZE = JK_FD_SETSIZE in the build. I have absolutely no idea how
> on earth to do that for our Windows builds.
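
Something along these lines should work as the compile-time plumbing (a
sketch only; JK_FD_SETSIZE is just the name proposed above, and where
exactly to hook this into the Windows build files remains to be seen):

    /* in a common header, before the first include of Winsock2.h */
    #ifndef JK_FD_SETSIZE
    #define JK_FD_SETSIZE 1024              /* our default */
    #endif
    #if defined(WIN32) && !defined(FD_SETSIZE)
    #define FD_SETSIZE JK_FD_SETSIZE        /* Winsock honors this */
    #endif

A builder could then override the default with e.g. /DJK_FD_SETSIZE=4096
on the compiler command line.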
>
>> d) use the existing macro HAVE_POLL to offer a poll based code path if
>> poll was detected.
>
> I don't think HAVE_POLL is any kind of standard. I poked-around my
> Debian Linux environment and HAVE_POLL was defined in a number of header
> files, but it was unconditionally-defined to be "1" in files such as
> postgresql/pg_config.h, so I think the package-maintainers must have
> just said "this system has poll.h, let's just set HAVE_POLL=1 and call
> it a day".

It is a define that our mod_jk build system sets when configure is used 
and the configure run detects that poll() is available. The define is 
already used in a few places in common/jk_connect.c and could be used in 
nb_connect() as well. It only makes sense on platforms where configure 
is run, i.e. not on Windows or Netware, but nb_connect() already has 
separate implementations for the three platform types (Windows, Netware, 
*nix) anyway.

Using poll() when HAVE_POLL is defined (i.e. poll() is available) gives 
us a clean solution on most platforms except Windows, and if we allow 
increasing FD_SETSIZE on Windows (and choose a sane default ourselves), 
people can work around the problem there as well.
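
A minimal sketch of what the HAVE_POLL branch in nb_connect() could look
like (assuming, as the existing tv_sec/tv_usec computation implies, that
timeout is in milliseconds; requires <poll.h>):

    #ifdef HAVE_POLL
        /* poll() takes the descriptor value directly, so there is
           no FD_SETSIZE limit to overflow */
        struct pollfd pfd;
        pfd.fd = sd;
        pfd.events = POLLOUT;   /* connect completion == writable */
        do {
            rc = poll(&pfd, 1, timeout);
        } while (rc == -1 && errno == EINTR);
    #else
        /* existing select() based code path */
    #endif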

I'll have a look at the Windows build files with regard to setting 
FD_SETSIZE to 1024 and will probably also allow changing the value at 
compile time.

>> Concerning a) the code that reads the pool size is in
>> common/jk_ajp_common.c in function ajp_init():
>>
>> p->ep_cache_sz = jk_get_worker_cache_size(props, p->name, cache);
>>
>> The function gets a logger as an argument, so if we want we can easily
>> log stuff there. Correcting a cache size that's too big down to the max
>> value and logging it would be easiest. Terminating the startup is more
>> difficult. We do it e.g. for Apache using jk_error_exit(), but it would
>> be a bit tedious to propagate the error from jk_ajp_common.c to
>> mod_jk.c. Furthermore you'd need a similar solution for ISAPI.
>> Therefore I suggest choosing the "correct the config and warn"
>> approach.
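
For the correct-and-warn variant, the clamp in ajp_init() could look
roughly like this (a sketch only; the exact jk_log() call and the "l"
logger name would need to match the conventions used elsewhere in
jk_ajp_common.c):

    p->ep_cache_sz = jk_get_worker_cache_size(props, p->name, cache);
    #if !defined(HAVE_POLL)
    if (p->ep_cache_sz > FD_SETSIZE) {
        /* assumes the usual jk_log(l, level, fmt, ...) convention */
        jk_log(l, JK_LOG_WARNING,
               "connection pool size %d is larger than FD_SETSIZE %d, "
               "lowering it to %d",
               p->ep_cache_sz, FD_SETSIZE, FD_SETSIZE);
        p->ep_cache_sz = FD_SETSIZE;
    }
    #endif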
>>
>>> Thoughts?
>
> I like correct-and-warn for a number of reasons.
>
> So forgetting poll() for the time being, the correct-and-warn would just
> be a check for connection_pool_size > FD_SETSIZE, then set
> connection_pool_size = FD_SETSIZE and warn, right?

Correct, BUT: I'm not totally convinced that a limit on the pool size 
suffices. File descriptor numbers are global to each process. So if you 
have several workers, say 20 workers each with a pool size of 100, you 
could easily get file descriptor numbers from 0 to 1999. Furthermore I 
expect other (non-mod_jk or non-ISAPI) file descriptors in the process 
to count as well:

- open log files
- incoming client to web server connections
- more?

If that is true, the attempt to work around the problem by limiting 
pool sizes wouldn't work. So maybe we need to change the strategy to

- introduce poll() when available
- make FD_SETSIZE configurable (plus a sane default) on Windows
- fail during the connection attempt if the file descriptor number for 
the new socket is too big (see the sketch below)
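
The per-connect check in the last point could be as simple as this on
the select() path (a sketch; the errno choice is an assumption, and the
caller's error handling would apply as for any failed connect):

    #if !defined(HAVE_POLL) && !defined(WIN32)
        /* a descriptor >= FD_SETSIZE would overflow the fd_set
           bitmap, so fail the connection attempt instead */
        if (sd >= FD_SETSIZE) {
            errno = EINVAL;
            return -1;
        }
    #endif

On Windows fd_set is not a bitmap (see the quote above), so there the
limit is on the number of sockets per set rather than on the descriptor
value, and the check would have to differ.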

WDYT?

> Seems simple enough :)

Hmmmm ...

Rainer


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

