www-infrastructure-issues mailing list archives

From "Rainer Jung (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (INFRA-6714) ooo-wiki2-vm needs more cpu resources.
Date Sat, 07 Sep 2013 12:09:51 GMT

    [ https://issues.apache.org/jira/browse/INFRA-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761009#comment-13761009 ]

Rainer Jung commented on INFRA-6714:
------------------------------------

It seems the situation has been better since Sep. 3rd, around 1 a.m. system time.

Some change applied then (?), in combination with restrictions for spiders, seems to have solved
the problem. I'm basing this on the error and access logs of the last 31 days.

Note that during that time there have already been phases of up to 9 days without issues,
so we can't yet be sure the performance problems are really fixed.

Here's a little table with data. Columns A, B and C count errors that show the FCGI connections
getting disabled and 503s being served as a consequence. The last column is the number of FCGI request
timeouts per day (30-second timeout). As we can see, September 1st and 2nd were especially
bad, some other days were also bad, and the last 3-4 days are good.

A: Can not connect to PHP FPM process (typically: all processes busy plus backlog full)
B: Tried to use a disabled worker, but still couldn't get a connection
C: Disabling worker (not using PHP FPM for the next 60 seconds since we could not create a
new connection to PHP FPM)

Date        A     B     C     Requests  WikiRequests  RequestTimeouts30Seconds
2013-08-02  28    1     3     1420687   321501        ?
2013-08-03  56    33    6     1109756   249739        ?
2013-08-04  1438  307   118   1171222   252887        ?
2013-08-05  32    26    6     1505184   310939        ?
2013-08-06  782   177   89    1534151   366735        1015
2013-08-07  847   232   86    1474892   313400        1595
2013-08-08  0     0     0     1377637   261653        119
2013-08-09  0     0     0     1239353   252956        2
2013-08-10  824   177   67    997111    203882        1697
2013-08-11  0     0     0     1234279   197442        1
2013-08-12  712   171   74    1881730   289023        873
2013-08-13  0     0     0     1612885   343740        16
2013-08-15  634   135   75    1448155   243341        1088
2013-08-16  0     0     0     1384099   296018        20
2013-08-17  0     0     0     1103834   206260        39
2013-08-18  0     0     0     1237445   202881        7
2013-08-19  0     0     0     1591813   276412        4
2013-08-20  0     0     0     1520808   270822        7
2013-08-21  0     0     0     1481281   277610        7
2013-08-22  0     0     0     1395248   256100        4
2013-08-23  0     0     0     1285627   256283        6
2013-08-24  0     0     0     1076761   235278        5
2013-08-25  817   117   53    1215800   236018        1089
2013-08-26  0     0     0     1556519   288562        270
2013-08-27  2355  520   235   1590916   322275        2230
2013-08-28  65    23    19    1585014   304717        323
2013-08-29  380   161   50    1497661   304836        878
2013-08-30  1831  312   128   1384382   312891        1544
2013-08-31  0     0     0     1144128   264760        79
2013-09-01  3794  1706  727   1188186   235837        20882
2013-09-02  7791  3204  1250  1441746   265329        34670
2013-09-03  0     0     0     1536807   247950        870
2013-09-04  0     0     0     1507768   226262        6
2013-09-05  0     0     0     1438105   211304        4
2013-09-06  0     0     0     1279063   216433        7
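
As a rough illustration of how a per-day count like the last column could be approximated from an access log, here is a minimal Python sketch. It is not the procedure used to produce the table above. It assumes that each access log line ends with the response time in microseconds and carries a timestamp of the form [01/Sep/2013:10:00:00 +0000]; the function name, the sample lines and the addresses in them are made up for illustration.

```python
import re
from collections import Counter

TIMEOUT_MICROSECONDS = 30 * 1_000_000  # 30-second timeout, in microseconds

# Assumption: timestamps look like [01/Sep/2013:10:00:00 +0000].
DATE_PATTERN = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def count_slow_requests_per_day(lines, threshold=TIMEOUT_MICROSECONDS):
    """Count, per day, the requests whose last field is >= threshold.

    Assumes the last whitespace-separated field of each line is the
    response time in microseconds (hypothetical log layout).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        match = DATE_PATTERN.search(line)
        if not fields or match is None:
            continue  # skip blank lines and lines without a timestamp
        try:
            duration = int(fields[-1])
        except ValueError:
            continue  # last field is not a number, so ignore the line
        if duration >= threshold:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        '192.0.2.1 - - [01/Sep/2013:10:00:00 +0000] "GET /w/index.php HTTP/1.1" 503 299 30001234',
        '192.0.2.2 - - [01/Sep/2013:10:00:05 +0000] "GET /wiki/Main_Page HTTP/1.1" 200 5120 250000',
    ]
    for day, count in sorted(count_slow_requests_per_day(sample).items()):
        print(day, count)
```

The threshold is a parameter, so the same function could be used with 15000000 to list days with requests slower than 15 seconds.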

I suggest not adding the CPUs right now but waiting to see whether the situation stays OK as it is now
or the problem comes back. I don't know whether CPU resources would be available if needed,
though.

                
> ooo-wiki2-vm needs more cpu resources.
> --------------------------------------
>
>                 Key: INFRA-6714
>                 URL: https://issues.apache.org/jira/browse/INFRA-6714
>             Project: Infrastructure
>          Issue Type: Improvement
>          Components: Website
>         Environment: ooo-wiki2-vm, httpd
>            Reporter: jan iversen
>            Priority: Critical
>
> The aoo wiki is used by a lot more users due to our release 4.0; this has led to massive timeouts.
> According to rjung spiders might be part of the problem, which we try to block, but at the same time spiders (especially google) are important.
> It will not help to allow more fpm, since the cpu is the blocker.
> So I request a bit more CPU cores. I have asked rjung for help, since he knows a lot more than I and his analysis is also that CPU is the limiting factor.
> copy of IRC:
> <rjung> janIV: I had a short look at the aoo wiki yesterday. It seems to me it doesn't have enough CPU resources or phrased differently it uses too much CPU resources. So when the load gets up the VM can't cope with it and queues lots of requests until Apache takes PHP out of the proxy.
> <janIV> rjung: thx for your analysis, just to be sure, to start more fpm wont solve
anything ?
> <janIV> rjung: we have had the same problem 4 times today.
> <rjung> janIV: I think "no". But that's only based on what I saw yesterday.
> <janIV> rjung: mind if I refer to these lines in an infra requesting more cores
?
> <rjung> janIV: Maybe there's a way to block spiders from expensive requests.
> <rjung> janIV: I don't mind, but please both parts, also the "phrased differently"
part.
> <janIV> rjung: I tried to look at blocking, but to be honest my knowledge is not deep enough to make a block depending on cpu load.
> <janIV> rjung: I will only be fair, in the jira, and secure you are asked, since
you are much more specialist on this than I am.
> <rjung> janIV: no that would be too complex. Just blocking things like GET /w/index.php?title=Special:RecentChanges&feed=atom
HTTP/1.1 for UA containing bot.
> <janIV> rjung: I thought you would say that, but e.g. the google spider is important
to AOO (search results), but clearly not when wiki is overloaded.
> <rjung> janIV: You can have a look at the access log. The last column is the response time in microseconds. So a "awk '$NF>=15000000' FILENAME" shows all requests taking longer than 15 seconds etc.
> <rjung> But how important is spidering "GET /w/index.php?title=Special:RecentChanges&feed=atom
HTTP/1.1"?
> <janIV> rjung: NOT important, but can we block a spider on the url it request or
only total ?
> <rjung> For instance today there's 1676 times a request for "GET /w/index.php?title=Special:RecentChanges&feed=rss".
> <janIV> rjung: UPs, too fast, that url, gives the spider what happened in the last
48hours, it has an option of the timestamp which they dont use.
> <rjung> Of those 1676 none succeeded, because all took longer than the timeout of 30 seconds. I guess they still kept your wiki pretty busy without delivering anything back.
> <rjung> Maybe you can optimize the handling of such requests?
> <janIV> rjung: can we make a fast patch, and block spiders, while looking for a
long-term solution ?
> <rjung> We can but you would need to provide the URIs which should be blocked.
> <janIV> my problem is that I dont know what is spiders and whats real users, can
we see who request the same url say more then 50 times a day ?
> <rjung> Not automatically. But I suggest we simply start by assuming anything is a spider that matches /bot/i in the user agent string.
> <janIV> can you please make a block for that, then we can later look at enhancing
it.
> It's important to the AOO project that our users have access to the wiki and feel we support them. There are others out there using something like this to announce AOO is practically dead, which of course is far from true.
> I hope for a speedy solution to the problem, if nothing else, then just to show that AOO is an important project.
> thanks in advance
> jan I.
> Ps. for reference, at Oracle the wiki alone had 4 cores on a dedicated server.
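
The blocking rule discussed in the IRC log above (treat any user agent matching /bot/i as a spider, and refuse only expensive requests such as the Special:RecentChanges feeds) can be sketched in Python as follows. This is only an illustration of the matching logic, not the configuration actually deployed on the wiki host; the function name and the exact URL pattern are assumptions.

```python
import re

# /bot/i from the discussion: any user agent containing "bot", case-insensitive.
BOT_PATTERN = re.compile(r"bot", re.IGNORECASE)

# Assumed pattern for the expensive feed requests named in the log
# (feed=atom and feed=rss on Special:RecentChanges).
EXPENSIVE_PATTERN = re.compile(
    r"^/w/index\.php\?title=Special:RecentChanges&feed=(atom|rss)\b"
)


def should_block(request_uri, user_agent):
    """Return True only when a presumed spider asks for an expensive URL."""
    is_spider = BOT_PATTERN.search(user_agent) is not None
    is_expensive = EXPENSIVE_PATTERN.search(request_uri) is not None
    return is_spider and is_expensive


if __name__ == "__main__":
    # Hypothetical user agent string, for illustration only.
    print(should_block(
        "/w/index.php?title=Special:RecentChanges&feed=atom",
        "ExampleBot/1.0",
    ))
```

Keeping both conditions means spiders can still crawl ordinary wiki pages, which matches the concern that search engine indexing remains important to the project.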

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
