www-infrastructure-issues mailing list archives

From "jan iversen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (INFRA-6714) ooo-wiki2-vm needs more cpu resources.
Date Sat, 07 Sep 2013 12:28:52 GMT

    [ https://issues.apache.org/jira/browse/INFRA-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761014#comment-13761014 ]

jan iversen commented on INFRA-6714:

Thanks for your analysis, and I agree with your conclusion.

Based on advice in #asfinfra, I added caching for pages and for images. That seems to have
"activated" ATS, which in fact was not caching before.

FYI: I have asked to have my access/sudo rights to this VM removed, for other reasons.

jan I.
> ooo-wiki2-vm needs more cpu resources.
> --------------------------------------
>                 Key: INFRA-6714
>                 URL: https://issues.apache.org/jira/browse/INFRA-6714
>             Project: Infrastructure
>          Issue Type: Improvement
>          Components: Website
>         Environment: ooo-wiki2-vm, httpd
>            Reporter: jan iversen
>            Priority: Critical
> The aoo wiki is used by a lot more users due to our release 4.0, this has led to massive
> According to rjung, spiders might be part of the problem, which we are trying to block, but at the same time spiders (especially Google) are important.
> It will not help to allow more fpm, since the CPU is the blocker.
> So I request a few more CPU cores. I have asked rjung for help, since he knows a lot more than I do, and his analysis is also that the CPU is the limiting factor.
> copy of IRC:
> <rjung> janIV: I had a short look at the aoo wiki yesterday. It seems to me it doesn't have enough CPU resources or, phrased differently, it uses too much CPU resources. So when the load goes up the VM can't cope with it and queues lots of requests until Apache takes PHP out of the proxy.
> <janIV> rjung: thx for your analysis. Just to be sure, starting more fpm won't solve anything?
> <janIV> rjung: we have had the same problem 4 times today.
> <rjung> janIV: I think "no". But that's only based on what I saw yesterday.
> <janIV> rjung: mind if I refer to these lines in an infra ticket requesting more cores?
> <rjung> janIV: Maybe there's a way to block spiders from expensive requests.
> <rjung> janIV: I don't mind, but please include both parts, also the "phrased differently".
> <janIV> rjung: I tried to look at blocking, but to be honest my knowledge is not deep enough to make a block depending on CPU load.
> <janIV> rjung: I will be fair in the jira, and make sure you are asked, since you are much more of a specialist on this than I am.
> <rjung> janIV: no, that would be too complex. Just blocking things like GET /w/index.php?title=Special:RecentChanges&feed=atom HTTP/1.1 for UA containing bot.
> <janIV> rjung: I thought you would say that, but e.g. the Google spider is important to AOO (search results), though clearly not when the wiki is overloaded.
> <rjung> janIV: You can have a look at the access log. The last column is the response time in microseconds. So an "awk '$NF>=15000000' FILENAME" shows all requests taking longer than 15 seconds, etc.
> <rjung> But how important is spidering "GET /w/index.php?title=Special:RecentChanges&feed=atom
> <janIV> rjung: NOT important, but can we block a spider based on the URL it requests, or only in total?
> <rjung> For instance, today there are 1676 requests for "GET /w/index.php?title=Special:RecentChanges&feed=rss".
> <janIV> rjung: Oops, too fast. That URL gives the spider what happened in the last 48 hours; it has a timestamp option which they don't use.
> <rjung> Of those 1676 none succeeded, because all took longer than the timeout of 30 seconds. I guess they still kept your wiki pretty busy without delivering anything back.
> <rjung> Maybe you can optimize the handling of such requests?
> <janIV> rjung: can we make a fast patch and block spiders, while looking for a long-term solution?
> <rjung> We can, but you would need to provide the URIs which should be blocked.
> <janIV> my problem is that I don't know which are spiders and which are real users. Can we see who requests the same URL, say, more than 50 times a day?
> Not automatically. But I suggest we simply start by assuming anything is a spider that matches /bot/i in the user agent string.
> <janIV> can you please make a block for that, then we can later look at enhancing
> It's important to the AOO project that our users have access to the wiki and feel we support them. There are others out there using something like this to announce that AOO is practically dead, which of course is far from true.
> I hope for a speedy solution to the problem, if nothing else, then just to show that AOO is an important project.
> thanks in advance
> jan I.
> Ps. for reference, at Oracle the wiki alone had 4 cores on a dedicated server.
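
The two log checks discussed in the IRC excerpt above (the awk filter on the last column, and treating any user agent that matches /bot/i as a spider) could be combined roughly as follows. This is only a minimal sketch under stated assumptions: the thread says only that the last column of the access log is the response time in microseconds, so the exact layout assumed here (Apache "combined" format with %D appended) is a guess, and the names LINE_RE, parse_line and summarize_slow_requests are hypothetical, not anything used on ooo-wiki2-vm.

```python
import re
from collections import Counter

# ASSUMPTION: Apache "combined" log format with %D (microseconds) appended.
# The thread only confirms that the last column is the response time.
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<request>[^"]*)" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)" (?P<micros>\d+)$'
)
BOT_RE = re.compile(r"bot", re.IGNORECASE)  # the /bot/i heuristic from the IRC log
SLOW_MICROS = 15_000_000  # 15 seconds, same threshold as the awk one-liner


def parse_line(line):
    """Return (request, user_agent, micros), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("request"), match.group("agent"), int(match.group("micros"))


def summarize_slow_requests(lines, threshold=SLOW_MICROS):
    """Count slow requests per request line, overall and for bot-like user agents."""
    slow_all = Counter()
    slow_bots = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not fit the assumed format
        request, agent, micros = parsed
        if micros < threshold:
            continue
        slow_all[request] += 1
        if BOT_RE.search(agent):
            slow_bots[request] += 1
    return slow_all, slow_bots
```

An open log file can be passed directly as `lines`. Calling `most_common()` on the second counter would then list the slow request lines that came from bot-like user agents, which is roughly the list of URIs rjung asks for before a block can be set up.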

