brooklyn-dev mailing list archives

From "Aled Sage (JIRA)" <>
Subject [jira] [Commented] (BROOKLYN-394) "Request limit exceeded" on Amazon
Date Wed, 23 Nov 2016 12:48:58 GMT


Aled Sage commented on BROOKLYN-394:

[~alex.heneveld] (cc [~andreaturli]) I've raised,
but that is just about changing the rate-limit default retry/backoff times.

I'd reword from "improved it a bit" to "improved it a huge amount", though I confess I've
not repeated the experiments! Previously, if you tried provisioning 20 machines concurrently,
roughly 25% of them would get rate-limited. Many of those would then fail because we were
retrying very quickly. Now that we back off for longer, those are likely to all succeed.

If we repeated this for 200 VMs, then I agree we'd likely still have serious problems with
some of the VMs failing.

If we were to somehow have all provisioning threads in that jclouds {{ComputeService}} collaborating
on their back-off, then I agree we'd improve things further.
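
To illustrate what "collaborating on their back-off" could look like, here is a minimal sketch in Python rather than Java. It is not jclouds or Brooklyn code: {{SharedBackoff}} and its method names are hypothetical. The idea is that all provisioning threads share one pause deadline, so a rate-limit response seen by any thread delays every thread's next request.

```python
import threading
import time


class SharedBackoff:
    """One instance shared by all provisioning threads (hypothetical helper)."""

    def __init__(self, initial=1.0, factor=2.0, max_delay=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._lock = threading.Lock()
        self._initial, self._factor, self._max_delay = initial, factor, max_delay
        self._clock, self._sleep = clock, sleep
        self._delay = initial
        # No thread should send a request before this time.
        self._not_before = clock()

    def wait_turn(self):
        """Call before each request: sleep until the shared pause has elapsed."""
        with self._lock:
            remaining = self._not_before - self._clock()
        if remaining > 0:
            self._sleep(remaining)  # sleep outside the lock so others can proceed

    def on_rate_limited(self):
        """Any thread that is rate-limited extends the pause for everyone."""
        with self._lock:
            self._not_before = max(self._not_before, self._clock() + self._delay)
            self._delay = min(self._delay * self._factor, self._max_delay)

    def on_success(self):
        """A successful request resets the delay to its initial value."""
        with self._lock:
            self._delay = self._initial
```

Each thread would call {{wait_turn()}} before a request, then {{on_rate_limited()}} or {{on_success()}} depending on the outcome. The state lives in one process, so this only coordinates threads inside a single instance.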

But we'd still hit problems if there were multiple Brooklyn instances trying to provision
a lot of VMs in the same AWS account.

I'd argue that the current exponential backoff is a fine compromise between simplicity and
functionality, as long as we back off for long enough. If we're confident that "request
limit exceeded" definitely means rate-limiting, then arguably we should keep trying for a
very long time! We should probably back off until we are retrying much less often than every
5 seconds, and we should keep trying for several minutes.
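
As a rough sketch of that policy (in Python for brevity; this is not the Brooklyn or jclouds implementation), the retry intervals below grow exponentially, are capped at a maximum, and continue until a total time budget is spent. All names and default values here ({{RateLimitedError}}, {{backoff_delays}}, {{call_with_backoff}}, a 60-second cap, a 5-minute budget) are assumptions chosen for illustration.

```python
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for a "Request limit exceeded" response."""


def backoff_delays(initial=1.0, factor=2.0, max_delay=60.0, max_total=300.0):
    """Yield sleep intervals that grow exponentially up to max_delay,
    stopping once their sum would exceed max_total seconds."""
    delay, total = initial, 0.0
    while total + delay <= max_total:
        yield delay
        total += delay
        delay = min(delay * factor, max_delay)


def call_with_backoff(operation, sleep=time.sleep, **backoff_args):
    """Run operation, retrying on RateLimitedError until the delay budget is spent."""
    for delay in backoff_delays(**backoff_args):
        try:
            return operation()
        except RateLimitedError:
            sleep(delay)
    # Final attempt after the budget is used up; any error reaches the caller.
    return operation()
```

With these illustrative defaults the waits are 1, 2, 4, 8, 16, 32, 60, 60 and 60 seconds, about four minutes in total, so later retries happen far less often than every 5 seconds.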

> "Request limit exceeded" on Amazon
> ----------------------------------
>                 Key: BROOKLYN-394
>                 URL:
>             Project: Brooklyn
>          Issue Type: Bug
>            Reporter: Svetoslav Neykov
>            Assignee: Aled Sage
>             Fix For: 0.10.0
> Any moderately sized blueprint could trigger {{Request limit exceeded}} on Amazon (say
> kubernetes). The only way users have control over the request rate is by setting {{maxConcurrentMachineCreations}}
> with the current recommended value of 3 (see
> It's bad user experience if one needs to adapt the location based on the blueprint.
> Possible steps to improve:
> * Add to troubleshooting documentation
> * Make maxConcurrentMachineCreations default to 3
> * Check whether we are polling for machine creation too often.
> * Check how many requests we are hitting Amazon with (per created machine)
> * The number of requests per machine could vary from blueprint to blueprint (say if the
>   blueprint is creating security networks, using other amazon services). Is there a way to throttle
>   our requests to amazon and stay below a certain limit per second?
> * I've hit the error during machine tear down as well, so {{maxConcurrentMachineCreations}}
>   is not enough to work around
> Some docs on rate limits at
> Related:
