activemq-dev mailing list archives

From "Colin MacNaughton" <colin.macnaugh...@gmail.com>
Subject RE: ActiveMQ 6.0 Broker Core Prototype -- Flow Control / Memory Management
Date Thu, 11 Jun 2009 16:06:33 GMT
Hi Rob, 

In terms of configurable maximums with respect to destinations:
Maximum memory allocation per PTP queue is supported, but for topics the
limits are actually tied to the Subscriptions receiving the message. This is
because these are the objects that actually map to the underlying cursored
queues that hold the messages and do the paging/limiting. Does that make
sense?

In terms of overall disk/memory limits: the approach we're taking would not
explicitly define a single overall limit (at least not initially) -- rather
the total maximum is based on the resources you create. E.g. as a user you
need to know how many queues and subscriptions you create and plan for
memory/disk accordingly. This doesn't preclude trying to enforce global
limits later, but in my opinion doing so complicates the implementation a
fair amount, in terms of trying to intelligently balance the available
space across the queues. It also leads to additional contention on the
shared limiter -- and worse, can lead to resource related deadlocks if we
get it wrong. We can still do things like limit the maximum number/size of
subscriptions/queues/connections etc.
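
To make the planning arithmetic concrete, here is a minimal sketch of summing per-resource limits into a planned total. The class and method names are hypothetical and the limits are made-up numbers; nothing here comes from the prototype itself.

```java
import java.util.List;

// Hypothetical planning helper; not part of the activemq-flow prototype.
final class CapacityPlan {
    /** Sums per-queue and per-subscription limits (in bytes) into a planned total. */
    static long plannedBytes(List<Long> queueLimits, List<Long> subscriptionLimits) {
        long total = 0;
        for (long limit : queueLimits) {
            total += limit;
        }
        for (long limit : subscriptionLimits) {
            total += limit;
        }
        return total;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Assumed example: two queues at 64 MiB each, three subscriptions at 16 MiB each.
        long total = plannedBytes(List.of(64 * mib, 64 * mib),
                                  List.of(16 * mib, 16 * mib, 16 * mib));
        System.out.println(total / mib + " MiB");
    }
}
```

With those assumed numbers the planned total is 176 MiB, which is the figure a user would size memory or disk against.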

Colin 

-----Original Message-----
From: Rob Davies [mailto:rajdavies@gmail.com] 
Sent: Thursday, June 11, 2009 1:12 AM
To: dev@activemq.apache.org
Subject: Re: ActiveMQ 6.0 Broker Core Prototype -- Flow Control / Memory
Management

Hi Colin,

In 5.x flow control behaves as if it's binary - off or on. When it's off,
messages can be offlined (for non-persistent messages this means being
dumped to temporary storage) - but when it's on, the producers slow and
stop.
Also - there can be cases when you get a temporary slow consumer (the
consuming app may be doing a big gc) - which means with flow control
off - messages get dumped to disk - and then the producers may never
slow down enough again for the consumer to catch up. Flow control is
difficult to implement for all cases - but we should allow for
configuration of the following:

* maximum overall broker memory
* maximum memory allocation per destination
* maximum storage allocation
* maximum storage allocation per destination
* maximum temporary storage allocation
* maximum temporary storage allocation per destination

When we start to hit a resource limit - we should aggressively gc
messages that have expired, then either offline (and flow control when
that limit is hit) or flow control.
It would be great to have a combined policy where we can block a
producer for a short time (seconds) then offline.
For non-persistent messages - we still need a policy where we can
remove messages based on a selector (which would be in addition to
expiring messages).
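
The combined "block briefly, then offline" policy could look roughly like the sketch below. It is only an illustration under assumptions: the class name, the slot-based memory accounting and the in-memory stand-in for temporary storage are all hypothetical, and paging offlined messages back in is left out.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a "block the producer briefly, then offline" policy.
final class BlockThenOfflineQueue {
    private final Semaphore memorySlots;   // free in-memory slots
    private final long maxBlockMillis;     // longest time a producer may be blocked
    private final Queue<String> inMemory = new ArrayDeque<>();
    private final Queue<String> offlined = new ArrayDeque<>(); // stand-in for temp storage

    BlockThenOfflineQueue(int memoryCapacity, long maxBlockMillis) {
        this.memorySlots = new Semaphore(memoryCapacity);
        this.maxBlockMillis = maxBlockMillis;
    }

    /** Returns true if the message stayed in memory, false if it was offlined. */
    boolean offer(String message) {
        boolean acquired;
        try {
            // Flow control phase: block the producer, but only for a bounded time.
            acquired = memorySlots.tryAcquire(maxBlockMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            acquired = false;
        }
        synchronized (this) {
            if (acquired) {
                inMemory.add(message);
            } else {
                offlined.add(message); // offline phase: spill instead of blocking longer
            }
        }
        return acquired;
    }

    /** Consumer side: takes an in-memory message and frees its slot. */
    synchronized String poll() {
        String message = inMemory.poll();
        if (message != null) {
            memorySlots.release();
        }
        return message;
    }

    synchronized int offlinedCount() {
        return offlined.size();
    }

    public static void main(String[] args) {
        BlockThenOfflineQueue queue = new BlockThenOfflineQueue(1, 50);
        System.out.println("first kept in memory: " + queue.offer("a"));
        System.out.println("second kept in memory: " + queue.offer("b"));
        System.out.println("offlined: " + queue.offlinedCount());
    }
}
```

A short consumer pause (such as a big gc) is then absorbed by the bounded block, and only a longer stall pushes messages to storage.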

cheers,

Rob

On 10 Jun 2009, at 17:29, Colin MacNaughton wrote:

> Hi Everyone,
>
> As a follow on to my e-mail last week introducing the core broker
> prototype that Hiram and I have been working on, I wanted to spin up a
> thread on the flow control model that we're using.
>
> I'd be interested to hear your thoughts on current shortcomings
> associated with flow control / memory management in 5.3 so we can make
> sure that the use cases are covered. Beyond that, any additional input
> on the design or implementation would be great ... are we on the right
> track?
>
> Cheers,
> Colin
>
>
> The text below is taken straight from the webgen in the project, my
> apologies if it's a little verbose!
>
> As a reminder the bits can be found at:
> https://svn.apache.org/repos/asf/activemq/sandbox/activemq-flow
>
> The activemq-flow package is meant to be a standalone module that
> deals generically with Resources and Flows of elements that flow
> through and between them. The current implementation is designed with
> the following goals in mind:
>
>    * SIMPLE: Want a fairly simple and consistent model for controlling
> flow of messages and other data in the system to control memory and
> disk space. The module must be able to handle fan-in/fan-out as well as
> simpler 1 to 1 cases.
>    * PERFORMANT: The flow control mechanism must be performant and
> should not introduce much overhead in cases where downstream resources
> are able to keep up.
>    * MODULARIZED: The module should be independent, generic and
> reusable.
>    * FAIRNESS: We should be able to provide better fairness. If I've
> got several producers putting messages on a queue, the flow controller
> should not prefer one source over the other (unless configured to do
> so).
>    * VISIBILITY: With a unified model in place we can instrument it to
> provide visibility in the product (e.g. a visual graph of flows in the
> system). When a customer says that they are not using PERSISTENT
> messages yet we see 1000msgs/sec flowing through the recovery log....
>    * ADMINISTRATION: We can explore the possibility of administratively
> limiting message flows (e.g. I've done my production stress testing and
> can successfully handle my anticipated load of 4000 msgs/sec on topic1
> ... I'd prefer to avoid the case where publishers go berserk and
> overload my backend with messages).
>    * POLICIES: We should be able to better instrument general flow
> control policies. E.g. I want to tune for latency or throughput. If a
> subscriber gets behind, I'd like the policy for messages on topic1 to
> be that I drop the oldest messages instead of initiating flow control.
>
> The Basics:
>
> Each resource creates a FlowController for each of its Flows, which is
> assigned a corresponding FlowLimiter. As elements (e.g. messages) pass
> from one resource to another they are passed through the downstream
> resource's FlowController, which updates its Limiter. If propagation of
> an element from one resource to another causes the downstream limiter
> to become throttled, the associated FlowController will block the
> source of the element. The flow module is used heavily by the rest of
> the core for memory and disk management.
>
>    * Memory Management: Memory is managed based on the resources in
> play -- the usage is computed by summing the space allocated to each of
> the resources' limiters. This strategy intentionally avoids a
> centralized memory limit, which leads to complicated logic to track
> when a centralized limiter needs to be decremented. It also avoids
> contention between multiple resources/threads accessing the limiter and
> reduces the potential for memory limiter related deadlocks. However, it
> should be noted that this approach doesn't preclude implementing
> centralized limiters in the future.
>    * Flow Control: As messages propagate from one resource A to another
> B, if A overflows B's limit, B will block A and A can't release its
> limiter space until B unblocks it. This allowance for overflow into
> downstream resources is a key concept in flow control performance and
> ease of use. Provided that the upstream resource has already accounted
> for the message's memory it can freely overflow any downstream limiter
> providing it reserves space from elements that caused overflow.
>    * Threading Model: Note that as a message propagates from A to B,
> the general contract is that A won't release its memory if B blocks it
> during the course of dispatch. This means that it is not safe to
> perform a thread handoff during dispatch between two resources, since
> the thread dispatching A relies on the message making it to B (so that
> B can block it) prior to A completing dispatch.
>    * Management/Visibility: Another intended use of the activemq-flow
> module is to assist in visibility, e.g. provide an underlying map of
> resources that can be exposed via tooling to see the relationships
> between sources and sinks of messages and to find bottlenecks ... this
> aspect has been downplayed for now as we have been focusing more on the
> queueing/memory management model in the prototype, but eventually the
> flow package itself will provide a handy way of providing visibility in
> the system, particularly in terms of finding performance bottlenecks.
>
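
The overflow and block/resume contract described in the bullets above can be sketched as follows. This is a minimal single-file illustration, assuming a simple size-based limiter; every name in it (OverflowSketch, Source, SizeLimiter, Controller, RecordingSource) is hypothetical and is not the prototype's actual API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// All names here are hypothetical; this sketches the described contract only.
final class OverflowSketch {
    /** Something upstream that can be told to stop and restart dispatching. */
    interface Source {
        void onBlocked();
        void onResumed();
    }

    /** Size-based limiter that accepts overflow and reports when it is throttled. */
    static final class SizeLimiter {
        private final long capacity;
        private long used;

        SizeLimiter(long capacity) { this.capacity = capacity; }

        /** Always accounts for the element; returns true if now at or over capacity. */
        boolean add(long size) { used += size; return used >= capacity; }

        /** Releases space; returns true if the limiter is below capacity afterwards. */
        boolean remove(long size) { used -= size; return used < capacity; }

        long used() { return used; }
    }

    /** Wraps a limiter and does the block/resume bookkeeping. */
    static final class Controller {
        private final SizeLimiter limiter;
        private final Set<Source> blocked = new LinkedHashSet<>();

        Controller(SizeLimiter limiter) { this.limiter = limiter; }

        /** Called on the upstream resource's own dispatch thread (no thread handoff). */
        synchronized void add(long size, Source source) {
            if (limiter.add(size) && blocked.add(source)) {
                source.onBlocked();
            }
        }

        /** Called once the downstream resource has passed an element on. */
        synchronized void elementDispatched(long size) {
            if (limiter.remove(size) && !blocked.isEmpty()) {
                Set<Source> toResume = new LinkedHashSet<>(blocked);
                blocked.clear();
                toResume.forEach(Source::onResumed);
            }
        }
    }

    /** Minimal source that only records whether it is currently blocked. */
    static final class RecordingSource implements Source {
        boolean blocked;
        @Override public void onBlocked() { blocked = true; }
        @Override public void onResumed() { blocked = false; }
    }

    public static void main(String[] args) {
        SizeLimiter limiter = new SizeLimiter(10);
        Controller b = new Controller(limiter);
        RecordingSource a = new RecordingSource();

        b.add(6, a); // under the limit, A keeps going
        b.add(6, a); // overflows to 12: accepted, but A is now blocked
        System.out.println("used=" + limiter.used() + " blocked=" + a.blocked);

        b.elementDispatched(6); // B drains below its limit and resumes A
        System.out.println("used=" + limiter.used() + " blocked=" + a.blocked);
    }
}
```

The point of the sketch is that the second add is accepted even though it takes B past its capacity; B never rejects the element, it only blocks A until enough space has drained.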
> FlowResource (FlowSink and FlowSource): A container for
> FlowControllers providing some lifecycle related logic. The base
> resource class handles interaction/registration with the FlowManager
> (below).
>
> FlowManager: Registry for Flows and FlowResources. The manager will
> provide some hooks into system visibility. As mentioned above this
> aspect has been downplayed somewhat for the present time.
>
> FlowController: Wraps a FlowLimiter and actually implements common
> basic block/resume logic between FlowControllers.
>
> FlowLimiter: Defines the limits enforced by a FlowController.
> Currently the package has size based limiter implementations, but
> eventually should also support other common limiter types such as rate
> based limiters. The limiters are also extended at other points in the
> broker (for example implementing a protocol based WindowLimiter). It is
> also likely that we would want to introduce CompositeLimiters to
> combine various limiter types.
>
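
One way a composite limiter might combine limiter types is sketched below: it forwards accounting to every part and reports itself throttled as soon as any part is. The Limiter interface and the size-based and count-based implementations are assumptions made for illustration, not the prototype's classes.

```java
import java.util.List;

// Hypothetical sketch; the names below are not taken from the prototype.
final class CompositeLimiterSketch {
    interface Limiter {
        void add(int count, long size);
        void remove(int count, long size);
        boolean isThrottled();
    }

    /** Throttles once the accounted bytes reach the capacity. */
    static final class SizeLimiter implements Limiter {
        private final long capacity;
        private long used;

        SizeLimiter(long capacity) { this.capacity = capacity; }
        @Override public void add(int count, long size) { used += size; }
        @Override public void remove(int count, long size) { used -= size; }
        @Override public boolean isThrottled() { return used >= capacity; }
    }

    /** Throttles once the number of outstanding elements reaches the maximum. */
    static final class CountLimiter implements Limiter {
        private final int maxOutstanding;
        private int outstanding;

        CountLimiter(int maxOutstanding) { this.maxOutstanding = maxOutstanding; }
        @Override public void add(int count, long size) { outstanding += count; }
        @Override public void remove(int count, long size) { outstanding -= count; }
        @Override public boolean isThrottled() { return outstanding >= maxOutstanding; }
    }

    /** Forwards accounting to every part; throttled if any part is throttled. */
    static final class CompositeLimiter implements Limiter {
        private final List<Limiter> parts;

        CompositeLimiter(List<Limiter> parts) { this.parts = List.copyOf(parts); }

        @Override public void add(int count, long size) {
            parts.forEach(p -> p.add(count, size));
        }

        @Override public void remove(int count, long size) {
            parts.forEach(p -> p.remove(count, size));
        }

        @Override public boolean isThrottled() {
            return parts.stream().anyMatch(Limiter::isThrottled);
        }
    }

    public static void main(String[] args) {
        Limiter size = new SizeLimiter(1024);
        Limiter count = new CountLimiter(3);
        Limiter limiter = new CompositeLimiter(List.of(size, count));

        limiter.add(1, 100);
        limiter.add(1, 100);
        System.out.println("after two small elements: " + limiter.isThrottled());
        limiter.add(1, 100); // third element hits the count limit, not the size limit
        System.out.println("after three small elements: " + limiter.isThrottled());
    }
}
```

Treating "any part throttled" as throttled is the conservative choice; a different combination rule (for example, requiring all parts to be throttled) would be an equally valid policy.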
> Flow: The concept of a flow is not used very heavily right now, but a
> Flow defines the stream of elements that can be blocked. In general the
> prototype creates a single flow per resource, but in the future a
> source may break its elements down into more granular flows on which
> downstream sinks may block it. One case where this is anticipated as
> being useful is in networks of brokers, wherein it may be desirable to
> partition messages into more granular flows (e.g. based on producer or
> destination) to avoid blocking the broker-broker connection
> unnecessarily.
>
>


