geode-issues mailing list archives

From "Kirk Lund (JIRA)" <>
Subject [jira] [Assigned] (GEODE-29) Fix all functional/behavioral differences between cache.xml and the public Java API.
Date Thu, 05 Oct 2017 20:18:01 GMT


Kirk Lund reassigned GEODE-29:

    Assignee:     (was: Kirk Lund)

> Fix all functional/behavioral differences between cache.xml and the public Java API.
> ------------------------------------------------------------------------------------
>                 Key: GEODE-29
>                 URL:
>             Project: Geode
>          Issue Type: Improvement
>          Components: configuration
>    Affects Versions: 1.0.0-incubating
>         Environment: Apache Geode configured with either cache.xml, the public Java API, or
Gfsh (+ Cluster Config, an extension of cache.xml).
>            Reporter: John Blum
>            Priority: Critical
>              Labels: ApacheGeode, CacheXML, PublicJavaAPI
> Certain _Apache Geode_ functions/behaviors are encapsulated in "internal" classes.  Therefore,
when a developer initially uses {{cache.xml}} to configure _Geode_ and then (perhaps) switches
to configuring a node programmatically using the public Java API with seemingly equivalent
and complementary configuration logic, certain things cease to "work as expected."
> For example...
> 1. Premature GatewayReceiver start before Region exists resulting in event/data loss
> In {{cache.xml}}, if a developer defines a {{GatewayReceiver}} along with Regions that
may potentially be updated by the {{GatewayReceiver}}, then, when processing (parsing and
initializing _Geode_ components) the {{cache.xml}}, _Geode_ is careful not to "start" the
{{GatewayReceiver}} until all the Regions have been created.
> If _Geode_ were to start the {{GatewayReceiver}} "prematurely", and events from
the remote WAN site arrived before the Regions targeted by those events were created, then _Geode_
would drop those events, thus causing data loss.  Therefore, _Geode's_ logic when processing
{{cache.xml}} prevents this from happening.
> However, if a developer uses the public Java API to define the same configuration, no
out-of-the-box protection is offered to prevent event (data) loss, thus leaving
application developers using the _Geode_ API needing to know how _Geode_ functions "internally".
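> For illustration, here is a minimal sketch of how an API user can guard against this themselves today, assuming a manually-started {{GatewayReceiver}} (the port range and Region name below are made up for the example):
{code:java}
import java.io.IOException;

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.RegionShortcut;
import org.apache.geode.cache.wan.GatewayReceiver;
import org.apache.geode.cache.wan.GatewayReceiverFactory;

public class ManualStartReceiverExample {

  public static void main(String[] args) throws IOException {
    Cache cache = new CacheFactory().create();

    // Define the receiver, but do NOT let it listen for WAN events yet.
    GatewayReceiverFactory receiverFactory = cache.createGatewayReceiverFactory();
    receiverFactory.setManualStart(true);
    receiverFactory.setStartPort(5000); // illustrative port range
    receiverFactory.setEndPort(5500);
    GatewayReceiver receiver = receiverFactory.create();

    // First create every Region the remote WAN site may target...
    cache.createRegionFactory(RegionShortcut.REPLICATE).create("Example");

    // ...and only then start the receiver, so no events are dropped.
    receiver.start();
  }
}
{code}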
> Fortunately, application developers are not completely left to fend for themselves,
needing to be privy to all the details.  Technologies such as _Spring Data GemFire_, which also consume
and adhere to the _Geode_ public Java API (and +only+ the "public" Java API; "internal" classes
are not used given they are subject to change), are able to handle this using Spring's robust
bean container lifecycle management features.  However, other application consumers using
the API will not fare as well.
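> As a rough sketch of the idea, using plain Spring Java config rather than _Spring Data GemFire's_ actual factory beans (the bean and Region names here are made up), the container's dependency graph guarantees the Region exists before the receiver starts:
{code:java}
import java.io.IOException;

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;
import org.apache.geode.cache.wan.GatewayReceiver;
import org.apache.geode.cache.wan.GatewayReceiverFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.DependsOn;

@Configuration
public class GeodeConfig {

  @Bean
  public Cache gemfireCache() {
    return new CacheFactory().create();
  }

  @Bean(name = "Example")
  public Region<String, Object> exampleRegion(Cache gemfireCache) {
    return gemfireCache
        .<String, Object>createRegionFactory(RegionShortcut.REPLICATE)
        .create("Example");
  }

  // Spring will not create (and therefore not start) this bean until
  // the "Example" Region bean has been fully initialized.
  @Bean
  @DependsOn("Example")
  public GatewayReceiver gatewayReceiver(Cache gemfireCache) throws IOException {
    GatewayReceiverFactory receiverFactory = gemfireCache.createGatewayReceiverFactory();
    receiverFactory.setManualStart(true);
    GatewayReceiver receiver = receiverFactory.create();
    receiver.start();
    return receiver;
  }
}
{code}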
> 2. Another problem stems from the poorly conceived and "imposed" ordering of persistent Region creation.
> For instance, suppose I have 2 Members, each defining 2 persistent Regions, where each
Member is the "primary" for 1 of the 2 Regions and the 'other' Member hosts the redundant
copy, like so...
> Member    Regions
> -------------------
> X         B, A'
> Y         A, B'
> Tick (') indicates the Member (e.g. X) is the primary for a particular Region (e.g. A).
> Then, the system can end up in a distributed deadlock due to the non-apparent, non-arbitrary
dependency between the Members caused by an improper configuration order of the Regions.
> In this situation, the primary Member for a Region must start before the Member hosting
the redundant Region copy (secondary) because it is a property of _Geode_ that the primary
will have the most recent, correct copy of the data.
> But, as I have illustrated above, because I have defined the Regions in an improper
(arbitrary) order, the system will deadlock on startup.  I.e. when Member X starts, it will
attempt to create Region B first.  However, Member X must wait for Member Y to start since
Member Y is the "primary" for Region B.
> Likewise, when Member Y starts, because it tries to create Region A first, it too
will wait on Member X, which hosts the "primary" copy of Region A, thereby leading to a situation
where each Member waits for the other: a distributed deadlock.
> This example is fairly simplified; the problem gets more complex as you add Members and
additional Regions to the system.
> Of course, the "easy" solution is to ensure the Members in the cluster declaring the
Regions all define the Regions in their configuration in the "same order".  This is made even
easier with the use of a cluster-wide, shared configuration (i.e. the Cluster Configuration
Service).  By defining all Regions in the same order on every Member (e.g. A followed by
B), a developer/user can avoid the distributed deadlock, as sketched below.
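> A minimal sketch of that convention, run identically on every Member (using the hypothetical persistent Regions A and B from the table above):
{code:java}
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.RegionFactory;
import org.apache.geode.cache.RegionShortcut;

public class ConsistentRegionOrder {

  public static void main(String[] args) {
    Cache cache = new CacheFactory().create();

    RegionFactory<String, Object> regionFactory =
        cache.createRegionFactory(RegionShortcut.PARTITION_REDUNDANT_PERSISTENT);

    // Every Member creates the persistent Regions in the SAME order
    // (A, then B), regardless of which Member happens to be "primary"
    // for which Region, so no two Members wait on each other at startup.
    regionFactory.create("A");
    regionFactory.create("B");
  }
}
{code}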
> However, it is naive for _Geode_ to assume users will know of and conform to this restriction,
imposing a non-arbitrary order to work around what is, basically, a technical limitation of the system.
> In other environments, such as Spring, you cannot necessarily guarantee what the order
will be at runtime, especially if application components (e.g. DAOs) inject references to
GemFire components (e.g. Regions) while also using other advanced Spring container
features, like CLASSPATH component-scanning, to wire up the entire application.
> Even "collocation" has an impact on the Region creation order, since Spring must logically
satisfy the "dependency" order of the beans first.  This is both logical and makes sense,
whereas _Geode's_ ordering is non-arbitrary and non-apparent, since any Member could host the
redundant copy.  Therefore, this problem is a leaked implementation detail.
> Technically, the same problem can be reproduced in {{cache.xml}}, for that matter, with
no Spring present.  And this problem is even more likely to happen using the public
Java API since, again, there is no special *magic* being handled by "internal" _Geode_ classes
(in this case) w.r.t. {{cache.xml}}.  Users/developers just have to know the correct ordering.
