polygene-dev mailing list archives

From Tasos Parisinos <tas...@projectbeagle.com>
Subject Re: [qi4j-dev] Using Qi4j as a skeleton framework in a high throughput, highly concurrent servlet deployment (and problems with race conditions)
Date Fri, 03 Apr 2015 12:58:11 GMT
Hello Niclas and all

I'll start from the bottom of your response and work my way up.

First of all, thanks for your response, I appreciate it.
Congratulations to the whole Qi4j team for becoming an ASF project,
although I prefer the old name... Nevertheless it is a milestone for this
awesome framework.

About performance. We have been writing an availability query for a
bed-bank. These queries are massive, working on tens of tables at once, on
big data. So for us the question of throughput is not only about code.
In the final picture we will be talking about massive clusters of databases
and servlet containers that will be able to deliver 15,000 ACID
transactions per second.

For our prototyping phase, achieving 5000 of them running the full query on
test/sample data on a single machine was a breakthrough on its own. And we
haven't really started to push this system, just code and basic system
optimizations. This will grow.

Oh by the way, we are www.projectbeagle.com, based in Greece.

Our first attempt was to have a single Qi4j runtime and application PER
request thread. This has become a non-trivial application with multiple
services, lots of layers and modules, so assembling it into an application
takes time. We can't afford this. That's why we moved all this code to be
executed during deployment time. All requests (all servicing threads) will
use this unique application to perform DI and composition. In the future, a
secondary, contextual Qi4j application may be added to the picture.

So, when we did that, throughput skyrocketed but race conditions started.
Let me give you some examples. All are related to composition, with
either value or transient builders and their factories. We don't use any
kind of entity composites (we have Hibernate as ORM and we do persistence
in a tricky way, which is another story). All composites, once built, work
fine; we have no problem with them.

So this is a small code example from our project's QueryBuilder.
QueryBuilder has multiple APIs (multiple interfaces) and each is
implemented by a different Mixin (abstract classes). This is its Hibernate
implementation. We also have a mock one. This is one of the QueryBuilder
API methods, which creates a WHERE clause (field >= value) for an SQL query:

@Override
@Factory
public <T> Clause ge(String field, T value)
{
   synchronized(selfContainer) {
      ValueBuilder<Clause> builder =
            selfContainer.newValueBuilder(Clause.class);

      builder.prototype()
             .expression()
             .set(Restrictions.ge(field, value));

      return builder.newInstance();
   }
}


These are called very, very often in the project. After all it is a query
engine. Variable 'selfContainer' is injected as

@Structure
protected Module selfContainer;


When we don't lock the builder factory (the module) the way we do above, we
get all sorts of race conditions. For example, when newInstance() is called
it can fail with a constraint violation exception saying that expression is
not optional. But the call Restrictions.ge() can never return null. So by
the time one thread calls newInstance(), another thread has already
interfered with the builder factory. The builders themselves, as you can
see, are local variables (although they may not be truly independent; it
depends on how they are implemented inside their factory).
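
To make the kind of hazard Kent asked about (quoted below) concrete, here is
a minimal sketch of a check-then-act race on a ConcurrentHashMap next to a
putIfAbsent variant. TypeCacheSketch, lookupRacy and lookupAtomic are
hypothetical names made up for illustration; this is not the code of Qi4j's
TypeLookup, and I have not confirmed that this is what actually goes wrong
there.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical stand-in for a per-type model cache; NOT Qi4j's TypeLookup.
final class TypeCacheSketch<M> {
    private final ConcurrentMap<Class<?>, M> cache = new ConcurrentHashMap<>();
    private final Function<Class<?>, M> resolver;

    TypeCacheSketch(Function<Class<?>, M> resolver) {
        this.resolver = resolver;
    }

    // Check-then-act: two threads can both miss, both resolve, and each
    // return its own model; the later put() overwrites the earlier one.
    M lookupRacy(Class<?> type) {
        M model = cache.get(type);
        if (model == null) {
            model = resolver.apply(type);
            cache.put(type, model);
        }
        return model;
    }

    // putIfAbsent: the resolver may still run more than once under
    // contention, but only the first stored model is ever returned.
    M lookupAtomic(Class<?> type) {
        M model = cache.get(type);
        if (model == null) {
            M fresh = resolver.apply(type);
            M previous = cache.putIfAbsent(type, fresh);
            model = (previous != null) ? previous : fresh;
        }
        return model;
    }

    public static void main(String[] args) {
        TypeCacheSketch<Object> models = new TypeCacheSketch<>(type -> new Object());
        Object first = models.lookupAtomic(String.class);
        Object second = models.lookupAtomic(String.class);
        System.out.println("same model returned: " + (first == second));
    }
}
```

If the resolver has side effects or is expensive, computeIfAbsent may be
preferable to putIfAbsent, since it runs the resolver at most once per key.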

There are other ways it can fail. For example, it may say that the builder
can't find a proper fragment with a ge implementation. All these errors are
so absurd for such simple code that they can only be race conditions. I will
collect as many exception dumps of such errors as I can and send them to you
as an attachment in a future message.

When we synchronize in this fashion, problems go away. But this has two
basic caveats:

1. Performance penalty (obvious)
2. A Schroedinger's cat situation. We don't know if the problem went away
because we synchronize or because concurrency drops to such a degree that
the probability of a race condition falls dramatically, only for it to
reappear on production machines later on
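
One way to take some of the guesswork out of the second caveat would be a
small stress harness that releases a fixed number of threads at the same
moment and counts how many calls throw, so that the same call can be compared
with and without the lock at identical concurrency. This is only a sketch:
FactoryStressSketch and countFailures are hypothetical names, and the task in
main() is a placeholder for a real builder call such as ge(). Zero failures
in such a run would not prove that no race exists.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness; not part of Qi4j or of our project.
final class FactoryStressSketch {

    // Runs task callsPerThread times on each of `threads` threads, all
    // released together, and returns how many calls threw an exception.
    static int countFailures(int threads, int callsPerThread, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // wait so that all threads begin at once
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < callsPerThread; i++) {
                    try {
                        task.run();
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
            });
        }

        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("stress run did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("stress run was interrupted", e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        // Placeholder task; a real run would call the builder code under test.
        int failures = countFailures(16, 10_000,
                () -> new StringBuilder("field").append(">=").toString());
        System.out.println("failures: " + failures);
    }
}
```

Running it once with the synchronized block and once without, at the same
thread count, would at least show whether the failures track the lock or the
load.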


Best regards
Tasos Parisinos

On Thu, Apr 2, 2015 at 11:03 AM, Niclas Hedhman <niclas@hedhman.org> wrote:

>
> The general "rule" is that Factories (i.e. implemented by Module nowadays)
> should be thread safe, Builders are NOT thread-safe, and are expected to be
> created at each use. Are you trying to re-use the Builders?  If not, i.e.
> you do newXyzBuilder() on each use, and you are seeing threading issues,
> then that is bug(s) and I would love to get hold of the details.
>
> ValueComposites -> thread-safe by definition, once created.
>
> EntityComposites -> MUST NOT be handed between threads, and are therefore
> indirectly thread-safe.
>
> TransientComposites -> Internals are expected to be thread-safe, but
> changes at 'user level' need to be taken care of.
>
> ServiceComposites -> Internals are expected to be thread-safe, but user
> level might need care.
>
> ConfigurationComposites -> They are entities, and therefore inherit the
> same concurrency characteristics.
>
>
> Qi4j isn't really intended for being a speed demon, so 15000 tx/sec sounds
> a bit too ambitious to me. Please report back what kind of numbers you will
> eventually manage, even if it is not good enough for you.
>
> Niclas
>
> P.S. Qi4j has just been accepted into the Apache Software Foundation, and
> will emerge as Apache Zest. dev@zest.apache.org is CC'd for that reason.
>
>
> On Wed, Apr 1, 2015 at 10:50 PM, Tasos Parisinos <tasosp@projectbeagle.com
> > wrote:
>
>> Thanks for your reply Kent
>>
>> I agree with you that builder instances should be created, used and
>> discarded inside a single request (a single thread from the servlet
>> container pool). The builder factories though, like the application itself,
>> should be shared across all request threads (in a synchronized
>> manner) in order to avoid instantiating such an application PER thread, as
>> this will greatly compromise performance. The use of putIfAbsent in that
>> context seems to be correct. I'll give it a try and update you with results
>>
>>
>> On Wednesday, April 1, 2015 at 10:26:16 PM UTC+3, kent.soelvsten wrote:
>>>
>>>  I am not an expert so it might be the blind leading the deaf ......
>>>
>>> but I sense a potential problem with concurrent access to various
>>> variants of ValueBuilderFactory#newValueBuilder and
>>> TransientBuilderFactory#newTransientBuilder.
>>> (the internal usage of ConcurrentHashMap inside TypeLookup - shouldn't
>>> we use putIfAbsent?).
>>>
>>> So those would be good candidates for synchronization. If that solves
>>> your problem I believe you might have found a bug - and a work-around.
>>> ValueBuilder and TransientBuilder instances should probably be created,
>>> used and discarded inside a single web request and not reused.
>>>
>>> /Kent
>>>
>>>
>>> Den 01-04-2015 kl. 20:07 skrev Tasos Parisinos:
>>>
>>> Hi all
>>>
>>>  Let me describe my problem. We have implemented a servlet (deployed in
>>> tomcat) that takes a REST request and based on its query parameters, it
>>> builds and executes a single query (using Hibernate ORM) within a JTA
>>> transaction (using Atomikos). The application specifics are not important,
>>> what is important is that we need high throughput (15.000 trx / sec is our
>>> objective).
>>>
>>>  We have implemented all infrastructure code using Qi4j for COP and DI
>>> as well as Property<T> data validation (constraint annotations). At
>>> deployment time (in a separate thread) we assemble and activate two Qi4j
>>> runtimes, each with a Qi4j application. The first is used only during
>>> deployment, while the second is used in ALL threads that serve requests.
>>> This second application starts various ServiceComposites while
>>> the servlet deploys, for eager initialization (logger service, mapping
>>> service, repository service, rest service, application services, domain
>>> services, transaction service, token service to name only some). We
>>> implement our Use Cases with a DCI design.
>>>
>>>  These services and the DCI code use various ValueBuilder<T> and
>>> TransientBuilder<T> instances to do composition.
>>>
>>>  The problem is:
>>>
>>>  Because ALL request threads use the same Qi4j application, we have
>>> various race conditions that are mainly associated with the various
>>> builders. These race conditions appear when the servlet serves more than
>>> 2000 trx / sec. Sacrificing some throughput we can synchronize shared
>>> variables, but to minimize performance impact we need to know:
>>>
>>>  1. What is the best practice for such cases
>>> 2. Which part of ValueBuilderFactory, ValueBuilder<T>,
>>> TransientBuilderFactory, TransientBuilder<T> is best to synchronize?
>>>
>>>  Thanx in advance
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "qi4j-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to qi4j-dev+u...@googlegroups.com.
>>> To post to this group, send email to qi4j...@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/qi4j-dev.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>
>
>
>
> --
> Niclas Hedhman, Software Developer
> http://www.qi4j.org - New Energy for Java
>
