polygene-dev mailing list archives

From Niclas Hedhman <nic...@hedhman.org>
Subject Re: [qi4j-dev] Using Qi4j as a skeleton framework in a high throughput, highly concurrent servlet deployment (and problems with race conditions)
Date Wed, 08 Apr 2015 09:58:19 GMT
Ok, simpler tests are better... if you can manage that, it would be great!!!

You didn't happen to capture any stack traces?

The @Optional-related violations happen in two cases (I am sure you know
this, but I want to reiterate to be safe):

  a) If a Property<?> has not been set or has been set to null, and
@Optional has not been declared on the method for the Property<?>

  b) If null is passed as a method argument on any method on the Composite
Type interface, and that method parameter doesn't have the @Optional annotation.

The stack trace should reveal which case it is, and if the
exception doesn't contain enough information about the problem, then we
will try to add that.
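
To make the two cases concrete, here is a minimal sketch in plain Java. It is
not the Qi4j implementation: the nested Optional annotation, the Clause
interface and the OptionalSketch helper methods are all invented for this
illustration, and plain getter-style methods stand in for Property<?>. It only
mimics the two rules described above by reflecting on annotations.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

final class OptionalSketch {
    // Hypothetical stand-in for the framework's @Optional annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.PARAMETER})
    @interface Optional {}

    // Hypothetical composite type, used only for this illustration.
    interface Clause {
        String expression();                              // case (a): null is a violation
        @Optional String comment();                       // null is allowed
        void rename(String name, @Optional String note);  // case (b): 'name' must not be null
    }

    // Looks up a Clause method, wrapping the checked exception for brevity.
    static Method method(String name, Class<?>... parameterTypes) {
        try {
            return Clause.class.getMethod(name, parameterTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Case (a): the value is null and the property method lacks @Optional.
    static boolean propertyViolation(Method property, Object value) {
        return value == null && !property.isAnnotationPresent(Optional.class);
    }

    // Case (b): index of the first null argument whose parameter lacks
    // @Optional, or -1 when no argument violates the rule.
    static int argumentViolation(Method method, Object[] args) {
        Annotation[][] annotations = method.getParameterAnnotations();
        for (int i = 0; i < args.length; i++) {
            if (args[i] != null) continue;
            boolean optional = false;
            for (Annotation a : annotations[i]) {
                if (a instanceof Optional) optional = true;
            }
            if (!optional) return i;
        }
        return -1;
    }
}
```

In this sketch a null expression() is flagged while a null comment() is not,
and passing null for 'name' to rename() is flagged at argument index 0.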


On Wed, Apr 8, 2015 at 6:43 AM, Tasos Parisinos <tasosp@projectbeagle.com> wrote:

> Hi Niclas
> The problems I see most of the time come from the builder.newInstance()
> call. The errors are constraint violations for data combinations that can
> happen only due to race conditions (the @Optional violation I mentioned in an
> earlier mail). For example, I got an error saying that an immutable
> property gets set, when there is no such code in the project, apart from
> prototype initialization!
> BUT some other synchronization errors do occur while the composite
> methods are called! I noticed that clearly in some cases.
> Anyway, in both cases, synchronization within the qi4j codebase is either near
> to impossible or defeats optimizations, from what I gather.
> I arrived at a solution to my problem the other day by creating and using an
> application pool. Each request thread takes an application from the pool,
> uses it, cleans it up and returns it to the pool. These applications are
> contextual enough for my DCI design and the result is rewarding.
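
The pooling approach described above can be sketched roughly as follows, in
plain Java. This is an assumption about the general shape, not the code from
the project: InstancePool, withInstance and idleCount are hypothetical names,
and the pooled type A stands in for an assembled application.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Fixed-size pool: each request thread borrows one instance, uses it
// exclusively, cleans it up and hands it back.
final class InstancePool<A> {
    private final BlockingQueue<A> idle;
    private final Consumer<A> cleanup;

    InstancePool(int size, Supplier<A> factory, Consumer<A> cleanup) {
        this.idle = new ArrayBlockingQueue<>(size);
        this.cleanup = cleanup;
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());   // expensive assembly happens once, up front
        }
    }

    <R> R withInstance(Function<A, R> work) {
        A instance;
        try {
            instance = idle.take();    // blocks while every instance is borrowed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for an instance", e);
        }
        try {
            return work.apply(instance);
        } finally {
            cleanup.accept(instance);  // reset state before the next borrower sees it
            idle.add(instance);        // cannot overflow: we took one slot above
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

Because an instance is only ever visible to the thread that borrowed it, no
locking is needed while it is in use; the only shared structure is the queue.
The pool size caps the number of requests that can run concurrently.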
> BUT as Paul Merlin proposed the other day, I will create a cut-down
> version of our servlet, as a proof of concept that reproduces the error, in
> order for you to narrow it down and kill the bug.
> This will come as a webapp for Tomcat that you can stress test for high
> concurrency with JMeter. It will be an IntelliJ 14 project. It will build
> with Maven and, if you are ok with it, it would be nice for it to perform an SQL
> query through Hibernate to some database. I can go all the way with the
> database because the latter types of error, the ones that can be related
> to invocation stacks, are revealed there. We will see. If it is a burden
> for you to set this up, I can make something simpler.
> Happy Easter
> Tasos Parisinos
> On Sun, Apr 5, 2015 at 9:24 AM, Niclas Hedhman <niclas@hedhman.org> wrote:
>> Tasos,
>> I had a first look at the code yesterday, and I couldn't fathom that
>> there would be an issue in the creation and/or usage of the builder. The code
>> has been kept simple on purpose to ensure that we can guarantee
>> thread-safety where we say so.
>> You state that it works if you synchronize that particular section, and
>> draw the conclusion that it is builder creation related, BUT could it be
>> that the problem is actually happening in the method invocation itself, but
>> the mentioned synchronization will "serialize" the use of subsequent calls to
>> the created value/transient instance, and therefore no concurrency happens
>> in the method invocation?
>> The reason I mention this is that I think the codebase still
>> shares method invocation stacks across composite instances, and creates new
>> ones on demand. And that code is not as simple as the Factory/Builder code,
>> and we also tried to make that as performant as possible.
>> Happy Easter everyone.
>> Niclas (from Ankara, visiting Alex Karasulu, one of the Zest PMC members)
>> On Fri, Apr 3, 2015 at 9:51 PM, Tasos Parisinos <tasosp@projectbeagle.com
>> > wrote:
>>> Hi again
>>> I would also like to add at this point that if any of you Zest
>>> developers can suggest, pinpoint or at least narrow down the pieces of code
>>> that implement the related problematic parts, it would be tremendously
>>> helpful to us, in order to refactor this code and try to suggest a solution
>>> on our own.
>>> For us, it is paramount to resolve this as fast as possible, to carry on
>>> implementing our core business code.
>>> Also, using this occasion, I would like to join Zest's core developer
>>> team, as I'm a great fan of the framework and thus we have based our
>>> platform on it.
>>> As a CTO of projectbeagle I'm also very eager to contribute parts of our
>>> implementation back to the Open Source Community as a Qi4j (well... Zest)
>>> library, extension or tool!
>>> Best Regards
>>> Tasos Parisinos
>>> On Fri, Apr 3, 2015 at 3:58 PM, Tasos Parisinos <
>>> tasosp@projectbeagle.com> wrote:
>>>> Hello Niclas and all
>>>> I'll start from the bottom of your response and work my way up.
>>>> First of all, thanks for your response, I appreciate it.
>>>> Congratulations to the whole Qi4j team for becoming an ASF project,
>>>> although I prefer the old name... Nevertheless it is a milestone for this
>>>> awesome framework.
>>>> About performance. We have been writing an availability query for a
>>>> bed-bank. These queries are massive, working on tens of tables at once, on
>>>> big data. So the very question of throughput for us is not only code
>>>> related. In the final picture we will be talking about massive database -
>>>> servlet container clusters that will be able to spit out 15,000 ACID
>>>> transactions per second.
>>>> For our prototyping phase, achieving 5000 of them running the full
>>>> query on test/sample data on a single machine was a breakthrough on its
>>>> own. And we haven't really started to push this system, just code and basic
>>>> system optimizations. This will grow.
>>>> Oh by the way, we are www.projectbeagle.com, based in Greece.
>>>> Our first attempt was to have a single Qi4j runtime and application PER
>>>> request thread. This has become a non-trivial application with multiple
>>>> services, lots of layers and modules, so assembling it into an application
>>>> takes time. We can't afford this. That's why we moved all this code to be
>>>> executed during deployment time. All requests (all servicing threads) will
>>>> use this unique application to perform DI and composition. In the future,
>>>> secondary, contextual Qi4j applications may be added to the picture.
>>>> So, when we did that, throughput skyrocketed but race conditions
>>>> started. Let me give you some examples. All are related to composition,
>>>> with either value or transient builders and their factories. We don't use
>>>> any kind of entity composites (we have Hibernate as ORM and we do
>>>> persistence in a tricky way - another story). All composites once built
>>>> work fine, no problem with them.
>>>> So this is a small code example from our project's QueryBuilder.
>>>> QueryBuilder has multiple APIs (multiple interfaces) and each is
>>>> implemented by a different Mixin (abstract classes). This is its Hibernate
>>>> implementation. We have also a mock one. This is one of the QueryBuilder
>>>> API methods, that creates a WHERE clause (field >= value ) for an SQL
>>>> @Override
>>>> @Factory
>>>> public <T> Clause ge(String field, T value)
>>>> {
>>>>    synchronized(selfContainer) {
>>>>       ValueBuilder<Clause> builder = selfContainer.newValueBuilder(Clause.class);
>>>>       builder.prototype()
>>>>              .expression()
>>>>              .set(Restrictions.ge(field, value));
>>>>       return builder.newInstance();
>>>>    }
>>>> }
>>>> These are called very, very often in the project. After all it is a
>>>> query engine. Variable 'selfContainer' is injected as
>>>> @Structure
>>>> protected Module selfContainer;
>>>> When we don't lock the builder factory (the module) in the way we do,
>>>> we get all sorts of race conditions. For example, when newInstance() is
>>>> called it can fail with a constraint violation exception saying that
>>>> expression is not optional. But the call Restrictions.ge() can never return
>>>> null. So by the time one thread comes to call newInstance(), another thread has
>>>> already messed with the builder factory. The builders themselves, as you
>>>> see, are local variables (but they may not be, it depends on how they are
>>>> implemented inside their factory).
>>>> There are other ways it can fail, for example saying that the builder
>>>> can't find a proper fragment with a ge implementation. All these errors are
>>>> so absurd for such simple code that they can only be race conditions. I
>>>> will collect as many exception dumps of such errors as I can and send them to you
>>>> in a future attachment.
>>>> When we synchronize in this fashion, problems go away. But this has two
>>>> basic caveats:
>>>> 1. Performance penalty (obvious)
>>>> 2. A Schroedinger's cat situation. We don't know if the problem went
>>>> away because we synchronize or because concurrency falls to such a degree
>>>> that the probability of a race condition falls dramatically, only to
>>>> appear on production machines later on
>>>> Best regards
>>>> Tasos Parisinos
>>>> On Thu, Apr 2, 2015 at 11:03 AM, Niclas Hedhman <niclas@hedhman.org>
>>>> wrote:
>>>>> The general "rule" is that Factories (i.e. implemented by Module
>>>>> nowadays) should be thread safe, Builders are NOT thread-safe, and are
>>>>> expected to be created at each use. Are you trying to re-use the Builders?
>>>>> If not, i.e. you do newXyzBuilder() on each use, and you are seeing
>>>>> threading issues, then that is bug(s) and I would love to get hold of
>>>>> details.
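
The rule above (shared thread-safe factory, one non-thread-safe builder per
use) can be illustrated with a small plain-Java sketch. None of these classes
are Qi4j types: Factory, Builder, Value and buildConcurrently are hypothetical
names chosen only to show the intended usage pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BuilderPerUse {
    // Immutable once created, so it can be shared between threads.
    static final class Value {
        final String expression;
        Value(String expression) { this.expression = expression; }
    }

    // Mutable and unsynchronized: must stay confined to the thread that created it.
    static final class Builder {
        private String expression;
        Builder expression(String e) { this.expression = e; return this; }
        Value newInstance() {
            if (expression == null) {
                throw new IllegalStateException("expression is not optional");
            }
            return new Value(expression);
        }
    }

    // Stateless, hence safe to share: every call hands out a fresh builder.
    static final class Factory {
        Builder newBuilder() { return new Builder(); }
    }

    // Runs 'tasks' builds on 'threads' threads against one shared factory.
    static List<String> buildConcurrently(int threads, int tasks) {
        Factory shared = new Factory();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final String expr = "field >= " + i;
                // The builder is a local of the task and never leaves its thread.
                Callable<String> job =
                    () -> shared.newBuilder().expression(expr).newInstance().expression;
                futures.add(pool.submit(job));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the factory holds no mutable state at all, which is the
simplest way to make it thread-safe. A real factory that caches lookups needs
its caches to be safe for concurrent use as well.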
>>>>> ValueComposites -> thread-safe by definition, once created.
>>>>> EntityComposites -> MUST NOT be handed between threads, and are
>>>>> therefore indirectly thread-safe.
>>>>> TransientComposites -> Internals are expected to be thread-safe,
>>>>> changes at 'user level' need to be taken care of.
>>>>> ServiceComposites -> Internals are expected to be thread-safe, but
>>>>> user level might need care.
>>>>> ConfigurationComposites -> They are entities, and therefore inherit
>>>>> the same concurrency characteristics.
>>>>> Qi4j isn't really intended for being a speed demon, so 15000 tx/sec
>>>>> sounds a bit too ambitious to me. Please report back what kind of numbers
>>>>> you will eventually manage, even if it is not good enough for you.
>>>>> Niclas
>>>>> P.S. Qi4j has just been accepted into the Apache Software Foundation,
>>>>> and will emerge as Apache Zest. dev@zest.apache.org is CC'd for that
>>>>> reason.
>>>>> On Wed, Apr 1, 2015 at 10:50 PM, Tasos Parisinos <
>>>>> tasosp@projectbeagle.com> wrote:
>>>>>> Thanks for your reply, Kent
>>>>>> I agree with you that builder instances should be created, used and
>>>>>> discarded inside a single request (a single thread from the servlet
>>>>>> container pool). The builder factories though, as the application
>>>>>> should be used commonly across all request threads (in a synchronized
>>>>>> manner) in order to avoid instantiating such an application PER thread,
>>>>>> which would greatly compromise performance. The use of putIfAbsent in that
>>>>>> context seems to be correct. I'll give it a try and update you with
>>>>>> On Wednesday, April 1, 2015 at 10:26:16 PM UTC+3, kent.soelvsten
>>>>>> wrote:
>>>>>>>  I am not an expert so it might be the blind leading the deaf
>>>>>>> but I sense a potential problem with concurrent access to various
>>>>>>> variants of ValueBuilderFactory#newValueBuilder and
>>>>>>> TransientBuilderFactory#newTransientBuilder.
>>>>>>> (the internal usage of ConcurrentHashMap inside TypeLookup -
>>>>>>> shouldn't we use putIfAbsent?).
>>>>>>> So those would be good candidates for synchronization. If that is
>>>>>>> your problem I believe you might have found a bug - and a work-around.
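
For readers unfamiliar with the putIfAbsent point raised above, here is a
generic plain-Java sketch of the difference. It makes no claim about what
TypeLookup actually does: LookupCache, lookupRacy and lookup are hypothetical
names, and the resolver function stands in for whatever work populates the map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class LookupCache<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();

    // Check-then-act: two threads can both miss, both resolve, and each
    // return its own value, so callers may end up holding different instances.
    V lookupRacy(K key, Function<K, V> resolver) {
        V value = cache.get(key);
        if (value == null) {
            value = resolver.apply(key);
            cache.put(key, value);
        }
        return value;
    }

    // putIfAbsent keeps whichever value was stored first and reports it back,
    // so every caller ends up with the same instance for a given key.
    V lookup(K key, Function<K, V> resolver) {
        V value = cache.get(key);
        if (value == null) {
            V candidate = resolver.apply(key);
            V previous = cache.putIfAbsent(key, candidate);
            value = (previous != null) ? previous : candidate;
        }
        return value;
    }
}
```

Note that the map itself stays internally consistent in both variants; what
differs is whether all threads agree on a single cached instance. Whether that
distinction matters depends on how the cached values are used afterwards.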
>>>>>>> ValueBuilder and TransientBuilder instances should probably be
>>>>>>> created, used and discarded inside a single web request and not
>>>>>>> /Kent
>>>>>>> Den 01-04-2015 kl. 20:07 skrev Tasos Parisinos:
>>>>>>> Hi all
>>>>>>>  Let me describe my problem. We have implemented a servlet
>>>>>>> (deployed in tomcat) that takes a REST request and based on its
>>>>>>> parameters, it builds and executes a single query (using Hibernate
>>>>>>> within a JTA transaction (using Atomikos). The application specifics
>>>>>>> not important, what is important is that we need high throughput
>>>>>>> trx / sec is our objective).
>>>>>>>  We have implemented all infrastructure code using Qi4j for COP
>>>>>>> DI as well as Property<T> data validation (constraint annotations).
>>>>>>> deployment time (in a separate thread) we assemble and activate two Qi4j
>>>>>>> runtimes, each with a Qi4j application. The first is used only
>>>>>>> deployment, while the second is used in ALL  threads that serve
>>>>>>> Using Qi4j this second application, starts various ServiceComposite
>>>>>>> the servlet deployes, for eager initialization (logger service,
>>>>>>> service, repository service, rest service, application services,
>>>>>>> services, transaction service, token service to name only some).
>>>>>>> implement our Use Cases with a DCI design.
>>>>>>>  These services and DCI code uses various ValueBuilder<T>
>>>>>>> TransientBuilder<T> to do composition.
>>>>>>>  The problem is:
>>>>>>>  Because ALL request threads, use the same Qi4j application,
>>>>>>> have various race conditions that are mainly associated with the various
>>>>>>> builders. These race conditions appear when the servlet serves more than
>>>>>>> 2000 trx / sec. Sacrificing some throughput we can synchronize
>>>>>>> variables, but to minimize performance impact we need to know:
>>>>>>>  1. What is the best practice for such cases
>>>>>>> 2. Which part of ValueBuilderFactory, ValueBuilder<T>,
>>>>>>> TransientBuilderFactory, TransientBuilder<T> is best to
>>>>>>>  Thanx in advance
>>>>>>>  --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "qi4j-dev" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>> send an email to qi4j-dev+u...@googlegroups.com.
>>>>>>> To post to this group, send email to qi4j...@googlegroups.com.
>>>>>>> Visit this group at http://groups.google.com/group/qi4j-dev.
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>> --
>>>>> Niclas Hedhman, Software Developer
>>>>> http://www.qi4j.org - New Energy for Java
>> --
>> Niclas Hedhman, Software Developer
>> http://www.qi4j.org - New Energy for Java

Niclas Hedhman, Software Developer
http://www.qi4j.org - New Energy for Java
