polygene-dev mailing list archives

From Tasos Parisinos <tas...@projectbeagle.com>
Subject Re: [qi4j-dev] Using Qi4j as a skeleton framework in a high throughput, highly concurrent servlet deployment (and problems with race conditions)
Date Wed, 08 Apr 2015 10:20:09 GMT
Hi devs

Because I was working on the branch with the application pool, I don't have
any stack traces yet. When I finish with that and switch to the other
branch, I'll collect them and send them as an attachment.

Best regards
Tasos Parisinos


On Wed, Apr 8, 2015 at 12:58 PM, Niclas Hedhman <niclas@hedhman.org> wrote:

> OK, simpler tests are better... if you can manage that, it would be great!
>
> You didn't happen to capture any stack traces?
>
> The @Optional-related violations happen in two cases (I am sure you know
> this, but I want to reiterate to be safe):
>
>   a) If a Property<?> has not been set or has been set to null, and
> @Optional has not been declared on the method for the Property<?>
>
>   b) If null is passed as a method argument on any method on the Composite
> Type interface, and that method parameter doesn't have @Optional annotation.
>
> The stack trace should reveal which case it is, and if the exception
> doesn't contain enough information about the problem, we will try to
> add it.
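
The two cases above can be illustrated with a minimal plain-Java sketch. It
does not use Qi4j and is not how Qi4j implements its checks: the @MayBeNull
annotation, the Greeter interface and the NullChecks class are all
hypothetical names invented for this illustration. requireSet() mimics
case (a), a required value that is unset or null at creation time, and
guarded() mimics case (b), a null argument passed to a parameter that is not
marked as optional.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Parameter;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for an "optional" marker; this is not Qi4j's @Optional.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface MayBeNull {}

// Hypothetical composite-style interface, used only for this illustration.
interface Greeter {
    String greet(String name, @MayBeNull String title);
}

class NullChecks {
    // Case (a): a value that is unset (null) is rejected unless declared optional.
    static <V> V requireSet(String propertyName, V value, boolean optional) {
        if (value == null && !optional) {
            throw new IllegalStateException("property '" + propertyName + "' is not optional");
        }
        return value;
    }

    // Case (b): wrap target in a proxy that rejects null arguments unless the
    // corresponding interface parameter carries @MayBeNull.
    static <T> T guarded(Class<T> type, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            Parameter[] params = method.getParameters();
            for (int i = 0; args != null && i < args.length; i++) {
                if (args[i] == null && !params[i].isAnnotationPresent(MayBeNull.class)) {
                    throw new IllegalArgumentException(
                        "null passed for non-optional parameter " + i + " of " + method.getName());
                }
            }
            return method.invoke(target, args);
        };
        return type.cast(Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] { type }, handler));
    }

    public static void main(String[] args) {
        Greeter greeter = guarded(Greeter.class,
            (name, title) -> (title == null ? "" : title + " ") + name);
        System.out.println(greeter.greet("Ada", null)); // allowed: title is marked @MayBeNull
        try {
            greeter.greet(null, "Dr");                  // rejected: name is not marked
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        try {
            requireSet("expression", null, false);      // rejected: required but unset
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In both cases the exception message names the offending property or
parameter, which is the kind of information a stack trace would need in
order to tell the two cases apart.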
>
>
> Thanks
> Niclas
>
> On Wed, Apr 8, 2015 at 6:43 AM, Tasos Parisinos <tasosp@projectbeagle.com>
> wrote:
>
>> Hi Niclas
>>
>> The problems I see most of the time come from the builder.newInstance()
>> call. The errors are constraint violations for data combinations that can
>> happen only due to race conditions (the @Optional violation I mentioned in
>> an earlier mail). For example, I got an error saying that an immutable
>> property was being set, when there is no such code in the project apart
>> from prototype initialization!
>>
>> BUT some other synchronization errors indeed come while the composite
>> methods are called! I noticed that clearly in some cases.
>>
>> Anyway, from what I gather, in both cases synchronization within the Qi4j
>> codebase is either nearly impossible or defeats the optimizations.
>>
>> The other day I arrived at a solution to my problem by creating and using
>> an application pool. Each request thread takes an application from the
>> pool, uses it, cleans it up and returns it to the pool. These applications
>> are contextual enough for my DCI design and the result is rewarding.
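
The borrow/use/clean/return cycle described above can be sketched in plain
Java roughly as follows. This is only an illustration under assumptions: the
InstancePool class and its method names are hypothetical, T stands in for a
fully assembled application, and nothing here is Qi4j API or the actual
projectbeagle implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool; each borrowed instance is used by one thread at a time.
class InstancePool<T> {
    private final BlockingQueue<T> idle;
    private final Consumer<T> cleaner;

    InstancePool(int size, Supplier<T> factory, Consumer<T> cleaner) {
        this.idle = new ArrayBlockingQueue<>(size);
        this.cleaner = cleaner;
        // Pay the expensive assembly cost once per instance, up front.
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrows an instance (blocking if none is idle), runs the work,
    // then cleans the instance and returns it to the pool.
    <R> R withInstance(Function<T, R> work) {
        T instance;
        try {
            instance = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for an instance", e);
        }
        try {
            return work.apply(instance);
        } finally {
            cleaner.accept(instance);
            idle.add(instance);
        }
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        // A StringBuilder plays the role of the pooled "application" here.
        InstancePool<StringBuilder> pool =
            new InstancePool<>(2, StringBuilder::new, sb -> sb.setLength(0));
        String result = pool.withInstance(sb -> sb.append("request-1").toString());
        System.out.println(result + ", idle=" + pool.idleCount());
    }
}
```

The trade-off is memory and start-up time (one assembled instance per pool
slot) in exchange for never sharing an instance between two request threads.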
>>
>> BUT as Paul Merlin proposed the other day, i will create a cut down
>> version of our servlet, as a proof of concept that reproduces the error in
>> order for you to narrow it down and kill the bug.
>>
>> This will come as a webapp for Tomcat that you can stress test for high
>> concurrency with JMeter. It will be an IntelliJ IDEA 14 project. It will
>> build with Maven and, if you are OK with it, it would be nice for it to
>> perform an SQL query through Hibernate against some database. I can go all
>> the way with the database because the latter type of error, the kind that
>> may be related to invocation stacks, shows up there. We will see. If it is
>> a burden for you to set this up, I can make something simpler.
>>
>> Happy Easter
>> Tasos Parisinos
>>
>>
>>
>>
>> On Sun, Apr 5, 2015 at 9:24 AM, Niclas Hedhman <niclas@hedhman.org>
>> wrote:
>>
>>> Tasos,
>>> I had a first look at the code yesterday, and I couldn't fathom
>>> that there would be an issue in the creation and/or usage of builder. The
>>> code has been kept simple on purpose to ensure that we can guarantee
>>> thread-safety where we say so.
>>>
>>> You state that it works if you synchronize that particular section, and
>>> draw the conclusion that it is related to builder creation, BUT could it
>>> be that the problem is actually happening in the method invocation itself,
>>> and that the synchronization you mention also "serializes" subsequent
>>> calls to the created value/transient instance, so that no concurrency
>>> happens in the method invocation?
>>>
>>>
>>> The reason I mention this is that I think the codebase still shares
>>> method invocation stacks across composite instances, and creates new
>>> ones on demand. That code is not as simple as the Factory/Builder code,
>>> and we also tried to make it as performant as possible.
>>>
>>> Happy Easter everyone.
>>> Niclas (from Ankara, visiting Alex Karasulu, one of the Zest PMC members)
>>>
>>> On Fri, Apr 3, 2015 at 9:51 PM, Tasos Parisinos <
>>> tasosp@projectbeagle.com> wrote:
>>>
>>>> Hi again
>>>>
>>>> I would also like to add at this point that, if any of you Zest
>>>> developers can suggest, pinpoint or at least narrow down the pieces of code
>>>> that implement the related problematic parts, it would be tremendously
>>>> helpful to us, in order to refactor this code and try to suggest a solution
>>>> on our own.
>>>>
>>>> For us, it is paramount to resolve this as fast as possible, to carry
>>>> on implementing our core business code.
>>>>
>>>> Also, using this occasion, I would like to join Zest's core developer
>>>> team, as I'm a great fan of the framework and thus we have based our
>>>> platform on it.
>>>>
>>>> As a CTO of projectbeagle I'm also very eager to contribute parts of
>>>> our implementation back to the Open Source Community as a Qi4j (well...
>>>> Zest) library, extension or tool!
>>>>
>>>> Best Regards
>>>> Tasos Parisinos
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Apr 3, 2015 at 3:58 PM, Tasos Parisinos <
>>>> tasosp@projectbeagle.com> wrote:
>>>>
>>>>> Hello Niclas and all
>>>>>
>>>>> I'll start from the bottom of your response and work my way up.
>>>>>
>>>>> First of all, thanks for your response, i appreciate it.
>>>>> Congratulations to the whole Qi4j team for becoming an ASF project,
>>>>> although i prefer the old name... Nevertheless it is a milestone for
>>>>> this awesome framework.
>>>>>
>>>>> About performance. We have been writing an availability query for a
>>>>> bed-bank. These queries are massive, working on tens of tables at once,
>>>>> on big data. So the question of throughput for us is not only code
>>>>> related. In the final picture we will be talking about massive database -
>>>>> servlet container clusters that will be able to spit out 15,000 ACID
>>>>> transactions per second.
>>>>>
>>>>> For our prototyping phase, achieving 5000 of them running the full
>>>>> query on test/sample data on a single machine was a breakthrough on its
>>>>> own. And we haven't really started to push this system, just code and
>>>>> basic system optimizations. This will grow.
>>>>>
>>>>> Oh by the way, we are www.projectbeagle.com, based in Greece.
>>>>>
>>>>> Our first attempt was to have a single Qi4j runtime and application
>>>>> PER request thread. This has become a non-trivial application with
>>>>> multiple services, lots of layers and modules, so assembling it into an
>>>>> application takes time. We can't afford this. That's why we moved all
>>>>> this code to be executed at deployment time. All requests (all servicing
>>>>> threads) will use this unique application to perform DI and composition.
>>>>> In the future, a secondary, contextual Qi4j application may be added to
>>>>> the picture.
>>>>>
>>>>> So, when we did that, throughput skyrocketed but race conditions
>>>>> started. Let me give you some examples. All are related to composition,
>>>>> with either value or transient builders and their factories. We don't
>>>>> use any kind of entity composites (we have Hibernate as ORM and we do
>>>>> persistence in a tricky way - another story). All composites, once
>>>>> built, work fine; no problem with them.
>>>>>
>>>>> So this is a small code example from our project's QueryBuilder.
>>>>> QueryBuilder has multiple APIs (multiple interfaces) and each is
>>>>> implemented by a different Mixin (abstract classes). This is its
>>>>> Hibernate implementation. We also have a mock one. This is one of the
>>>>> QueryBuilder API methods, which creates a WHERE clause (field >= value)
>>>>> for an SQL query:
>>>>>
>>>>> @Override
>>>>> @Factory
>>>>> public <T> Clause ge(String field, T value)
>>>>> {
>>>>>    synchronized(selfContainer) {
>>>>>       ValueBuilder<Clause> builder = selfContainer.newValueBuilder(Clause.class);
>>>>>
>>>>>       builder.prototype()
>>>>>              .expression()
>>>>>              .set(Restrictions.ge(field, value));
>>>>>
>>>>>       return builder.newInstance();
>>>>>    }
>>>>> }
>>>>>
>>>>>
>>>>> These are called very, very often in the project. After all it is a
>>>>> query engine. Variable 'selfContainer' is injected as
>>>>>
>>>>> @Structure
>>>>> protected Module selfContainer;
>>>>>
>>>>>
>>>>> When we don't lock the builder factory (the module) the way we do, we
>>>>> get all sorts of race conditions. For example, when newInstance() is
>>>>> called it can fail with a constraint violation exception saying that
>>>>> expression is not optional. But the call Restrictions.ge() can never
>>>>> return null. So by the time one thread comes to call newInstance(),
>>>>> another thread has already messed with the builder factory. The builders
>>>>> themselves, as you see, are local variables (but they may not be, it
>>>>> depends on how they are implemented inside their factory).
>>>>>
>>>>> There are other ways it can fail. For example, saying that the builder
>>>>> can't find a proper fragment with a ge implementation. All these errors
>>>>> are so absurd for such simple code that they can only be race
>>>>> conditions. I will collect as many exception dumps of such errors as I
>>>>> can and send them to you in a future attachment.
>>>>>
>>>>> When we synchronize in this fashion, the problems go away. But this has
>>>>> two basic caveats:
>>>>>
>>>>> 1. Performance penalty (obvious)
>>>>> 2. A Schrödinger's cat situation. We don't know if the problem went
>>>>> away because we synchronize, or because concurrency falls to such a
>>>>> degree that the probability of a race condition falls dramatically, only
>>>>> for it to appear on production machines later on
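
One way to probe the second caveat is to hammer the unsynchronized code path
from many threads that are released at the same instant, and count how many
calls throw. The harness below is a minimal plain-Java sketch under
assumptions: the RaceProbe class and countFailures() method are hypothetical
names, and the task passed in would be whatever call is under suspicion (for
example, a lambda invoking the ge() method shown earlier without the
synchronized block).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stress harness; not part of Qi4j or of the project discussed here.
class RaceProbe {
    // Runs task `iterations` times on each of `threads` threads, all released
    // together, and returns how many invocations threw a RuntimeException.
    static int countFailures(int threads, int iterations, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch ready = new CountDownLatch(threads);
        CountDownLatch go = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                ready.countDown();   // signal that this worker is running
                go.await();          // wait for the common start signal
                for (int i = 0; i < iterations; i++) {
                    try {
                        task.run();
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
                return null;
            });
        }
        try {
            ready.await();           // all workers are parked on `go`
            go.countDown();          // release them at once
            pool.shutdown();
            if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        // A thread-safe task, used here only to show the harness running.
        AtomicInteger counter = new AtomicInteger();
        int failures = countFailures(8, 10_000, counter::incrementAndGet);
        System.out.println("failures=" + failures + ", calls=" + counter.get());
    }
}
```

A failure count of zero does not prove the code is free of races; it only
means none surfaced in that run. A non-zero count without the lock, and zero
with it across many runs, is evidence (not proof) that the lock itself is
what removes the problem.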
>>>>>
>>>>>
>>>>> Best regards
>>>>> Tasos Parisinos
>>>>>
>>>>> On Thu, Apr 2, 2015 at 11:03 AM, Niclas Hedhman <niclas@hedhman.org>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> The general "rule" is that Factories (i.e. implemented by Module
>>>>>> nowadays) should be thread-safe, Builders are NOT thread-safe and are
>>>>>> expected to be created at each use. Are you trying to re-use the
>>>>>> Builders? If not, i.e. you do newXyzBuilder() on each use, and you are
>>>>>> seeing threading issues, then that is bug(s) and I would love to get
>>>>>> hold of the details.
>>>>>>
>>>>>> ValueComposites -> thread-safe by definition, once created.
>>>>>>
>>>>>> EntityComposites -> MUST NOT be handed between threads, and are
>>>>>> therefore indirectly thread-safe.
>>>>>>
>>>>>> TransientComposites -> Internals are expected to be thread-safe, but
>>>>>> changes at 'user level' need to be taken care of.
>>>>>>
>>>>>> ServiceComposites -> Internals are expected to be thread-safe, but
>>>>>> user level might need care.
>>>>>>
>>>>>> ConfigurationComposites -> They are entities, and therefore inherit
>>>>>> their concurrency characteristics.
>>>>>>
>>>>>>
>>>>>> Qi4j isn't really intended for being a speed demon, so 15000 tx/sec
>>>>>> sounds a bit too ambitious to me. Please report back what kind of
>>>>>> numbers you will eventually manage, even if it is not good enough for
>>>>>> you.
>>>>>>
>>>>>> Niclas
>>>>>>
>>>>>> P.S. Qi4j has just been accepted into the Apache Software Foundation,
>>>>>> and will emerge as Apache Zest. dev@zest.apache.org is CC'd for that
>>>>>> reason.
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 1, 2015 at 10:50 PM, Tasos Parisinos <
>>>>>> tasosp@projectbeagle.com> wrote:
>>>>>>
>>>>>>> Thanks for your reply, Kent
>>>>>>>
>>>>>>> I agree with you that builder instances should be created, used and
>>>>>>> discarded inside a single request (a single thread from the servlet
>>>>>>> container pool). The builder factories though, like the application
>>>>>>> itself, should be shared across all request threads (in a synchronized
>>>>>>> manner) in order to avoid instantiating such an application PER
>>>>>>> thread, as this would greatly compromise performance. The use of
>>>>>>> putIfAbsent in that context seems to be correct. I'll give it a try
>>>>>>> and update you with the results.
>>>>>>>
>>>>>>>
>>>>>>> On Wednesday, April 1, 2015 at 10:26:16 PM UTC+3, kent.soelvsten
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I am not an expert so it might be the blind leading the deaf
>>>>>>>> ......
>>>>>>>>
>>>>>>>> but I sense a potential problem with concurrent access to the
>>>>>>>> various variants of ValueBuilderFactory#newValueBuilder and
>>>>>>>> TransientBuilderFactory#newTransientBuilder
>>>>>>>> (the internal usage of ConcurrentHashMap inside TypeLookup -
>>>>>>>> shouldn't we use putIfAbsent?).
>>>>>>>>
>>>>>>>> So those would be good candidates for synchronization. If that
>>>>>>>> solves your problem, I believe you might have found a bug - and a
>>>>>>>> work-around.
>>>>>>>> ValueBuilder and TransientBuilder instances should probably be
>>>>>>>> created, used and discarded inside a single web request and not
>>>>>>>> reused.
>>>>>>>>
>>>>>>>> /Kent
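
The putIfAbsent concern raised above can be shown with a small plain-Java
sketch. It is an assumption-laden illustration, not Qi4j code: LookupCache
is a hypothetical stand-in and does not reproduce what TypeLookup actually
does. It contrasts a check-then-act sequence on a ConcurrentHashMap (get,
then put) with putIfAbsent, and counts how many distinct cached instances
concurrent callers end up holding for the same key.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a lookup cache; this is NOT Qi4j's TypeLookup.
class LookupCache {
    private final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();

    // Racy check-then-act: two threads can both see "absent", both create a
    // value, and the later put() replaces the earlier one, so callers may
    // hold different instances for the same key.
    Object lookupRacy(String key) {
        Object model = cache.get(key);
        if (model == null) {
            model = new Object();
            cache.put(key, model);
        }
        return model;
    }

    // putIfAbsent keeps whichever value was stored first; a thread that
    // loses the race discards its candidate and uses the stored one.
    Object lookupSafe(String key) {
        Object model = cache.get(key);
        if (model == null) {
            Object candidate = new Object();
            Object previous = cache.putIfAbsent(key, candidate);
            model = (previous != null) ? previous : candidate;
        }
        return model;
    }

    // Looks up the same key from many threads at once and returns the
    // number of distinct instances the callers received.
    static int distinctInstances(int threads, boolean safe) {
        LookupCache lookup = new LookupCache();
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                start.await();
                seen.add(safe ? lookup.lookupSafe("Clause") : lookup.lookupRacy("Clause"));
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("putIfAbsent: " + distinctInstances(16, true) + " distinct instance(s)");
        System.out.println("get + put:   " + distinctInstances(16, false) + " distinct instance(s)");
    }
}
```

With putIfAbsent every caller receives the same instance. With get followed
by put the count is timing-dependent: it is often 1, but it can be higher,
which is exactly the kind of intermittent behaviour that only shows up under
load.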
>>>>>>>>
>>>>>>>>
>>>>>>>> On 01-04-2015 at 20:07, Tasos Parisinos wrote:
>>>>>>>>
>>>>>>>> Hi all
>>>>>>>>
>>>>>>>> Let me describe my problem. We have implemented a servlet
>>>>>>>> (deployed in Tomcat) that takes a REST request and, based on its
>>>>>>>> query parameters, builds and executes a single query (using
>>>>>>>> Hibernate ORM) within a JTA transaction (using Atomikos). The
>>>>>>>> application specifics are not important; what is important is that
>>>>>>>> we need high throughput (15,000 trx/sec is our objective).
>>>>>>>>
>>>>>>>> We have implemented all infrastructure code using Qi4j for COP
>>>>>>>> and DI as well as Property<T> data validation (constraint
>>>>>>>> annotations). At deployment time (in a separate thread) we assemble
>>>>>>>> and activate two Qi4j runtimes, each with a Qi4j application. The
>>>>>>>> first is used only during deployment, while the second is used in
>>>>>>>> ALL threads that serve requests. Using Qi4j, this second application
>>>>>>>> starts various ServiceComposites while the servlet deploys, for
>>>>>>>> eager initialization (logger service, mapping service, repository
>>>>>>>> service, rest service, application services, domain services,
>>>>>>>> transaction service, token service, to name only some). We
>>>>>>>> implement our Use Cases with a DCI design.
>>>>>>>>
>>>>>>>> These services and the DCI code use various ValueBuilder<T> and
>>>>>>>> TransientBuilder<T> instances to do composition.
>>>>>>>>
>>>>>>>>  The problem is:
>>>>>>>>
>>>>>>>> Because ALL request threads use the same Qi4j application, we
>>>>>>>> have various race conditions that are mainly associated with the
>>>>>>>> various builders. These race conditions appear when the servlet
>>>>>>>> serves more than 2000 trx/sec. Sacrificing some throughput, we can
>>>>>>>> synchronize shared variables, but to minimize the performance impact
>>>>>>>> we need to know:
>>>>>>>>
>>>>>>>> 1. What is the best practice for such cases?
>>>>>>>> 2. Which part of ValueBuilderFactory, ValueBuilder<T>,
>>>>>>>> TransientBuilderFactory, TransientBuilder<T> is best to synchronize?
>>>>>>>>
>>>>>>>> Thanks in advance
>>>>>>>>
>>>>>>>>  --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "qi4j-dev" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to qi4j-dev+u...@googlegroups.com.
>>>>>>>> To post to this group, send email to qi4j...@googlegroups.com.
>>>>>>>> Visit this group at http://groups.google.com/group/qi4j-dev.
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Niclas Hedhman, Software Developer
>>>>>> http://www.qi4j.org - New Energy for Java
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Niclas Hedhman, Software Developer
>>> http://www.qi4j.org - New Energy for Java
>>>
>>
>>
>
>
> --
> Niclas Hedhman, Software Developer
> http://www.qi4j.org - New Energy for Java
>
