polygene-dev mailing list archives

From Niclas Hedhman <nic...@hedhman.org>
Subject Re: [qi4j-dev] Using Qi4j as a skeleton framework in a high throughput, highly concurrent servlet deployment (and problems with race conditions)
Date Thu, 09 Apr 2015 23:41:27 GMT
Tasos,
Can you check whether you can add a new Jira ticket for this? The tracker is at
https://issues.apache.org/jira/browse/ZEST

That will confirm that the ticket system is open to the public.

Cheers

On Wed, Apr 8, 2015 at 6:20 PM, Tasos Parisinos <tasosp@projectbeagle.com>
wrote:

> Hi devs
>
> Because I was working on the branch with the application pool, I don't
> have any stack traces yet. When I finish with that and switch to the
> other branch, I'll collect some and send them as an attachment.
>
> Best regards
> Tasos Parisinos
>
>
> On Wed, Apr 8, 2015 at 12:58 PM, Niclas Hedhman <niclas@hedhman.org>
> wrote:
>
>> Ok, simpler tests are better... if you can manage that, it would be great!
>>
>> You didn't happen to capture any stack traces?
>>
>> The @Optional-related violations happen in two cases (I am sure you know
>> this, but I want to reiterate to be safe):
>>
>>   a) If a Property<?> has not been set or has been set to null, and
>> @Optional has not been declared on the method for the Property<?>
>>
>>   b) If null is passed as a method argument on any method on the
>> Composite Type interface, and that method parameter doesn't have @Optional
>> annotation.
>>
>> The stack trace should reveal which case it is, and if the exception
>> doesn't contain enough information about the problem, then we will
>> try to add that.
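
For illustration, the two cases above can be mimicked in plain Java. This is only a sketch under stated assumptions, not Qi4j code: the annotation `OptionalStandIn`, the interface `ClauseTypeSketch` and the two check methods are hypothetical names invented here, and the real framework performs these checks internally in its own way.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.HashMap;
import java.util.Map;

final class OptionalRulesSketch {

    /** Case (a): an un-annotated accessor whose value is unset or null. */
    static void checkState(Class<?> type, Map<String, Object> state) {
        for (Method m : type.getMethods()) {
            boolean accessor = m.getParameterCount() == 0;
            boolean optional = m.isAnnotationPresent(OptionalStandIn.class);
            if (accessor && !optional && state.get(m.getName()) == null) {
                throw new IllegalStateException(m.getName() + " is not optional");
            }
        }
    }

    /** Case (b): null passed for a parameter that is not annotated. */
    static void checkArguments(Method method, Object[] args) {
        Parameter[] params = method.getParameters();
        for (int i = 0; i < params.length; i++) {
            if (args[i] == null && !params[i].isAnnotationPresent(OptionalStandIn.class)) {
                throw new IllegalArgumentException(
                        "argument " + i + " of " + method.getName() + " is not optional");
            }
        }
    }

    /** Runs a check and reports whether it raised a violation. */
    static boolean violates(Runnable check) {
        try {
            check.run();
            return false;
        } catch (IllegalStateException | IllegalArgumentException e) {
            return true;
        }
    }

    /** Looks up the two-argument describe method of the illustration type. */
    static Method describeMethod() {
        try {
            return ClauseTypeSketch.class.getMethod("describe", String.class, String.class);
        } catch (NoSuchMethodException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> state = new HashMap<>();
        System.out.println("(a) expression unset violates: "
                + violates(() -> checkState(ClauseTypeSketch.class, state)));
        state.put("expression", "price >= 10");
        System.out.println("(a) expression set violates: "
                + violates(() -> checkState(ClauseTypeSketch.class, state)));
        Method describe = describeMethod();
        System.out.println("(b) null label violates: "
                + violates(() -> checkArguments(describe, new Object[] {null, "n"})));
        System.out.println("(b) null note violates: "
                + violates(() -> checkArguments(describe, new Object[] {"l", null})));
    }
}

/** Hypothetical stand-in for the framework's @Optional annotation. */
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.PARAMETER})
@interface OptionalStandIn {}

/** Hypothetical composite type, used only for this illustration. */
interface ClauseTypeSketch {
    String expression();                 // mandatory: no annotation
    @OptionalStandIn String comment();   // may stay unset
    void describe(String label, @OptionalStandIn String note);
}
```

A stack trace ending in something like `checkState` would point at case (a), one ending in `checkArguments` at case (b), which is the distinction the real stack traces are expected to reveal.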
>>
>>
>> Thanks
>> Niclas
>>
>> On Wed, Apr 8, 2015 at 6:43 AM, Tasos Parisinos
>> <tasosp@projectbeagle.com> wrote:
>>
>>> Hi Niclas
>>>
>>> The problems I see most of the time come from the builder.newInstance()
>>> call. The errors are constraint violations for data combinations that can
>>> happen only due to race conditions (the @Optional violation I mentioned in
>>> an earlier mail). For example, I got an error saying that an immutable
>>> property gets set, when there is no such code in the project apart from
>>> prototype initialization!
>>>
>>> BUT some other synchronization errors do occur while the composite
>>> methods are being called! I noticed that clearly in some cases.
>>>
>>> Anyway, from what I gather, in both cases synchronization within the Qi4j
>>> codebase is either close to impossible or would defeat optimizations.
>>>
>>> The other day I arrived at a solution to my problem by creating and using
>>> an application pool. Each request thread takes an application from the
>>> pool, uses it, cleans it up and returns it to the pool. These applications
>>> are contextual enough for my DCI design and the result is rewarding.
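
The take / use / clean up / return cycle described above might look roughly like the following. It is a minimal sketch using only the JDK: `ApplicationPoolSketch`, the `assembler` supplier and the `cleaner` callback are hypothetical names, and a `StringBuilder` stands in for an assembled application, since the actual pool implementation was not posted.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

final class ApplicationPoolSketch<A> {
    private final BlockingQueue<A> idle;
    private final Consumer<A> cleaner;

    /** Assembles all pooled instances up front, e.g. at deployment time. */
    ApplicationPoolSketch(int size, Supplier<A> assembler, Consumer<A> cleaner) {
        this.idle = new ArrayBlockingQueue<>(size);
        this.cleaner = cleaner;
        for (int i = 0; i < size; i++) {
            idle.add(assembler.get());
        }
    }

    /** Borrows one instance for the calling thread, then cleans and returns it. */
    <R> R withApplication(Function<A, R> work) {
        A app;
        try {
            app = idle.take(); // blocks while every instance is in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for an application", e);
        }
        try {
            return work.apply(app);
        } finally {
            try {
                cleaner.accept(app);
            } finally {
                idle.add(app); // capacity is never exceeded: we only return what we took
            }
        }
    }

    /** Number of instances currently not borrowed. */
    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        ApplicationPoolSketch<StringBuilder> pool =
                new ApplicationPoolSketch<StringBuilder>(2, StringBuilder::new, sb -> sb.setLength(0));
        int length = pool.<Integer>withApplication(app -> app.append("request").length());
        System.out.println("scratch length during request: " + length);
        System.out.println("idle applications afterwards: " + pool.idleCount());
    }
}
```

Because each borrowed instance is used by exactly one thread at a time, nothing inside it is accessed concurrently. The cost is one assembled application per pool slot and blocking when the pool is exhausted.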
>>>
>>> BUT as Paul Merlin proposed the other day, I will create a cut-down
>>> version of our servlet as a proof of concept that reproduces the error, so
>>> that you can narrow it down and kill the bug.
>>>
>>> This will come as a webapp for Tomcat that you can stress test for high
>>> concurrency with JMeter. It will be an IntelliJ IDEA 14 project. It will
>>> build with Maven and, if you are OK with it, it would be nice for it to
>>> perform an SQL query through Hibernate against some database. I can go all
>>> the way with the database, because the latter type of error, the one that
>>> may be related to invocation stacks, shows up there. We will see. If it is
>>> a burden for you to set this up, I can make something simpler.
>>>
>>> Happy easter
>>> Tasos Parisinos
>>>
>>>
>>>
>>>
>>> On Sun, Apr 5, 2015 at 9:24 AM, Niclas Hedhman <niclas@hedhman.org>
>>> wrote:
>>>
>>>> Tasos,
>>>> I had a first look at the code yesterday, and I couldn't see how there
>>>> could be an issue in the creation and/or usage of the builder. The
>>>> code has been kept simple on purpose to ensure that we can guarantee
>>>> thread-safety where we say so.
>>>>
>>>> You state that it works if you synchronize that particular section, and
>>>> draw the conclusion that it is related to builder creation, BUT could it
>>>> be that the problem actually happens in the method invocation itself, and
>>>> that the synchronization you mention also "serializes" the subsequent
>>>> calls to the created value/transient instance, so that no concurrency
>>>> happens in the method invocation?
>>>>
>>>>
>>>> The reason I mention this is that I think the codebase still shares
>>>> method invocation stacks across composite instances, and creates new
>>>> ones on demand. That code is not as simple as the Factory/Builder code,
>>>> and we also tried to make it as performant as possible.
>>>>
>>>> Happy Easter everyone.
>>>> Niclas (from Ankara, visiting Alex Karasulu, one of the Zest PMC
>>>> members)
>>>>
>>>> On Fri, Apr 3, 2015 at 9:51 PM, Tasos Parisinos <
>>>> tasosp@projectbeagle.com> wrote:
>>>>
>>>>> Hi again
>>>>>
>>>>> I would also like to add at this point that if any of you Zest
>>>>> developers can suggest, pinpoint or at least narrow down the pieces of
>>>>> code that implement the problematic parts, it would be tremendously
>>>>> helpful to us, so that we can refactor this code and try to suggest a
>>>>> solution on our own.
>>>>>
>>>>> For us, it is paramount to resolve this as fast as possible, to carry
>>>>> on implementing our core business code.
>>>>>
>>>>> Also, using this occasion, I would like to join Zest's core developer
>>>>> team, as I'm a great fan of the framework and thus we have based our
>>>>> platform on it.
>>>>>
>>>>> As a CTO of projectbeagle I'm also very eager to contribute parts of
>>>>> our implementation back to the Open Source Community as a Qi4j (well...
>>>>> Zest) library, extension or tool!
>>>>>
>>>>> Best Regards
>>>>> Tasos Parisinos
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Apr 3, 2015 at 3:58 PM, Tasos Parisinos <
>>>>> tasosp@projectbeagle.com> wrote:
>>>>>
>>>>>> Hello Niclas and all
>>>>>>
>>>>>> I'll start from the bottom of your response and work my way up.
>>>>>>
>>>>>> First of all, thanks for your response, I appreciate it.
>>>>>> Congratulations to the whole Qi4j team for becoming an ASF project,
>>>>>> although I prefer the old name... Nevertheless it is a milestone for
>>>>>> this awesome framework.
>>>>>>
>>>>>> About performance. We have been writing an availability query for a
>>>>>> bed-bank. These queries are massive, working on tens of tables at once,
>>>>>> on big data. So the question of throughput for us is not only code
>>>>>> related. In the final picture we will be talking about massive
>>>>>> database - servlet container clusters that will be able to spit out
>>>>>> 15,000 ACID transactions per second.
>>>>>>
>>>>>> For our prototyping phase, achieving 5000 of them running the full
>>>>>> query on test/sample data on a single machine was a breakthrough on
>>>>>> its own. And we haven't really started to push this system, just code
>>>>>> and basic system optimizations. This will grow.
>>>>>>
>>>>>> Oh by the way, we are www.projectbeagle.com, based in Greece.
>>>>>>
>>>>>> Our first attempt was to have a single Qi4j runtime and application
>>>>>> PER request thread. This has become a non-trivial application with
>>>>>> multiple services, lots of layers and modules, so assembling it into an
>>>>>> application takes time. We can't afford this. That's why we moved all
>>>>>> this code to be executed at deployment time. All requests (all
>>>>>> servicing threads) will use this unique application to perform DI and
>>>>>> composition. In the future, a secondary, contextual Qi4j application
>>>>>> may be added to the picture.
>>>>>>
>>>>>> So, when we did that, throughput skyrocketed but race conditions
>>>>>> started. Let me give you some examples. All are related to composition,
>>>>>> with either value or transient builders and their factories. We don't
>>>>>> use any kind of entity composites (we have Hibernate as ORM and we do
>>>>>> persistence in a tricky way - another story). All composites, once
>>>>>> built, work fine, no problem with them.
>>>>>>
>>>>>> So this is a small code example from our project's QueryBuilder.
>>>>>> QueryBuilder has multiple APIs (multiple interfaces) and each is
>>>>>> implemented by a different Mixin (abstract classes). This is its
>>>>>> Hibernate implementation. We also have a mock one. This is one of the
>>>>>> QueryBuilder API methods, which creates a WHERE clause
>>>>>> (field >= value) for an SQL query:
>>>>>>
>>>>>> @Override
>>>>>> @Factory
>>>>>> public <T> Clause ge(String field, T value)
>>>>>> {
>>>>>>    synchronized(selfContainer) {
>>>>>>       ValueBuilder<Clause> builder = selfContainer.newValueBuilder(Clause.class);
>>>>>>
>>>>>>       builder.prototype()
>>>>>>              .expression()
>>>>>>              .set(Restrictions.ge(field, value));
>>>>>>
>>>>>>       return builder.newInstance();
>>>>>>    }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> These are called very, very often in the project. After all, it is a
>>>>>> query engine. Variable 'selfContainer' is injected as
>>>>>>
>>>>>> @Structure
>>>>>> protected Module selfContainer;
>>>>>>
>>>>>>
>>>>>> When we don't lock the builder factory (the module) in the way we do,
>>>>>> we get all sorts of race conditions. For example, when newInstance()
>>>>>> is called it can fail with a constraint violation exception saying
>>>>>> that expression is not optional. But the call Restrictions.ge() can
>>>>>> never return null. So by the time one thread comes to call
>>>>>> newInstance(), another thread has already messed with the builder
>>>>>> factory. The builders themselves, as you see, are local variables (but
>>>>>> they may not be truly local, it depends on how they are implemented
>>>>>> inside their factory).
>>>>>>
>>>>>> There are other ways it can fail. For example, saying that the builder
>>>>>> can't find a proper fragment with a ge implementation. All these errors
>>>>>> are so absurd for such simple code that they can only be race
>>>>>> conditions. I will collect as many exception dumps of such errors as I
>>>>>> can and send them to you in a future attachment.
>>>>>>
>>>>>> When we synchronize in this fashion, problems go away. But this has
>>>>>> two basic caveats:
>>>>>>
>>>>>> 1. Performance penalty (obvious)
>>>>>> 2. A Schroedinger's cat situation. We don't know whether the problem
>>>>>> went away because we synchronize, or because concurrency falls to such
>>>>>> a degree that the probability of a race condition drops dramatically,
>>>>>> only for it to appear on production machines later on
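
One way to probe the second caveat is to run the unsynchronized code path under deliberately maximized contention and collect every exception, instead of waiting for production load. The harness below is a sketch using only the JDK; `RaceHarnessSketch` and `hammer` are hypothetical names, and the task passed in would be whatever builder call is under suspicion. A clean run does not prove the absence of a race, but a failing run gives a reproducible stack trace.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class RaceHarnessSketch {

    /** Runs task iterations on each of the given threads at once; returns all failures. */
    static Queue<Throwable> hammer(int threads, int iterations, Runnable task) {
        Queue<Throwable> failures = new ConcurrentLinkedQueue<>();
        CountDownLatch ready = new CountDownLatch(threads);
        CountDownLatch go = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int t = 0; t < threads; t++) {
                pool.execute(() -> {
                    ready.countDown();
                    try {
                        go.await(); // all workers are released together
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int i = 0; i < iterations; i++) {
                        try {
                            task.run();
                        } catch (Throwable failure) {
                            failures.add(failure); // keep going to collect more samples
                        }
                    }
                });
            }
            ready.await();
            go.countDown();
            pool.shutdown();
            if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
                failures.add(new IllegalStateException("workers did not finish in time"));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            failures.add(e);
        } finally {
            pool.shutdownNow();
        }
        return failures;
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        Queue<Throwable> failures = hammer(8, 10_000, counter::incrementAndGet);
        System.out.println("calls: " + counter.get() + ", failures: " + failures.size());
    }
}
```

Running the same task once with and once without the `synchronized` block, at the same thread count, would at least show whether the failures track the lock or the load.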
>>>>>>
>>>>>>
>>>>>> Best regards
>>>>>> Tasos Parisinos
>>>>>>
>>>>>> On Thu, Apr 2, 2015 at 11:03 AM, Niclas Hedhman <niclas@hedhman.org>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> The general "rule" is that Factories (i.e. implemented by Module
>>>>>>> nowadays) should be thread-safe, Builders are NOT thread-safe and
>>>>>>> are expected to be created at each use. Are you trying to re-use the
>>>>>>> Builders? If not, i.e. you do newXyzBuilder() on each use, and you
>>>>>>> are seeing threading issues, then that is a bug (or bugs) and I would
>>>>>>> love to get hold of the details.
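
The rule above (one shared, thread-safe factory; a fresh, single-threaded builder per use) can be pictured with plain Java stand-ins. Everything here is hypothetical and written only for illustration: `ClauseFactorySketch`, `ClauseBuilderSketch` and `ClauseValueSketch` imitate the roles of Module, ValueBuilder and a value composite, and are not the Qi4j classes.

```java
import java.util.concurrent.atomic.AtomicLong;

final class BuilderPerUseSketch {
    /** Shared by all threads; safe because the factory keeps no per-request state. */
    static final ClauseFactorySketch FACTORY = new ClauseFactorySketch();

    /** Creates a builder, uses it and drops it, all within one call on one thread. */
    static ClauseValueSketch ge(String field, Object value) {
        ClauseBuilderSketch builder = FACTORY.newBuilder(); // local, never shared
        return builder.expression(field + " >= " + value).newInstance();
    }

    public static void main(String[] args) {
        System.out.println(ge("price", 10).expression);
        System.out.println("builders created: " + FACTORY.buildersCreated());
    }
}

/** Thread-safe: its only mutable state is an atomic counter. */
final class ClauseFactorySketch {
    private final AtomicLong created = new AtomicLong();

    ClauseBuilderSketch newBuilder() {
        created.incrementAndGet();
        return new ClauseBuilderSketch();
    }

    long buildersCreated() {
        return created.get();
    }
}

/** NOT thread-safe: holds mutable prototype state for a single use. */
final class ClauseBuilderSketch {
    private String expression;

    ClauseBuilderSketch expression(String value) {
        this.expression = value;
        return this;
    }

    ClauseValueSketch newInstance() {
        if (expression == null) {
            throw new IllegalStateException("expression is not optional");
        }
        return new ClauseValueSketch(expression);
    }
}

/** Immutable once created, so it can be handed between threads freely. */
final class ClauseValueSketch {
    final String expression;

    ClauseValueSketch(String expression) {
        this.expression = expression;
    }
}
```

If the builder in this sketch were stored in a field and reused across threads, one thread could reset `expression` between another thread's `expression(...)` and `newInstance()` calls, which is the kind of symptom that re-use would produce.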
>>>>>>>
>>>>>>> ValueComposites -> thread-safe by definition, once created.
>>>>>>>
>>>>>>> EntityComposites -> MUST NOT be handed between threads, and are
>>>>>>> therefore indirectly thread-safe.
>>>>>>>
>>>>>>> TransientComposites -> Internals are expected to be thread-safe,
>>>>>>> but changes at 'user level' need to be taken care of.
>>>>>>>
>>>>>>> ServiceComposites -> Internals are expected to be thread-safe, but
>>>>>>> user level might need care.
>>>>>>>
>>>>>>> ConfigurationComposites -> They are entities, and therefore inherit
>>>>>>> their concurrency characteristics.
>>>>>>>
>>>>>>>
>>>>>>> Qi4j isn't really intended to be a speed demon, so 15000 tx/sec
>>>>>>> sounds a bit too ambitious to me. Please report back what kind of
>>>>>>> numbers you eventually manage, even if it is not good enough for you.
>>>>>>>
>>>>>>> Niclas
>>>>>>>
>>>>>>> P.S. Qi4j has just been accepted into the Apache Software
>>>>>>> Foundation, and will emerge as Apache Zest. dev@zest.apache.org is
>>>>>>> CC'd for that reason.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Apr 1, 2015 at 10:50 PM, Tasos Parisinos <
>>>>>>> tasosp@projectbeagle.com> wrote:
>>>>>>>
>>>>>>>> Thanx for your reply Kent
>>>>>>>>
>>>>>>>> I agree with you that builder instances should be created, used and
>>>>>>>> discarded inside a single request (a single thread from the servlet
>>>>>>>> container pool). The builder factories though, like the application
>>>>>>>> itself, should be shared across all request threads (in a
>>>>>>>> synchronized manner) in order to avoid instantiating such an
>>>>>>>> application PER thread, as this would greatly compromise
>>>>>>>> performance. The use of putIfAbsent in that context seems to be
>>>>>>>> correct. I'll give it a try and update you with results.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wednesday, April 1, 2015 at 10:26:16 PM UTC+3, kent.soelvsten
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I am not an expert, so it might be the blind leading the deaf
>>>>>>>>> ......
>>>>>>>>>
>>>>>>>>> but I sense a potential problem with concurrent access to various
>>>>>>>>> variants of ValueBuilderFactory#newValueBuilder and
>>>>>>>>> TransientBuilderFactory#newTransientBuilder
>>>>>>>>> (the internal usage of ConcurrentHashMap inside TypeLookup -
>>>>>>>>> shouldn't we use putIfAbsent?).
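
The concern about a ConcurrentHashMap used without putIfAbsent can be shown with a generic cache. This sketch does not reproduce TypeLookup, whose code is not quoted in this thread: `TypeCacheSketch`, `resolve` and the three lookup variants are hypothetical, and `computeIfAbsent` assumes Java 8 or later.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class TypeCacheSketch {
    private final ConcurrentMap<Class<?>, Object> cache = new ConcurrentHashMap<>();
    private final AtomicInteger resolved = new AtomicInteger();

    /** Stand-in for an expensive model lookup; each call yields a distinct object. */
    private Object resolve(Class<?> type) {
        resolved.incrementAndGet();
        return new Object();
    }

    /** Check-then-act: two threads may both see null and return different models. */
    Object lookupRacy(Class<?> type) {
        Object model = cache.get(type);
        if (model == null) {
            model = resolve(type);
            cache.put(type, model);
        }
        return model;
    }

    /** putIfAbsent: resolve may run more than once, but every caller gets the winner. */
    Object lookupPutIfAbsent(Class<?> type) {
        Object model = cache.get(type);
        if (model == null) {
            Object candidate = resolve(type);
            Object previous = cache.putIfAbsent(type, candidate);
            model = (previous != null) ? previous : candidate;
        }
        return model;
    }

    /** computeIfAbsent: resolve runs at most once per key. */
    Object lookupComputeIfAbsent(Class<?> type) {
        return cache.computeIfAbsent(type, this::resolve);
    }

    int resolveCount() {
        return resolved.get();
    }

    /** Calls lookup from several threads and counts the distinct models returned. */
    static int distinctModels(int threads, Function<Class<?>, Object> lookup) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> seen.add(lookup.apply(String.class)));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        TypeCacheSketch racy = new TypeCacheSketch();
        TypeCacheSketch safe = new TypeCacheSketch();
        System.out.println("racy distinct models: " + distinctModels(16, racy::lookupRacy));
        System.out.println("putIfAbsent distinct models: " + distinctModels(16, safe::lookupPutIfAbsent));
    }
}
```

The racy variant only matters if callers depend on getting the same instance, or if the resolved object is mutated after being published. Whether either holds for TypeLookup would need checking against the actual source.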
>>>>>>>>>
>>>>>>>>> So those would be good candidates for synchronization. If that
>>>>>>>>> solves your problem, I believe you might have found a bug - and a
>>>>>>>>> work-around.
>>>>>>>>> ValueBuilder and TransientBuilder instances should probably be
>>>>>>>>> created, used and discarded inside a single web request and not
>>>>>>>>> reused.
>>>>>>>>>
>>>>>>>>> /Kent
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 01-04-2015 at 20:07, Tasos Parisinos wrote:
>>>>>>>>>
>>>>>>>>> Hi all
>>>>>>>>>
>>>>>>>>> Let me describe my problem. We have implemented a servlet
>>>>>>>>> (deployed in Tomcat) that takes a REST request and, based on its
>>>>>>>>> query parameters, builds and executes a single query (using
>>>>>>>>> Hibernate ORM) within a JTA transaction (using Atomikos). The
>>>>>>>>> application specifics are not important; what is important is that
>>>>>>>>> we need high throughput (15,000 trx / sec is our objective).
>>>>>>>>>
>>>>>>>>> We have implemented all infrastructure code using Qi4j for COP
>>>>>>>>> and DI as well as Property<T> data validation (constraint
>>>>>>>>> annotations). At deployment time (in a separate thread) we assemble
>>>>>>>>> and activate two Qi4j runtimes, each with a Qi4j application. The
>>>>>>>>> first is used only during deployment, while the second is used in
>>>>>>>>> ALL threads that serve requests. Using Qi4j, this second application
>>>>>>>>> starts various ServiceComposites while the servlet deploys, for
>>>>>>>>> eager initialization (logger service, mapping service, repository
>>>>>>>>> service, rest service, application services, domain services,
>>>>>>>>> transaction service, token service, to name only some). We
>>>>>>>>> implement our Use Cases with a DCI design.
>>>>>>>>>
>>>>>>>>> These services and DCI code use various ValueBuilder<T> and
>>>>>>>>> TransientBuilder<T> to do composition.
>>>>>>>>>
>>>>>>>>>  The problem is:
>>>>>>>>>
>>>>>>>>> Because ALL request threads use the same Qi4j application, we
>>>>>>>>> have various race conditions that are mainly associated with the
>>>>>>>>> various builders. These race conditions appear when the servlet
>>>>>>>>> serves more than 2000 trx / sec. Sacrificing some throughput we can
>>>>>>>>> synchronize shared variables, but to minimize performance impact we
>>>>>>>>> need to know:
>>>>>>>>>
>>>>>>>>> 1. What is the best practice for such cases?
>>>>>>>>> 2. Which part of ValueBuilderFactory, ValueBuilder<T>,
>>>>>>>>> TransientBuilderFactory, TransientBuilder<T> is best to synchronize?
>>>>>>>>>
>>>>>>>>>  Thanx in advance
>>>>>>>>>
>>>>>>>>>  --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "qi4j-dev" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to qi4j-dev+u...@googlegroups.com.
>>>>>>>>> To post to this group, send email to qi4j...@googlegroups.com.
>>>>>>>>> Visit this group at http://groups.google.com/group/qi4j-dev.
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Niclas Hedhman, Software Developer
>>>>>>> http://www.qi4j.org - New Energy for Java
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Niclas Hedhman, Software Developer
>>>> http://www.qi4j.org - New Energy for Java
>>>>
>>>
>>>
>>
>>
>> --
>> Niclas Hedhman, Software Developer
>> http://www.qi4j.org - New Energy for Java
>>
>
>


-- 
Niclas Hedhman, Software Developer
http://www.qi4j.org - New Energy for Java
