polygene-dev mailing list archives

From Tasos Parisinos <tas...@projectbeagle.com>
Subject Re: [qi4j-dev] Using Qi4j as a skeleton framework in a high throughput, highly concurrent servlet deployment (and problems with race conditions)
Date Tue, 07 Apr 2015 22:43:45 GMT
Hi Niclas

Most of the problems I see come from the builder.newInstance()
call. The errors are constraint violations for data combinations that can
happen only due to race conditions (the @Optional violation I mentioned in an
earlier mail). For example, I got an error saying that an immutable
property was being set, when there is no such code in the project apart from
prototype initialization!

BUT some other synchronization errors do occur while the composite
methods are being called! I noticed that clearly in some cases.

Anyway, from what I gather, in both cases synchronization within the Qi4j
codebase is either nearly impossible or defeats the optimizations.

I arrived at a solution to my problem the other day by creating and using an
application pool. Each request thread takes an application from the pool,
uses it, cleans it up and returns it to the pool. These applications are
contextual enough for my DCI design and the result is rewarding.
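
The take/use/clean/return cycle described above can be sketched roughly as follows. This is a minimal illustration and not the actual servlet code: the class name InstancePool, its methods and the generic parameter T are hypothetical, and in the real deployment T would be an assembled and activated Qi4j application.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool. T stands in for an assembled Qi4j application.
final class InstancePool<T> {
    private final BlockingQueue<T> idle;
    private final Consumer<T> cleaner;

    InstancePool(int size, Supplier<T> factory, Consumer<T> cleaner) {
        this.idle = new ArrayBlockingQueue<>(size);
        this.cleaner = cleaner;
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // assembly cost is paid once, up front
        }
    }

    // Takes an instance, runs the work on it, cleans it and returns it to the pool.
    <R> R withInstance(Function<T, R> work) {
        T instance;
        try {
            instance = idle.take(); // blocks while every instance is in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for an instance", e);
        }
        try {
            return work.apply(instance);
        } finally {
            try {
                cleaner.accept(instance);
            } finally {
                idle.add(instance); // always hand the instance back
            }
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

Because each request thread holds its instance exclusively between take and return, nothing inside the instance is shared across threads, at the cost of keeping as many assembled instances as there are concurrent requests to serve.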

BUT as Paul Merlin proposed the other day, I will create a cut-down version
of our servlet as a proof of concept that reproduces the error, so that
you can narrow it down and kill the bug.

This will come as a webapp for Tomcat that you can stress test for high
concurrency with JMeter. It will be an IntelliJ 14 project. It will build
with Maven and, if you are OK with it, it would be nice for it to perform an SQL
query through Hibernate against some database. I can go all the way with the
database because the latter type of error, the kind that may be related
to invocation stacks, shows up there. We will see. If it is a burden
for you to set this up, I can make something simpler.
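
As a simpler alternative to a full webapp, a plain Java harness can hammer a single piece of code from many threads and count how often it fails. The sketch below is only an assumption about what such a harness could look like: ConcurrencyProbe and countFailures are hypothetical names, and the task passed in is a placeholder for the code under suspicion.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stress harness: runs one task concurrently and counts failures.
final class ConcurrencyProbe {

    // Calls task `iterations` times from each of `threads` threads,
    // released together, and returns how many calls threw an exception.
    static int countFailures(int threads, int iterations, Callable<?> task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();
        try {
            for (int t = 0; t < threads; t++) {
                pool.execute(() -> {
                    try {
                        start.await(); // wait so all threads begin at once
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int i = 0; i < iterations; i++) {
                        try {
                            task.call();
                        } catch (Exception e) {
                            failures.incrementAndGet();
                        }
                    }
                });
            }
            start.countDown();
            pool.shutdown();
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("probe timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return failures.get();
    }
}
```

For the case discussed in this thread, the task would create a ValueBuilder from the shared Module, set the prototype and call newInstance(), once with and once without the synchronized block. A zero failure count does not prove the absence of a race, it only means none was hit in that run.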

Happy Easter
Tasos Parisinos

On Sun, Apr 5, 2015 at 9:24 AM, Niclas Hedhman <niclas@hedhman.org> wrote:

> Tasos,
> I have had a first look at the code yesterday, and I couldn't fathom that
> there would be an issue in the creation and/or usage of builder. The code
> has been kept simple on purpose to ensure that we can guarantee
> thread-safety where we say so.
> You state that it works if you synchronize that particular section, and
> draw the conclusion that it is builder creation related, BUT could it be
> that the problem is actually happening in the method invocation itself, and
> that the mentioned synchronization also "serializes" subsequent calls to
> the created value/transient instance, so that no concurrency happens
> in the method invocation?
> The reason I mention this is that, I think, the codebase still
> shares method invocation stacks across composite instances, and creates new
> ones on demand. That code is not as simple as the Factory/Builder code,
> and we also tried to make it as performant as possible.
> Happy Easter everyone.
> Niclas (from Ankara, visiting Alex Karasulu, one of the Zest PMC members)
> On Fri, Apr 3, 2015 at 9:51 PM, Tasos Parisinos <tasosp@projectbeagle.com>
> wrote:
>> Hi again
>> I would also like to add at this point that if any of you Zest
>> developers can suggest, pinpoint or at least narrow down the pieces of code
>> that implement the related problematic parts, it would be tremendously
>> helpful to us, in order to refactor this code and try to suggest a solution
>> on our own.
>> For us, it is paramount to resolve this as fast as possible, to carry on
>> implementing our core business code.
>> Also, using this occasion, I would like to join Zest's core developer
>> team, as I'm a great fan of the framework and thus we have based our
>> platform on it.
>> As CTO of projectbeagle I'm also very eager to contribute parts of our
>> implementation back to the Open Source Community as a Qi4j (well... Zest)
>> library, extension or tool!
>> Best Regards
>> Tasos Parisinos
>> On Fri, Apr 3, 2015 at 3:58 PM, Tasos Parisinos <tasosp@projectbeagle.com>
>> wrote:
>>> Hello Niclas and all
>>> I'll start from the bottom of your response and work my way up.
>>> First of all, thanks for your response, I appreciate it.
>>> Congratulations to the whole Qi4j team for becoming an ASF project,
>>> although I prefer the old name... Nevertheless it is a milestone for this
>>> awesome framework.
>>> About performance. We have been writing an availability query for a
>>> bed-bank. These queries are massive, working on tens of tables at once, on
>>> big data. So for us the question of throughput is not only code
>>> related. In the final picture we will be talking about massive database /
>>> servlet container clusters that will be able to deliver 15,000 ACID
>>> transactions per second.
>>> For our prototyping phase, achieving 5000 of them running the full query
>>> on test/sample data on a single machine was a breakthrough on its own. And
>>> we haven't really started to push this system, just code and basic system
>>> optimizations. This will grow.
>>> Oh by the way, we are www.projectbeagle.com, based in Greece.
>>> Our first attempt was to have a single Qi4j runtime and application PER
>>> request thread. This has become a non-trivial application with multiple
>>> services, lots of layers and modules, so assembling it into an application
>>> takes time. We can't afford this. That's why we moved all this code to be
>>> executed during deployment time. All requests (all servicing threads) will
>>> use this unique application to perform DI and composition. In the future, a
>>> secondary, contextual Qi4j application may be added to the picture.
>>> So, when we did that, throughput skyrocketed but race conditions
>>> started. Let me give you some examples. All are related to composition,
>>> with either value or transient builders and their factories. We don't use
>>> any kind of entity composites (we have Hibernate as ORM and we do
>>> persistence in a tricky way - another story). All composites once built
>>> work fine, no problem with them.
>>> So this is a small code example from our project's QueryBuilder.
>>> QueryBuilder has multiple APIs (multiple interfaces) and each is
>>> implemented by a different Mixin (abstract classes). This is its Hibernate
>>> implementation. We also have a mock one. This is one of the QueryBuilder
>>> API methods, that creates a WHERE clause (field >= value ) for an SQL query:
>>> @Override
>>> @Factory
>>> public <T> Clause ge(String field, T value)
>>> {
>>>    synchronized(selfContainer) {
>>>       ValueBuilder<Clause> builder = selfContainer.newValueBuilder(Clause.class);
>>>       builder.prototype()
>>>              .expression()
>>>              .set(Restrictions.ge(field, value));
>>>       return builder.newInstance();
>>>    }
>>> }
>>> These are called very, very often in the project. After all it is a
>>> query engine. Variable 'selfContainer' is injected as
>>> @Structure
>>> protected Module selfContainer;
>>> When we don't lock the builder factory (the module) the way we do above,
>>> we get all sorts of race conditions. For example, when newInstance() is
>>> called it can fail with a constraint violation exception saying that
>>> expression is not optional. But the call Restrictions.ge() can never return
>>> null. So by the time one thread calls newInstance(), another thread has
>>> already interfered with the builder factory. The builders themselves, as you
>>> see, are local variables (but they may not be truly thread-local, depending
>>> on how they are implemented inside their factory).
>>> There are other ways it can fail. For example saying that the builder
>>> can't find a proper fragment with a ge implementation. All these errors are
>>> so absurd for such simple code that they can only be race conditions. I
>>> will collect as many exception dumps of such errors as I can and send them
>>> to you in a future attachment.
>>> When we synchronize in this fashion, problems go away. But this has two
>>> basic caveats:
>>> 1. Performance penalty (obvious)
>>> 2. A Schrödinger's cat situation. We don't know if the problem went
>>> away because we synchronize, or because concurrency falls to such a degree
>>> that the probability of a race condition drops dramatically, only for it
>>> to reappear on production machines later on
>>> Best regards
>>> Tasos Parisinos
>>> On Thu, Apr 2, 2015 at 11:03 AM, Niclas Hedhman <niclas@hedhman.org>
>>> wrote:
>>>> The general "rule" is that Factories (i.e. implemented by Module
>>>> nowadays) should be thread safe, Builders are NOT thread-safe, and are
>>>> expected to be created at each use. Are you trying to re-use the Builders?
>>>> If not, i.e. you do newXyzBuilder() on each use, and you are seeing
>>>> threading issues, then that is bug(s) and I would love to get hold of the
>>>> details.
>>>> ValueComposites -> thread-safe by definition, once created.
>>>> EntityComposites -> MUST NOT be handed between threads, and are therefore
>>>> indirectly thread-safe.
>>>> TransientComposites -> Internals are expected to be thread-safe, but
>>>> changes at 'user level' need to be taken care of.
>>>> ServiceComposites -> Internals are expected to be thread-safe, but user
>>>> level might need care.
>>>> ConfigurationComposites -> They are entities, and therefore inherit
>>>> their concurrency characteristics.
>>>> Qi4j isn't really intended for being a speed demon, so 15000 tx/sec
>>>> sounds a bit too ambitious to me. Please report back what kind of numbers
>>>> you will eventually manage, even if it is not good enough for you.
>>>> Niclas
>>>> P.S. Qi4j has just been accepted into the Apache Software Foundation,
>>>> and will emerge as Apache Zest. dev@zest.apache.org is CC'd for that
>>>> reason.
>>>> On Wed, Apr 1, 2015 at 10:50 PM, Tasos Parisinos <
>>>> tasosp@projectbeagle.com> wrote:
>>>>> Thanks for your reply, Kent
>>>>> I agree with you that builder instances should be created, used and
>>>>> discarded inside a single request (a single thread from the servlet
>>>>> container pool). The builder factories though, like the application itself,
>>>>> should be shared across all request threads (in a synchronized
>>>>> manner) in order to avoid instantiating such an application PER thread,
>>>>> which would greatly compromise performance. The use of putIfAbsent in that
>>>>> context seems to be correct. I'll give it a try and update you with results
>>>>> On Wednesday, April 1, 2015 at 10:26:16 PM UTC+3, kent.soelvsten wrote:
>>>>>>  I am not an expert so it might be the blind leading the deaf ......
>>>>>> but i sense a potential problem with concurrent access to various
>>>>>> variants of ValueBuilderFactory#newValueBuilder and
>>>>>> TransientBuilderFactory#newTransientBuilder.
>>>>>> (the internal usage of ConcurrentHashMap inside TypeLookup -
>>>>>> shouldn't we use putIfAbsent?).
>>>>>> So those would be good candidates for synchronization. If that solves
>>>>>> your problem i believe you might have found a bug - and a work-around.
>>>>>> ValueBuilder and TransientBuilder instances should probably be
>>>>>> created, used and discarded inside a single web request and not reused.
>>>>>> /Kent
>>>>>> On 01-04-2015 at 20:07, Tasos Parisinos wrote:
>>>>>> Hi all
>>>>>>  Let me describe my problem. We have implemented a servlet (deployed
>>>>>> in Tomcat) that takes a REST request and, based on its query parameters,
>>>>>> builds and executes a single query (using Hibernate ORM) within a
>>>>>> transaction (using Atomikos). The application specifics are not important;
>>>>>> what is important is that we need high throughput (15,000 trx / sec is our
>>>>>> objective).
>>>>>>  We have implemented all infrastructure code using Qi4j for COP and
>>>>>> DI as well as Property<T> data validation (constraint annotations). At
>>>>>> deployment time (in a separate thread) we assemble and activate two
>>>>>> runtimes, each with a Qi4j application. The first is used only during
>>>>>> deployment, while the second is used in ALL threads that serve requests.
>>>>>> Using Qi4j, this second application starts various ServiceComposites
>>>>>> the servlet deploys, for eager initialization (logger service, mapping
>>>>>> service, repository service, rest service, application services,
>>>>>> services, transaction service, token service to name only some). We
>>>>>> implement our Use Cases with a DCI design.
>>>>>>  These services and DCI code use various ValueBuilder<T> and
>>>>>> TransientBuilder<T> instances to do composition.
>>>>>>  The problem is:
>>>>>>  Because ALL request threads use the same Qi4j application, we have
>>>>>> various race conditions that are mainly associated with the various
>>>>>> builders. These race conditions appear when the servlet serves more than
>>>>>> 2000 trx / sec. Sacrificing some throughput we can synchronize shared
>>>>>> variables, but to minimize performance impact we need to know:
>>>>>>  1. What is the best practice for such cases?
>>>>>> 2. Which part of ValueBuilderFactory, ValueBuilder<T>,
>>>>>> TransientBuilderFactory, TransientBuilder<T> is best to synchronize?
>>>>>>  Thanks in advance
>>>>>>  --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "qi4j-dev" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to qi4j-dev+u...@googlegroups.com.
>>>>>> To post to this group, send email to qi4j...@googlegroups.com.
>>>>>> Visit this group at http://groups.google.com/group/qi4j-dev.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>> --
>>>> Niclas Hedhman, Software Developer
>>>> http://www.qi4j.org - New Energy for Java
