ignite-user mailing list archives

From Ryan Ripken <r...@rmanet.com>
Subject Re: Concurrent job execution and FifoQueueCollisionSpi.parallelJobsNumber=1
Date Mon, 10 Apr 2017 23:07:00 GMT
The issue seemed weird to me as well. It was not reproducible and so I 
just assumed that something must have gone wrong with the installation.

I had this issue occur in January and it just happened again over the 
weekend.   This was using Ignite 1.5.0.final.

I've verified that all the nodes are configured using 
FifoQueueCollisionSpi with parallelJobsNumber = 1.

The nodes which execute the jobs are configured via xml:
    ... <property name="collisionSpi">
            <bean class="org.apache.ignite.spi.collision.fifoqueue.FifoQueueCollisionSpi">
                <property name="parallelJobsNumber" value="1"/>
            </bean>
        </property> ...

Based on your previous response, I believe the collision SPI on the node 
submitting the task does not matter.  Just in case, that node also has 
the SPI configured:
             IgniteConfiguration igniteConfig = new IgniteConfiguration();
             igniteConfig.setMarshaller(new OptimizedMarshaller());
             igniteConfig.setMetricsLogFrequency(3600000);
             FifoQueueCollisionSpi colSpi = new FifoQueueCollisionSpi();
             colSpi.setParallelJobsNumber(1);
             igniteConfig.setCollisionSpi(colSpi);
             ...

On the previous occurrence of this bug I added this code to the job 
execution:
         CollisionSpi collisionSpi = grid.configuration().getCollisionSpi();
         if (collisionSpi instanceof FifoQueueCollisionSpi) {
             FifoQueueCollisionSpi fifo = (FifoQueueCollisionSpi) collisionSpi;
             int parallelJobsNumber = fifo.getParallelJobsNumber();
             _logger.info("FifoQueueCollisionSpi used with parallelJobsNumber:"
                 + parallelJobsNumber);
         } else {
             _logger.info("CollisionSpi is not FifoQueueCollisionSpi but:"
                 + collisionSpi.getClass().getSimpleName());
         }

And in the logs I see:
FifoQueueCollisionSpi used with parallelJobsNumber:1

However, I also see three jobs starting on the same node.  The jobs can 
take minutes to hours to complete and, unfortunately, they have to 
interact with a GUI application.  When multiple jobs execute at the 
same time there are race conditions over which workspace the GUI 
application has open.  The GUI application also computes some values 
during job execution.  If multiple computations run at the same time, 
the results get mixed up.

Are there known issues with FifoQueueCollisionSpi?  Are there any 
workarounds?
I'm considering adding an AtomicInteger counter check in the job 
execution code.  Do you have any suggestions?  I was thinking that if I 
had failover set up, it should be safe to fail any jobs that attempt to 
start concurrently.
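
A minimal sketch of the counter check I have in mind, assuming one shared 
guard instance per JVM (for example, held in a static field).  The class 
and method names (SingleJobGuard, tryEnter, exit) are hypothetical and 
not part of Ignite, and how a rejected job is failed so that failover 
picks it up is left out:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-JVM guard that lets only one job run at a time. */
final class SingleJobGuard {
    /** 0 = no job running, 1 = a job is running. */
    private final AtomicInteger running = new AtomicInteger(0);

    /** Atomically claims the slot; returns false if another job holds it. */
    boolean tryEnter() {
        return running.compareAndSet(0, 1);
    }

    /** Releases the slot; call only from the job whose tryEnter() returned true. */
    void exit() {
        running.set(0);
    }

    // Intended use inside the job body (sketch):
    //   if (!guard.tryEnter()) { /* fail this job so it can be retried */ }
    //   try { /* drive the GUI application */ } finally { guard.exit(); }
}
```

The compareAndSet call makes the check and the claim a single atomic step, 
so two jobs starting at the same moment cannot both pass it.  Releasing in 
a finally block keeps a failed job from leaving the node blocked.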

Lastly, thanks for the hard work on Ignite (and GridGain!).

-Ryan




On 11/7/2016 6:04 PM, vkulichenko wrote:
> Collision SPI is called on the node that executes the job. Having said that,
> what you tell sounds a bit weird. Are you sure other nodes didn't lose the
> config as well?
>
> -Val
>
>
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Concurrent-job-execution-and-FifoQueueCollisionSpi-parallelJobsNumber-1-tp8697p8749.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.


