ignite-user mailing list archives

From Sam Adams <sbad...@gmail.com>
Subject Re: Large Number of Compute Grid Job
Date Tue, 25 Aug 2015 14:59:59 GMT
The code is:

  try (Ignite ignite = Ignition.start(igniteConfigPath)) {
    Date runDate = new Date();
    ClusterGroup remotes = ignite.cluster().forRemotes();
    List<SimulationJob> jobs = IntStream.range(0, totalRuns)
        .sequential()
        .mapToObj((i) -> new SimulationJob(simulationClass, i))
        .collect(Collectors.toList());
    Collection<SimulationHarness<?,?,?>> result =
        ignite.compute(remotes).call(jobs);
    SimulationHarness<?,?,?> combined =
        createHarness(simulationClass, take, totalRuns).init();
    result.stream().sequential().forEach(harness -> combined.combine(harness));
    writeResults(runDate, totalRuns, combined);
  }

  class SimulationJob implements IgniteCallable<SimulationHarness<?,?,?>> {
    ...
    @Override
    public SimulationHarness<?,?,?> call() throws Exception {
      SimulationHarness<?,?,?> harness =
          createHarness(simulationClass, props, runs);
      harness.perform(runOffset);
      return harness;
    }
  }

I can see from VisualVM that at times 20% of CPU is used for GC, but I don't
know how to test for pauses. There's nothing obvious; the debug output in the
console seems pretty responsive.
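
One rough way to check for GC pauses from inside the submitting JVM is to poll the standard GarbageCollectorMXBean counters once a second and log the deltas. The sketch below uses only the JDK; the class name GcPauseProbe and the one-second interval are illustrative choices, not anything from Ignite. Note that getCollectionTime() reports accumulated collection time, which only approximates stop-the-world pause time for concurrent collectors. Starting the JVM with -verbose:gc gives a per-collection log as an alternative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseProbe {
    /** Approximate total milliseconds spent in GC so far, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Total number of collections so far, summed over all collectors. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 if the collector does not report it
            if (c > 0) {
                total += c;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long prevMillis = totalGcMillis();
        long prevCount = totalGcCount();
        // Sample five times, one second apart, and print the change per interval.
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000);
            long nowMillis = totalGcMillis();
            long nowCount = totalGcCount();
            System.out.printf("last interval: %d collections, ~%d ms in GC%n",
                    nowCount - prevCount, nowMillis - prevMillis);
            prevMillis = nowMillis;
            prevCount = nowCount;
        }
    }
}
```

An interval in which the GC time delta approaches the 5034 ms duration from the peer class loading error would point at pauses on this node.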

Thanks for the tip about the ComputeContinuousMapper, I'll look into it.
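
As far as I understand the idea, it comes down to keeping only a bounded number of jobs in flight and combining each result as soon as it arrives, instead of creating all jobs up front and collecting every result before reducing. The sketch below shows that pattern with plain JDK executors only; it is not the Ignite API, and BoundedSubmitter, runAll and maxInFlight are hypothetical names used for illustration. Long values stand in for the harness results.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntFunction;

public class BoundedSubmitter {
    /**
     * Runs totalJobs jobs with at most maxInFlight (must be >= 1) outstanding
     * at any time, folding each result into the running total as it completes.
     */
    static long runAll(int totalJobs, int maxInFlight,
                       IntFunction<Callable<Long>> jobFactory) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
        CompletionService<Long> done = new ExecutorCompletionService<>(pool);
        long combined = 0;
        int submitted = 0;
        int completed = 0;
        try {
            while (completed < totalJobs) {
                // Top up the window: jobs are created lazily, only when there is room.
                while (submitted < totalJobs && submitted - completed < maxInFlight) {
                    done.submit(jobFactory.apply(submitted++));
                }
                // Block for the next finished job and combine it immediately,
                // so its result does not have to be retained afterwards.
                combined += done.take().get();
                completed++;
            }
        } finally {
            pool.shutdown();
        }
        return combined;
    }

    public static void main(String[] args) throws Exception {
        // Each job just returns its own index; the combined value is the sum.
        long sum = runAll(1000, 8, i -> () -> (long) i);
        System.out.println(sum);
    }
}
```

With this shape, memory use on the submitting side scales with maxInFlight rather than with the total number of jobs.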

Sam

On 25 August 2015 at 12:25, Yakov Zhdanov <yzhdanov@gridgain.com> wrote:

> Sam, can you please share the code?
>
> I want to get to the bottom of peer class loading issue. Can you please
> make sure that node you send jobs from does not experience lengthy GC
> pauses?
>
> Having too many jobs may be an issue from heap utilization standpoint.
>
> I would recommend using Continuous Mapper. Example is under Ignite sources
> - ComputeContinuousMapperExample
>
> --
> Yakov Zhdanov, Director R&D
> *GridGain Systems*
> www.gridgain.com
>
> 2015-08-25 13:50 GMT+03:00 Sam Adams <sbadams@gmail.com>:
>
>> Hi,
>>
>> I am looking to run a pretty simple compute job. However, it is going to
>> take a large number of jobs (>10k). Each job takes about 5 minutes to run.
>> When starting ignite I see in the logs: WARNING: Number of jobs in task is
>> too large for task...
>>
>> What is the issue with having many jobs? Is it related to the result
>> stream? I was expecting the results to be streamed in as they were computed
>> but from what I've seen they don't seem to be processed until all jobs
>> have finished. Is this correct? I can see that might be an issue with all
>> results having to be kept in memory before they are reduced.
>>
>> I can of course combine the logic in the jobs so that each job runs the
>> logic 100 times for example, that way only 100 jobs would be needed.
>> However if a job fails this means I could lose many hours of computation
>> which is not ideal. Also some nodes may be faster than others so I'd like
>> them to be able to steal jobs efficiently.
>>
>> When I run 10,000 jobs I see remote nodes fail to peer load classes.
>>
>> java.lang.ClassNotFoundException: Failed to peer load class...
>> org.apache.ignite.IgniteCheckedException: Failed to send class-loading request to node (is node alive?)...
>>
>> and
>>
>> Failed to receive peer response from node within duration
>> [node=7251ca67-e4a0-4f35-9678-644533d8e65d, duration=5034]
>>
>> That node (7251ca67-e4a0-4f35-9678-644533d8e65d) is the node that started
>> ignite and initiated the compute job. I only get this if I am running
>> another node on that machine (I only run the logic on remotes).
>>
>> If I don't run another node I get:
>>
>> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>     at java.util.LinkedList.linkLast(LinkedList.java:142)
>>     at java.util.LinkedList.add(LinkedList.java:338)
>>     at org.apache.ignite.internal.processors.task.GridTaskWorker.cancelChildren(GridTaskWorker.java:1052)
>>     at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1333)
>>     at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1305)
>>     at org.apache.ignite.internal.processors.task.GridTaskWorker.body(GridTaskWorker.java:462)
>>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:108)
>>     at org.apache.ignite.internal.processors.task.GridTaskProcessor.startTask(GridTaskProcessor.java:618)
>>     at org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:344)
>>     at org.apache.ignite.internal.processors.closure.GridClosureProcessor.callAsync(GridClosureProcessor.java:376)
>>     at org.apache.ignite.internal.processors.closure.GridClosureProcessor.callAsync(GridClosureProcessor.java:349)
>>     at org.apache.ignite.internal.IgniteComputeImpl.call(IgniteComputeImpl.java:349)
>>
>> Can you suggest what might be causing this exception?
>>
>> I have a feeling that your solution to all my issues might be to combine
>> more logic into fewer jobs, but perhaps you can suggest something better?
>>
>> Thanks,
>>
>> Sam
>>
>
>
