ignite-user mailing list archives

From Yakov Zhdanov <yzhda...@gridgain.com>
Subject Re: Large Number of Compute Grid Job
Date Tue, 25 Aug 2015 11:25:07 GMT
Sam, can you please share the code?

I want to get to the bottom of the peer class loading issue. Can you please
make sure that the node you send jobs from does not experience lengthy GC
pauses?
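
One rough way to check this from inside the submitting JVM is to sample the standard GarbageCollectorMXBean counters while jobs are being sent. The sketch below is only an illustration: the class name GcPauseCheck and the one-second window are made up for this example, and the counters report cumulative collection time rather than individual pause lengths, so GC logging remains the more precise tool.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Ignite: estimates how much wall-clock
// time the current JVM spends in garbage collection.
public class GcPauseCheck {
    /** Cumulative GC time in milliseconds, summed over all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of the sampling window that was spent in GC. */
    public static double gcFraction(long gcBefore, long gcAfter, long wallMillis) {
        if (wallMillis <= 0) {
            return 0.0;
        }
        return (double) (gcAfter - gcBefore) / wallMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long gcBefore = totalGcMillis();
        long start = System.currentTimeMillis();
        Thread.sleep(1000); // sampling window; in practice, sample while jobs are being submitted
        long wall = System.currentTimeMillis() - start;
        double fraction = gcFraction(gcBefore, totalGcMillis(), wall);
        System.out.printf("GC took %.1f%% of the last %d ms%n", fraction * 100, wall);
    }
}
```

If the reported fraction stays high while the task is being mapped, the submitting node is probably too busy collecting garbage to answer class-loading requests in time.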

Having too many jobs may be an issue from a heap utilization standpoint.

I would recommend using the Continuous Mapper. An example is in the Ignite
sources: ComputeContinuousMapperExample
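
The idea behind continuous mapping, as I understand it, is to keep only a bounded number of jobs in flight and to handle each result as it arrives, instead of creating all jobs up front. The sketch below illustrates that idea with plain java.util.concurrent only. It is not Ignite's API: BoundedSubmitter, runAll, work and the maxInFlight parameter are hypothetical names for this example, and ComputeContinuousMapperExample in the Ignite sources remains the reference for the real thing.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for continuous mapping, using only the JDK.
public class BoundedSubmitter {
    /**
     * Runs totalJobs jobs with at most maxInFlight submitted at any time,
     * folding each result into a running sum as soon as it completes.
     */
    public static long runAll(int totalJobs, int maxInFlight, int workers) {
        if (maxInFlight < 1 || workers < 1) {
            throw new IllegalArgumentException("maxInFlight and workers must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        ExecutorCompletionService<Long> done = new ExecutorCompletionService<>(pool);
        long sum = 0;
        int submitted = 0;
        int completed = 0;
        try {
            while (completed < totalJobs) {
                // Top up the window: never hold more than maxInFlight pending jobs.
                while (submitted < totalJobs && submitted - completed < maxInFlight) {
                    final int id = submitted++;
                    done.submit(() -> work(id));
                }
                // Block until any job finishes, then reduce its result immediately.
                sum += done.take().get();
                completed++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a result", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("job failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
        return sum;
    }

    /** Placeholder for the real five-minute computation. */
    static long work(int id) {
        return (long) id * id;
    }

    public static void main(String[] args) {
        System.out.println(runAll(1000, 16, 4));
    }
}
```

With this shape, memory use on the submitting side depends on the window size rather than on the total number of jobs, and jobs stay small enough that a single failure costs little.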

--
Yakov Zhdanov, Director R&D
*GridGain Systems*
www.gridgain.com

2015-08-25 13:50 GMT+03:00 Sam Adams <sbadams@gmail.com>:

> Hi,
>
> I am looking to run a pretty simple compute job. However, it is going to
> take a large number of jobs (>10k). Each job takes about 5 minutes to run.
> When starting Ignite I see in the logs: WARNING: Number of jobs in task is
> too large for task...
>
> What is the issue with having many jobs? Is it related to the result
> stream? I was expecting the results to be streamed in as they were computed
> but from what I've seen they don't seem to be processed until all jobs
> have finished. Is this correct? I can see that might be an issue with all
> results having to be kept in memory before they are reduced.
>
> I can of course combine the logic in the jobs so that each job runs the
> logic 100 times for example, that way only 100 jobs would be needed.
> However if a job fails this means I could lose many hours of computation
> which is not ideal. Also some nodes may be faster than others so I'd like
> them to be able to steal jobs efficiently.
>
> When I run 10,000 jobs I see remote nodes fail to peer load classes.
>
> java.lang.ClassNotFoundException: Failed to peer load class...
>  org.apache.ignite.IgniteCheckedException: Failed to send class-loading request to node (is node alive?)...
>
> and
>
> Failed to receive peer response from node within duration
> [node=7251ca67-e4a0-4f35-9678-644533d8e65d, duration=5034]
>
> That node (7251ca67-e4a0-4f35-9678-644533d8e65d) is the node that started
> Ignite and initiated the compute job. I only get this if I am running
> another node on that machine (I only run the logic on remotes).
>
> If I don't run another node I get:
>
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit
> exceeded
> at java.util.LinkedList.linkLast(LinkedList.java:142)
> at java.util.LinkedList.add(LinkedList.java:338)
> at org.apache.ignite.internal.processors.task.GridTaskWorker.cancelChildren(GridTaskWorker.java:1052)
> at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1333)
> at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1305)
> at org.apache.ignite.internal.processors.task.GridTaskWorker.body(GridTaskWorker.java:462)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:108)
> at org.apache.ignite.internal.processors.task.GridTaskProcessor.startTask(GridTaskProcessor.java:618)
> at org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:344)
> at org.apache.ignite.internal.processors.closure.GridClosureProcessor.callAsync(GridClosureProcessor.java:376)
> at org.apache.ignite.internal.processors.closure.GridClosureProcessor.callAsync(GridClosureProcessor.java:349)
> at org.apache.ignite.internal.IgniteComputeImpl.call(IgniteComputeImpl.java:349)
>
> Can you suggest what might be causing this exception?
>
> I have a feeling that your solution to all my issues might be to combine
> more logic into fewer jobs, but perhaps you can suggest something better?
>
> Thanks,
>
> Sam
>
