ignite-user mailing list archives

From ihorps <ihor...@gmail.com>
Subject Re: Task management - MapReduce & ForkJoin performance penalty
Date Wed, 30 Aug 2017 21:02:30 GMT
ezhuravlev wrote
> Also, maybe it's better to compare your current solution with Ignite on
> some real tasks? Or at least more approximate to the real use case
> Evgenii

Hi @ezhuravlev
Thank you for your reply!
I'm preparing a fairer comparison with our custom-made solution, but it
can't be done in a simple way because the two end up as different technical
solutions (I'll try to explain it below). It requires migrating some basic
functionality to get a relatively objective comparison (hope I'll do this

ezhuravlev wrote
> I don't really understand, what you've tried to measure here? 
> .....
> Maybe you could describe your case in detail, so we could suggest you a
> better solution?

I'll try to explain the goal of such a simple measurement. It requires some
additional introduction to the current situation: why and how we got here.
It won't be short, but I'll try to be concise.

[Goal of simple measurement]
Try to evaluate the overhead of job management in a distributed setting.
Literally: how much time is spent on job management in a cluster by the
Ignite compute grid framework. The idea was:
1. with a 16-core machine (hyper-threaded, 8 physical cores) I assume it
can execute at least 8x2 jobs in parallel (not sure, it's just an
assumption)
2. create 1 task with empty jobs and count on the vast majority of the
time being spent in task/job management itself
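
To make the measurement idea concrete, here is a minimal local sketch. It is not the Ignite API: a plain java.util.concurrent thread pool stands in for the compute grid, and the class name EmptyJobOverhead, the method measureNanos and the job/thread counts are only illustrative assumptions. It submits empty jobs and times the whole batch, so nearly all of the elapsed time is scheduling overhead rather than work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the experiment: a local thread pool, NOT an Ignite cluster.
class EmptyJobOverhead {

    /** Runs jobCount no-op jobs on a fixed pool and returns the elapsed nanoseconds. */
    static long measureNanos(int jobCount, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> jobs = new ArrayList<>();
            for (int i = 0; i < jobCount; i++) {
                jobs.add(() -> null); // empty job: nothing to compute
            }
            long start = System.nanoTime();
            // invokeAll blocks until every job has completed.
            List<Future<Void>> futures = pool.invokeAll(jobs);
            for (Future<Void> f : futures) {
                f.get(); // surfaces any exception thrown by a job
            }
            return System.nanoTime() - start;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int jobs = 1000;   // assumed job count
        int threads = 16;  // assumed to match the 16 logical cores
        measureNanos(jobs, threads); // warm-up run, result discarded
        long nanos = measureNanos(jobs, threads);
        System.out.printf("%d empty jobs on %d threads: %.3f ms total%n",
                jobs, threads, nanos / 1_000_000.0);
    }
}
```

The same shape (empty jobs, one warm-up run, wall-clock time around the whole batch) could be applied to a real cluster; the numbers from this local version only give a lower bound to compare against.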

As I understood from your feedback, it's better to try it on different
physical hosts. I'll check it out tomorrow.

[why we need it]
A short introduction to why I'm doing this evaluation (probably not relevant
in the end, but let's see).
Skipping everything that is not (yet) relevant: a few years ago a decision
was made that we need an in-memory compute engine. A distributed one. So we
could speed up... yes, SQL calculation. So we got this:
1. Create an abstraction for a distributed compute grid
2. Hazelcast was chosen as the basis
3. Since the computation was, by its nature, a kind of map-reduce
(calculation of reduced values on different levels, starting from store -->
warehouse --> country), we chose Hazelcast's MapReduce API
4. We had only a few Tasks with a few Jobs
5. Jobs ran for a few minutes due to slow data load
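
The level-by-level reduction (store --> warehouse --> country) could be sketched locally like this. It is only an illustration with Java streams, not Hazelcast's MapReduce API; the class LevelReduce, the StoreValue type, the "country/warehouse" key format and the use of a sum as the reduced value are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration of reducing values level by level.
class LevelReduce {

    /** One value computed at store level (field names are assumptions). */
    static final class StoreValue {
        final String country;
        final String warehouse;
        final String store;
        final long value;

        StoreValue(String country, String warehouse, String store, long value) {
            this.country = country;
            this.warehouse = warehouse;
            this.store = store;
            this.value = value;
        }
    }

    /** Reduces store values to one total per "country/warehouse" key. */
    static Map<String, Long> byWarehouse(List<StoreValue> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                r -> r.country + "/" + r.warehouse,
                TreeMap::new,
                Collectors.summingLong(r -> r.value)));
    }

    /** Reduces warehouse totals further to one total per country. */
    static Map<String, Long> byCountry(Map<String, Long> warehouseTotals) {
        return warehouseTotals.entrySet().stream().collect(Collectors.groupingBy(
                e -> e.getKey().substring(0, e.getKey().indexOf('/')),
                TreeMap::new,
                Collectors.summingLong(e -> e.getValue())));
    }

    public static void main(String[] args) {
        List<StoreValue> rows = Arrays.asList(
                new StoreValue("DE", "W1", "S1", 10),
                new StoreValue("DE", "W1", "S2", 5),
                new StoreValue("DE", "W2", "S3", 7),
                new StoreValue("FR", "W1", "S4", 3));
        Map<String, Long> warehouses = byWarehouse(rows);
        System.out.println(warehouses);
        System.out.println(byCountry(warehouses));
    }
}
```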

Everything was OK until we got a new type of task. Jobs in these tasks
loaded little from the DB and were more CPU-intensive, and the number of
such jobs increased up to 1000. We found that the overhead of managing one
job in the cluster was > 2-3 sec., which was unacceptable (Hazelcast admitted
that the MapReduce framework was buggy and not performant at that point in
time).

As a rescue we decided to write our own custom solution:
1. Introduce a Task abstraction
2. Introduce a Job abstraction as a type for sub-task parallelism.
3. Keep Task management in a distributed map (transaction, state [fail,
done, executing]). The distributed map is backed by persistent storage in
a relational DB
4. Keep Job management in a distributed map (transactional, state [fail,
done, executing], collocated run, job stealing and more). Jobs are kept in
memory only.
5. A Job has no return type (like Runnable)
6. More may come here... left out for simplicity
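
A heavily simplified sketch of the job-state part of such a design is below. It is not our actual code: a local ConcurrentHashMap stands in for the distributed map, the names JobTracker, run and stateOf are made up for the example, and transactions, collocation and job stealing are left out entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: job state kept in a map, job modelled as a Runnable.
class JobTracker {

    /** The three states named above. */
    enum State { EXECUTING, DONE, FAIL }

    // Local stand-in for the distributed, in-memory job map (assumption).
    private final Map<String, State> jobStates = new ConcurrentHashMap<>();

    /** Runs a job that returns nothing and records its state transitions. */
    void run(String jobId, Runnable job) {
        jobStates.put(jobId, State.EXECUTING);
        try {
            job.run();
            jobStates.put(jobId, State.DONE);
        } catch (RuntimeException e) {
            // Any unchecked failure marks the job as failed.
            jobStates.put(jobId, State.FAIL);
        }
    }

    /** Returns the last recorded state, or null for an unknown job id. */
    State stateOf(String jobId) {
        return jobStates.get(jobId);
    }

    public static void main(String[] args) {
        JobTracker tracker = new JobTracker();
        tracker.run("job-1", () -> { /* empty job */ });
        tracker.run("job-2", () -> { throw new IllegalStateException("boom"); });
        System.out.println("job-1: " + tracker.stateOf("job-1"));
        System.out.println("job-2: " + tracker.stateOf("job-2"));
    }
}
```

Even in this toy form, every job costs several map writes; with a distributed, transactional map each of those writes becomes a network operation, which is where the per-job management overhead comes from.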

After some time I realized that our custom solution was becoming really
heavy to maintain and support, and it seems that we are reinventing the
wheel. I've found that Apache Ignite does quite similar things and much more
(persistence: this is what we wanted to implement 1.5 years ago).

So my goal, before switching the project (which is quite big: it covers 24
countries), is to see whether I would face the same problem with job
management overhead from the MapReduce API in Ignite. If it works, we would
be glad to use other very handy features like off-heap memory, Ignite
Persistence, the SQL engine and Data Streaming.

I hope this helps to explain what I'm trying to achieve here.


Sent from: http://apache-ignite-users.70518.x6.nabble.com/
