giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Igor Kabiljo <ikabi...@gmail.com>
Subject Introducing Blocks Framework for Giraph
Date Wed, 03 Jun 2015 02:25:13 GMT
Heads up, as I will be submitting Blocks Framework to Giraph, so in the
next few days/weeks you will see large number of JIRAs, diffs and new files
related to it.

Blocks Framework is a new API for writing applications, that tries to
address many deficiencies of current Giraph API, while keeping all of its
functionality and performance. Execution is still the same, and Blocks
Framework API translates internally to Giraph API itself.

As we were developing more complex applications, we were finding that it
was very hard to encapsulate parts of the application, and to compose
multiple parts, within current API. And that is important throughout, from
extending and managing complex application, to being able to build
libraries.
For example, let's say we want to write a simple utility that will compute
total edge weight of the graph, and log it on master. First there is no way
to encapsulate that logic - we would have to have some code on the master
(to register aggregator, and to log the result), and we would have to have
a computation that will aggregate values for each vertex. Since
MasterCompute cannot be changed, at best, we could have two functions:
masterComputeStep1 to register aggregator and change computation, and
masterComputeStep2 to log the result. At minimum, we are leaking how many
supersteps this computation takes, computation would need to match message
types before and after it. Now let's say we have two such utilities, and we
want to compose them - there is no way to do so. If we want to modify
existing application to do this at the beginning, before it's own logic -
there is no easy way to do so.

Blocks Framework is primarily trying to solve that problem - by having all
code being written in "Blocks" (as block of code in programming). You can
compose multiple blocks together, to get a more complex logic (for example
new SequenceBlock(block1, block2), or new RepeatBlock(10, iterBlock)). Each
application is represented by a single complex Block. Each block is an
iterator over individual steps - called Pieces (each Piece is itself a
Block). Each Piece encapsulates one message sending, and master compute in
between (so "send" half of one superstep, master compute and "receive" half
of the next superstep).

There is of course much more to it, and a lot of details, which you will be
able to see from diffs and JIRAs.

If any of these bullet points resonate to being painful with you, you are
going to enjoy new framework:
- checking getSuperstep to decide which logic to execute throughout the
code, or keep track of where you are
- changing computation, trying to match computation message types
- having multiple consecutive Giraph jobs, because it is too complex to
have it all in one
- having multiple concrete computation classes, just to support different
vertex/edges types
- wanting to combine multiple algorithms inside single Giraph job

We have developed this framework within Facebook within the last year or
so, and it is mature enough by now to be shared with the community.

Thanks,
Igor

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message