hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lukáš Vlček <lukas.vl...@gmail.com>
Subject Re: Finding longest path in a graph
Date Sun, 01 Feb 2009 19:37:25 GMT
just my 2 cents.
You are right that MapReduce does not fit to all problems. Expecially, when
it comes to processing graphs. On the other hand I think even in Google they
stick with MapReduce for complex graph processing (and they do it in Yahoo!
as well). I had a chance to talk to one Google engineer and I asked him
exactly this question ("Do you use MapReduce even if it is known that
specific problems can be solved with different architecture in a more
efficient way?"). And the answer was "yes". I don't know if Google engineers
are using different architectures - and I would be surprised if not - but
they probably don't use it at the same scale as they do with MapReduce.
There are several good reasons for this: MapReduce is easy to learn (which
means that even fresh interns can use it very quickly - also in not
efficient way), they probably have very nice visual tools for managing
MapReduce clusters and finally they have a lot of HW resources.

BTW: Andrzej, do you consider contributing your graph processing utilis into
Hadoop or Mahout?


On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <markkerzner@gmail.com> wrote:

> Andrzej,
> without deeper understanding of exactly what you are doing, I have a gut
> feeling that a different distributed system might be a better fit for this
> specific task. I assume, you are dealing with very large graphs if you are
> using Hadoop, and you want grid processing. But the linear nature of
> Map/Reduce may make it hard to fit. As the MapReduce paper said, not every
> task can be easily expressed this way.
> The other technology I mean is JavaSpaces, of which I usually use the
> GigaSpaces implementation. This allows more complex algorithms. You will
> store your complete graph as an appropriate structure in a JavaSpace, and
> will also restructure it for parallel processing, as outlined in some
> JavaSpaces books. Then you can have as many workers as you want, working on
> individual nodes.
> Mark
> On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <ab@getopt.org> wrote:
> > Hi,
> >
> > I'm looking for an advice. I need to process a directed graph encoded as
> a
> > list of <from, to> pairs. The goal is to compute a list of longest paths
> in
> > the graph. There is no guarantee that the graph is acyclic, so there
> should
> > be some mechanism to detect cycles.
> >
> > Currently I'm using a simple approach consisting of the following: I
> encode
> > the graph as <vertex, <neighbor, direction, distance>>, and extending
> > paths by one degree at a time. This means that in order to find the
> longest
> > path of degree N it takes N + 1 map-reduce jobs.
> >
> > Are you perhaps aware of a smarter way to do it? I would appreciate any
> > pointers.
> >
> > --
> > Best regards,
> > Andrzej Bialecki     <><
> >  ___. ___ ___ ___ _ _   __________________________________
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
> >
> >


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message