hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ricky Ho <...@adobe.com>
Subject RE: Finding longest path in a graph
Date Mon, 02 Feb 2009 06:33:28 GMT
I heard about Mahout address the machine learning portion that but haven't look at any detail
yet.

I am looking more from the theory side (how to transform the algorithm into the Map/Reduce
form) and not necessary an implementation.

Rgds,
Ricky
-----Original Message-----
From: Lukáš Vlček [mailto:lukas.vlcek@gmail.com] 
Sent: Sunday, February 01, 2009 10:29 PM
To: core-user@hadoop.apache.org
Subject: Re: Finding longest path in a graph

Ricky,
Are you aware of Mahout project?
http://lucene.apache.org/mahout/

I think this project tries to addess some of the algorithms you have
mentioned.

Regards,
Lukas

2009/2/2 Ricky Ho <rho@adobe.com>

> Yes, Map/Reduce model itself is simple.  But transforming a non-trivial
> algorithm into Map/Reduce is not simple.
>
> I wonder if there is any effort from the academia to look into build a
> catalog of how each of our familiar algorithms can be transformed into
> Map/Reduce, such as ...
> 1) Sorting
> 2) Searching (index tree, hash, geo/spacial search)
> 3) Network/Graph processing (min spanning tree, shortest path, network
> diameter)
> 4) Computational Geometry (Ray tracing, Convex Hull)
> 5) Optimization problem (Maximum flow, Linear programming, Hill climbing)
> 6) Machine learning (logistic regression, nearest neighbor, cluster,
> Bayesian classification, decision tree, neural network)
>
> It is also important to recognize there are other models in our tool box
> (besides map/reduce) for parallelizing an algorithm, such as ...
> a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace)
> b) Dependency graph / Data flow programming (e.g. Dryad)
> c) MPI
> d) Multi-thread / Shared memory model
> e) ... anything else ...
>
> Rgds,
> Ricky
>
> -----Original Message-----
> From: Lukáš Vlček [mailto:lukas.vlcek@gmail.com]
> Sent: Sunday, February 01, 2009 11:37 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Finding longest path in a graph
>
> Hi,
> just my 2 cents.
> You are right that MapReduce does not fit to all problems. Expecially, when
> it comes to processing graphs. On the other hand I think even in Google
> they
> stick with MapReduce for complex graph processing (and they do it in Yahoo!
> as well). I had a chance to talk to one Google engineer and I asked him
> exactly this question ("Do you use MapReduce even if it is known that
> specific problems can be solved with different architecture in a more
> efficient way?"). And the answer was "yes". I don't know if Google
> engineers
> are using different architectures - and I would be surprised if not - but
> they probably don't use it at the same scale as they do with MapReduce.
> There are several good reasons for this: MapReduce is easy to learn (which
> means that even fresh interns can use it very quickly - also in not
> efficient way), they probably have very nice visual tools for managing
> MapReduce clusters and finally they have a lot of HW resources.
>
> BTW: Andrzej, do you consider contributing your graph processing utilis
> into
> Hadoop or Mahout?
>
> Regards,
> Lukas
>
> On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <markkerzner@gmail.com>
> wrote:
>
> > Andrzej,
> > without deeper understanding of exactly what you are doing, I have a gut
> > feeling that a different distributed system might be a better fit for
> this
> > specific task. I assume, you are dealing with very large graphs if you
> are
> > using Hadoop, and you want grid processing. But the linear nature of
> > Map/Reduce may make it hard to fit. As the MapReduce paper said, not
> every
> > task can be easily expressed this way.
> >
> > The other technology I mean is JavaSpaces, of which I usually use the
> > GigaSpaces implementation. This allows more complex algorithms. You will
> > store your complete graph as an appropriate structure in a JavaSpace, and
> > will also restructure it for parallel processing, as outlined in some
> > JavaSpaces books. Then you can have as many workers as you want, working
> on
> > individual nodes.
> >
> > Mark
> >
> > On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <ab@getopt.org>
> wrote:
> >
> > > Hi,
> > >
> > > I'm looking for an advice. I need to process a directed graph encoded
> as
> > a
> > > list of <from, to> pairs. The goal is to compute a list of longest
> paths
> > in
> > > the graph. There is no guarantee that the graph is acyclic, so there
> > should
> > > be some mechanism to detect cycles.
> > >
> > > Currently I'm using a simple approach consisting of the following: I
> > encode
> > > the graph as <vertex, <neighbor, direction, distance>>, and extending
> the
> > > paths by one degree at a time. This means that in order to find the
> > longest
> > > path of degree N it takes N + 1 map-reduce jobs.
> > >
> > > Are you perhaps aware of a smarter way to do it? I would appreciate any
> > > pointers.
> > >
> > > --
> > > Best regards,
> > > Andrzej Bialecki     <><
> > >  ___. ___ ___ ___ _ _   __________________________________
> > > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > > http://www.sigram.com  Contact: info at sigram dot com
> > >
> > >
> >
>
>
>
> --
> http://blog.lukas-vlcek.com/
>



-- 
http://blog.lukas-vlcek.com/

Mime
View raw message