hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki ...@getopt.org>
Subject Re: Finding longest path in a graph
Date Thu, 29 Jan 2009 22:31:51 GMT
James Cipar wrote:
> When you say that you want to avoid cycles, do you mean that the path 
> cannot pass through the same vertex more than once, or cannot pass 
> through the same edge more than once?

I think I'm interested in paths that don't pass the same vertex more 
than once - see below.

> If you mean that it can't pass through the same vertex, I don't think 
> there is an easy solution.  If there is a Hamiltonian path in the graph 
> , your algorithm will find it, so your algorithm cannot be better than 
> the best Hamiltonian path algorithm.  Finding a Hamiltonian path is 
> NP-complete, so map reduce will probably not help you.  Perhaps your 
> graphs have some extra structure that could be exploited?  There are 
> efficient solutions for DAGs, but you specifically said the graph may 
> have cycles.

I'm working with web graphs. One type of a graph is where vertices are 
URL-s, and edges represent HTTP redirects (or content redirects - 
doesn't matter). I'm interested in resolving the final URL, given a 
starting URL, so a more precise statement is that I don't need to know 
the complete path, only the start and end of a path.

Such graphs are not acyclic - as you are no doubt aware there are pages 
that redirect to themselves, or ones that redirect in cycles, sometimes 
spanning several intermediate pages.

Another type of a graph is the common web graph where vertices are URL-s 
and edges are outlinks. In this case I'm more interested in detecting 
cycles of a given size than in finding the longest path.

> If you mean that it can't pass through the same edge more than once, 
> then it is probably easier, but I can't imagine how all of it would work 
> in map-reduce.  A path that visits each edge exactly once is simple to 
> compute, assuming such a path exists.  The path exists in every 
> connected graph with either 0, or 2 vertices with odd degree.  So maybe 
> you can find connected components, give each vertex a value that is its 
> degree rounded down to the nearest integer, and pick the component with 
> the largest total value (adding 2 to the value if it has at least 2 odd 
> degree vertices).  Then you just have to find the Eulerian path in that 
> component (by first eliminating edges from all but two odd-degree 
> vertices).

I don't think this helps in my situation, as far as I understand you ...

> Good luck,

This doesn't sound too optimistic ;)


Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

View raw message