Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/BristolHadoopWorkshop
The comment on the change is:
graphs

Could you run CMSSW under Hadoop? Probably not. Very slow startup/teardown cost, so you
don't want to just run it for one/two events.
 Issue: How to convince physicists to embrace MR? Need to see the benefits, as physicists
don't see/care about the the costs
+ Issue: How to convince physicists to embrace MR? Need to see the benefits, as physicists
don't see/care about the the costs.
+ == Graphs ==
+
+ * [http://www.slideshare.net/steve_l/graphs1848617 Graphs] Paolo Castagna, HP
+
+ This was a talk by Paolo Castagna on graph work under MR, of which PageRank is classic application
+ * graph topology does not change every iteration, so why ship it around every MR?
+ * the graph defines the other jobs you need to communicate with.
+ The graph is a massive data structure which, if you are doing inference work, only grows
in relationships. Steve thinks: You may need some graph model which is shared across servers,
which they can all add to. There is a small problem here: keeping the information current
for 4000 servers, but what if you don't have to, what if you treat updates to the graph as
lazy facts to propagate round?
+
+ Google: pregel. what do you need from a language to describe PageRank in 15 lines?
+
+ MS: dryad does not do graphs.
+
+ Projects
+ * Apache Hamburg proposed by Edward Yang, of Hamas, is it making progress? We need code.
+ * Apache Common Graph. Dead, preMapReduce code.
+
+ === Graph Algorithms ===
+
+ These are what a graph project should start with
+ * Graph Search
+ * Directed/acyclic graphs
+ * Minimum Spanning Tree
+ * Shortest Path
+ * Network Flow
+
+ Paolo claimed that Depth First Search, DFS, doesn't work in this space. if Depth First Search
doesn't work, what to do?
+
+ The current best printed work on Graph over MapReduce is the recent paper [http://www2.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.120
Graph Twiddling in a MapReduce World], by Jonathan Cohen.
+
+ * You can't do transitive closure in SQL.
+ * What would an efficient transitive closure algorithm over MR be?
+
+
+ === Discussion ===
+
+ There was discussion on handing big graphs in the system, ones where the graph itself is
very large. Someone need's to take Paolo's PageRankoverMapReduce code and test it on bigger
data sets.
+
+ There was a good point on what is "efficient" in this world: If you can do it in O(n) on
MR, and there is no single computer in the world that can work on a graph of that scale, then
it's efficient.
+
+ The key point here being that yes, something done as a chain of MR jobs on a Hadoop cluster
may seem an inefficient approach, but if there is no other way to store that much data, or
run through it, then graph people will be happy.
+
