Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/BristolHadoopWorkshop
The comment on the change is:
fix bullets

* [http://www.slideshare.net/steve_l/graphs1848617 Graphs] Paolo Castagna, HP
This was a talk by Paolo Castagna on graph work under MR, of which PageRank is the classic application.
 * graph topology does not change every iteration, so why ship it around every MR?
+ * graph topology does not change every iteration, so why ship it around every MR?
 * the graph defines the other jobs you need to communicate with.
+ * the graph defines the other jobs you need to communicate with.
The graph is a massive data structure which, if you are doing inference work, only grows
in relationships. Steve thinks: you may need some graph model that is shared across servers
and that they can all add to. There is a small problem here: keeping the information current
across 4000 servers. But what if you don't have to? What if you treat updates to the graph as
lazy facts to propagate around?
Google: Pregel. What do you need from a language to describe PageRank in 15 lines?
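
As a rough illustration of the vertex-centric idea raised in the talk, here is a minimal single-process sketch in Python. It is not Pregel's API and not code from the talk: the function name `pagerank`, the `out_links` adjacency dict, the damping factor of 0.85 and the superstep count are all assumptions made for the example. The point it tries to show is that the topology stays where it is, and only rank "messages" move between vertices on each superstep.

```python
def pagerank(out_links, damping=0.85, supersteps=50):
    """Vertex-centric PageRank sketch (hypothetical helper, not a Pregel API).

    out_links maps each vertex to the list of vertices it links to.
    Assumes every link target also appears as a key in out_links.
    """
    n = len(out_links)
    rank = {v: 1.0 / n for v in out_links}
    for _ in range(supersteps):
        # Only rank shares travel; the topology in out_links is never copied.
        inbox = {v: 0.0 for v in out_links}
        dangling = 0.0
        for v, targets in out_links.items():
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    inbox[t] += share
            else:
                # A vertex with no out-links spreads its rank evenly.
                dangling += rank[v]
        rank = {
            v: (1 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in out_links
        }
    return rank
```

Calling `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})` returns a dict of ranks that sum to 1. In a distributed setting each vertex's `inbox` would be filled by messages from other workers rather than by a local loop.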
