spark-user mailing list archives

From Boromir Widas <>
Subject Handling tree reduction algorithm with Spark in parallel
Date Tue, 30 Sep 2014 21:12:38 GMT
Hello Folks,

I have recently been trying to implement a tree reduction algorithm in
Spark, but could not find suitable parallel operations. Assuming I have a
general tree like the following -

I have to do the following -
1) Do some computation at each leaf node to get an array of doubles. (This
can be precomputed.)
2) For each non-leaf node, starting with the root node, compute the sum of
these arrays over all child nodes. So to get the array for node B, I need to
get the array for E, which is the sum of G + H.

////////////////////// Start Snippet
case class Node(name: String, children: Array[Node], values: Array[Double]) {
  // a leaf has no children; its values are precomputed
  def isLeafNode: Boolean = children.isEmpty
}

// read in the tree here

def getSumOfChildren(node: Node): Array[Double] = {
  if (node.isLeafNode) {
    node.values
  } else {
    // sum the arrays of all children (assuming an element-wise sum);
    // can use an accumulator here
    node.children
      .map(getSumOfChildren)
      .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
  }
}
////////////////////////// End Snippet
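
One restructuring that might make this parallelizable, as a rough sketch only:
flatten the tree into (node name, leaf values) pairs, one pair per leaf for the
leaf itself and for each of its ancestors, and then sum the arrays per node
name. The sketch below uses plain Scala collections rather than Spark. The
names TreeSumSketch, addArrays, leafPairs and sumsByNode are hypothetical, and
it assumes that node names are unique, that all arrays have the same length,
and that the sum is element-wise.

```scala
// Sketch only: plain Scala collections stand in for Spark RDDs.
object TreeSumSketch {
  case class Node(name: String, children: Array[Node], values: Array[Double])

  // Element-wise sum; assumes both arrays have the same length.
  def addArrays(a: Array[Double], b: Array[Double]): Array[Double] =
    a.zip(b).map { case (x, y) => x + y }

  // For every leaf, emit one (nodeName, leafValues) pair for the leaf itself
  // and one for each of its ancestors. Assumes node names are unique.
  def leafPairs(node: Node, ancestors: List[String] = Nil): Seq[(String, Array[Double])] = {
    val path = node.name :: ancestors
    if (node.children.isEmpty) path.map(name => (name, node.values))
    else node.children.toSeq.flatMap(child => leafPairs(child, path))
  }

  // Sum the pairs per node name. The per-key sums are independent of each
  // other, which is what would let this step run in parallel.
  def sumsByNode(root: Node): Map[String, Array[Double]] =
    leafPairs(root)
      .groupBy(_._1)
      .map { case (name, pairs) =>
        name -> pairs.map(_._2).reduce((a, b) => addArrays(a, b))
      }
}
```

On an RDD of the same pairs, the groupBy/reduce step would presumably become a
reduceByKey with addArrays as the combining function. The flattening emits one
pair per leaf per ancestor, so a deep tree produces more pairs than it has
nodes.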

Any pointers to how this can be done in parallel to use all cores will be
greatly appreciated.

