hama-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-941) Semiclustering Termination
Date Sat, 30 Apr 2016 22:23:12 GMT

    [ https://issues.apache.org/jira/browse/HAMA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265532#comment-15265532 ]

Edward J. Yoon commented on HAMA-941:
-------------------------------------

First, the boundary score factor (fB) appears to always be 0.0, although it is meant to be
a user-defined parameter. Second, if the vertex count is at most 1 (vC <= 1), the score
should be 1.0. Please apply my patch and test again. Do you see any more bugs?

{code}
diff --git a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
index 9a905c1..38481fd 100644
--- a/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
+++ b/ml/src/main/java/org/apache/hama/ml/semiclustering/SemiClusteringVertex.java
@@ -71,7 +71,7 @@
         candidates.add(msg);
 
         if (!msg.contains(this.getVertexID())
-            && msg.size() == semiClusterMaximumVertexCount) {
+            && msg.size() < semiClusterMaximumVertexCount) {
           SemiClusterMessage msgNew = WritableUtils.clone(msg, this.getConf());
           msgNew.addVertex(this);
           msgNew.setSemiClusterId("C"
@@ -149,14 +149,15 @@
    * @return the value to calcualte the Score of a semi-cluster.
    */
   public double semiClusterScoreCalcuation(SemiClusterMessage message) {
-    double iC = 0.0, bC = 0.0, fB = 0.0, sC = 0.0;
-    int vC = 0, eC = 0;
+    // TODO fB is the boundary score factor. This should be configurable by the user;
+    // the default is 0.5
+    double iC = 0.0, bC = 0.0, fB = 0.5, sC = 0.0;
+    int vC = 0;
     vC = message.size();
     for (Vertex<Text, DoubleWritable, SemiClusterMessage> v : message
         .getVertexList()) {
       List<Edge<Text, DoubleWritable>> eL = v.getEdges();
       for (Edge<Text, DoubleWritable> e : eL) {
-        eC++;
         if (message.contains(e.getDestinationVertexID())
             && e.getValue() != null) {
           iC = iC + e.getValue().get();
@@ -165,8 +166,12 @@
         }
       }
     }
+
     if (vC > 1)
-      sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2)) / eC;
+      sC = ((iC - fB * bC) / ((vC * (vC - 1)) / 2));
+    else
+      sC = 1.0;
+
     return sC;
   }
{code}
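
For readers who want to try the scoring rule outside of Hama, here is a minimal standalone sketch of the patched formula, sC = (iC - fB * bC) / (vC * (vC - 1) / 2), with the fallback of 1.0 when vC <= 1. The class name `SemiClusterScoreSketch`, the method `score`, and its parameter names are hypothetical and not part of Hama. The sketch assumes that iC and bC have already been summed from the edge weights, as the loop in `semiClusterScoreCalcuation` does.

```java
/** Hypothetical standalone helper for illustration; not part of Hama's API. */
final class SemiClusterScoreSketch {
  private SemiClusterScoreSketch() {}

  /**
   * Computes the semi-cluster score as in the patch above.
   *
   * @param innerWeight    iC: summed weight of edges whose destination is inside the cluster
   * @param boundaryWeight bC: summed weight of edges whose destination is outside the cluster
   * @param boundaryFactor fB: user-defined boundary score factor (0.5 is the patch's default)
   * @param vertexCount    vC: number of vertices in the semi-cluster
   */
  static double score(double innerWeight, double boundaryWeight,
      double boundaryFactor, int vertexCount) {
    // A cluster with zero or one vertex has no vertex pairs, so return 1.0
    // instead of dividing by zero.
    if (vertexCount <= 1) {
      return 1.0;
    }
    // Number of possible vertex pairs, vC * (vC - 1) / 2, computed as a double.
    double possiblePairs = vertexCount * (vertexCount - 1) / 2.0;
    return (innerWeight - boundaryFactor * boundaryWeight) / possiblePairs;
  }

  public static void main(String[] args) {
    // Three vertices, inner weight 3.0, boundary weight 2.0, fB = 0.5.
    System.out.println(score(3.0, 2.0, 0.5, 3));
    // A single-vertex cluster falls back to 1.0.
    System.out.println(score(0.0, 4.0, 0.5, 1));
  }
}
```

With fB = 0.0, as in the code before the patch, the boundary term disappears entirely and the score depends only on the inner weight.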

> Semiclustering Termination
> --------------------------
>
>                 Key: HAMA-941
>                 URL: https://issues.apache.org/jira/browse/HAMA-941
>             Project: Hama
>          Issue Type: Improvement
>          Components: examples, graph
>            Reporter: Edward J. Yoon
>            Priority: Minor
>
> Currently the Semiclustering example terminates only when the number of iterations exceeds
> the predefined maximum-iteration threshold.
> The app should stop when there are no cluster changes (I guess). Please check and improve
> it.
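
One possible way to detect "no cluster changes" is to remember the set of semi-cluster IDs a vertex held in the previous superstep and compare it with the current set. The sketch below is only an illustration under that assumption: the class `ConvergenceTrackerSketch` and its method are hypothetical, and how the result would be wired into the vertex's halting logic in Hama is left open.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Hama's API. */
final class ConvergenceTrackerSketch {
  // Cluster IDs seen in the previous superstep; null before the first call.
  private Set<String> previous = null;

  /**
   * Records the current cluster IDs and reports whether they are identical
   * to the ones recorded by the previous call.
   */
  boolean unchangedSince(Set<String> currentClusterIds) {
    // Set.equals compares contents regardless of order and returns false for null,
    // so the first call always reports a change.
    boolean unchanged = currentClusterIds.equals(previous);
    // Keep a defensive copy so later mutation by the caller cannot affect the comparison.
    previous = new HashSet<>(currentClusterIds);
    return unchanged;
  }
}
```

A vertex could stop participating once this returns true, while the existing maximum-iteration threshold remains as an upper bound for runs that never stabilize.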



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
