mahout-dev mailing list archives

From "Dmitriy Lyubimov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1604) Create a RowSimilarity for Spark
Date Wed, 17 Dec 2014 16:36:13 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250086#comment-14250086 ]

Dmitriy Lyubimov commented on MAHOUT-1604:
------------------------------------------

Relevant sequential history of this file follows. History doesn't lie. My commit removed the job-jar assembly from spark/pom.xml, and your commit brought it back right afterward, adding a comment to the effect of "do not remove albeit useless".


If you don't recognize the change, then it is the result of failing to correctly merge in HEAD, which overwrote changes already on master.
In any build system, that is a "mortal sin" and grounds for an immediate revert. It also means this commit potentially contains more changes that overwrite master.

In light of this, the proposed remedy is as follows:
- a revert is issued for commit 149c98592
- the committer of PR #47 reworks the commit as follows:
-- pull apache/master HEAD into the PR branch and resolve the merge
-- run git diff apache/master
--- if the diff contains unrecognized changes, the committer keeps working on the PR until it contains no unrecognized or master-overwriting changes.
--- we then do one more round of PR review.
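The "git diff apache/master" check in the steps above can be partly automated. What follows is a minimal sketch, not an established Mahout tool: it assumes a remote named apache, and the helper names and the "expected" set of files the PR is meant to touch are hypothetical and would have to be filled in by the committer.

```python
import subprocess


def changed_files(base: str = "apache/master") -> set[str]:
    """Paths whose working-tree content differs from `base`.

    Runs `git diff --name-only <base>`; after merging apache/master into the
    PR branch, this is the PR's net change relative to master.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True,
        text=True,
        check=True,
    )
    return {line for line in result.stdout.splitlines() if line}


def unrecognized_changes(changed: set[str], expected: set[str]) -> list[str]:
    """Changed paths the PR was not meant to touch (e.g. master-overwriting merges)."""
    return sorted(changed - expected)
```

Usage, from the merged PR branch: unrecognized_changes(changed_files(), {"path/to/file/the/pr/intends/to/change.scala"}) (placeholder path). Any path it returns must be explained or reverted before the next review round. Files that are expected but contain unintended hunks still need a manual read of the full diff.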

The relevant sequential file history, with diffs, follows.

{panel:title=log}

dmitriy@Intel-KUBU:~/projects/github/mahout-commits$ git log 149c98592fe -p -2 -- spark/pom.xml
commit 149c98592fe447c98dfb5afc67b5809725cc3056
Author: pferrel <pat@occamsmachete.com>
Date:   Thu Aug 28 10:45:13 2014 -0700

    MAHOUT-1604 add a CLI and associated code for spark-rowsimilarity, also cleans up some things in MAHOUT-1568 and MAHOUT-1569, closes apache/mahout#47

diff --git a/spark/pom.xml b/spark/pom.xml
index 71d3944..2f79377 100644
--- a/spark/pom.xml
+++ b/spark/pom.xml
@@ -157,6 +157,27 @@
         </executions>
       </plugin>
 
+      <!-- create job jar to include CLI driver deps-->
+      <!-- leave this in even though there are no hadoop mapreduce jobs in this module -->
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-assembly-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>job</id>
+            <phase>package</phase>
+            <goals>
+              <goal>single</goal>
+            </goals>
+            <configuration>
+              <descriptors>
+                <descriptor>src/main/assembly/job.xml</descriptor>
+              </descriptors>
+            </configuration>
+          </execution>
+        </executions>
+      </plugin>
+
     </plugins>
   </build>
 

commit c6ee8cbcdb6ae205624b908bc16ae462515c98e6
Author: Dmitriy Lyubimov <dlyubimov@apache.org>
Date:   Fri Aug 15 16:31:10 2014 -0700

    (NOJIRA) disabling -job.jar assembly in spark module (we don't use it, do we?)

diff --git a/spark/pom.xml b/spark/pom.xml
index 0946cee..71d3944 100644
--- a/spark/pom.xml
+++ b/spark/pom.xml
@@ -83,27 +83,6 @@
         </executions>
       </plugin>
 
-      <!-- create core job dependencies jar -->
-
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-assembly-plugin</artifactId>
-          <executions>
-            <execution>
-              <id>job</id>
-              <phase>package</phase>
-              <goals>
-                <goal>single</goal>
-              </goals>
-              <configuration>
-                <descriptors>
-                  <descriptor>src/main/assembly/job.xml</descriptor>
-                </descriptors>
-              </configuration>
-            </execution>
-          </executions>
-      </plugin>
-
       <!-- create test jar so other modules can reuse the math test utility classes. -->
       <plugin>
         <groupId>org.apache.maven.plugins</groupId>
dmitriy@Intel-KUBU:~/projects/github/mahout-commits$ 
{panel}


> Create a RowSimilarity for Spark
> --------------------------------
>
>                 Key: MAHOUT-1604
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1604
>             Project: Mahout
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 1.0
>         Environment: Spark
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>
> Using CooccurrenceAnalysis.cooccurrence, create a driver that reads one or two text DRMs and produces LLR similarity/cross-similarity matrices.
> This will produce the same results as ItemSimilarity but take a DRM as input instead of individual cells.
> The first version will only support LLR; other similarity measures will need to be in separate JIRAs.
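The issue asks for LLR similarity. As an illustration only (not the code in CooccurrenceAnalysis), here is a minimal Python sketch of Dunning's log-likelihood ratio (the G^2 statistic) for a single pair of items, written in the row/column/matrix entropy form. The function and parameter names are illustrative, and the k11..k22 layout is the usual 2x2 contingency table of cooccurrence counts.

```python
import math


def x_log_x(x: float) -> float:
    # Convention: 0 * log(0) == 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: float) -> float:
    # Unnormalized Shannon entropy: N*log(N) - sum(k*log(k)).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11: float, k12: float, k21: float, k22: float) -> float:
    """G^2 for a 2x2 contingency table of counts.

    k11: rows containing both item A and item B
    k12: rows containing A but not B
    k21: rows containing B but not A
    k22: rows containing neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))
```

Independent counts such as (5, 5, 5, 5) score 0, while perfectly associated counts such as (10, 0, 0, 10) score 40 ln 2, about 27.7. A row-similarity driver would compute a score like this for candidate pairs and keep the strongest ones. The distributed details are outside the scope of this sketch.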



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
