mahout-dev mailing list archives

From "Dmitriy Lyubimov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1636) Class dependencies for the spark module are put in a job.jar, which is very inefficient
Date Tue, 23 Dec 2014 19:28:13 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257399#comment-14257399 ]

Dmitriy Lyubimov commented on MAHOUT-1636:
------------------------------------------

interesting. the spark classpath already includes /lib/*.jar, but /lib is never actually built.

also, since that classpath is currently used only by the front end, and a modern jvm supports
classpath wildcards, it does not need to enumerate the contents of the hypothetical /lib folder.

which gives me an idea: maybe non-spark dependencies should live in the /lib folder.

the CLI drivers would then need to scavenge the output of 'mahout -spark classpath' for the
particular additional jars, or simply look in $MAHOUT_HOME/lib/ directly.
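
A rough sketch of the second option (looking in $MAHOUT_HOME/lib/ directly), in Java. Nothing here is existing Mahout code: the class name LibJars and the method expand are hypothetical, and the sketch assumes the classpath is a path-separator-delimited string whose wildcard entries end in "*" or "*.jar". A missing directory, such as a /lib that was never built, simply contributes no jars.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Mahout.
public class LibJars {

    // Turn a classpath string into concrete entries, expanding wildcard
    // entries ("dir/*" or "dir/*.jar") into the jar files found in dir.
    public static List<String> expand(String classpath) throws IOException {
        List<String> jars = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            boolean wildcard = entry.endsWith("*") || entry.endsWith("*.jar");
            if (!wildcard) {
                jars.add(entry); // plain entry, keep as-is
                continue;
            }
            // Everything before the '*' is the directory to scan.
            Path dir = Paths.get(entry.substring(0, entry.lastIndexOf('*')));
            if (!Files.isDirectory(dir)) {
                continue; // e.g. a /lib folder that was never built
            }
            List<String> found = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path p : stream) {
                    found.add(p.toString());
                }
            }
            Collections.sort(found); // directory order is unspecified, so sort
            jars.addAll(found);
        }
        return jars;
    }

    public static void main(String[] args) throws IOException {
        String home = System.getenv("MAHOUT_HOME");
        if (home == null) {
            System.err.println("MAHOUT_HOME is not set");
            return;
        }
        // Look in $MAHOUT_HOME/lib/ directly.
        for (String jar : expand(home + File.separator + "lib" + File.separator + "*")) {
            System.out.println(jar);
        }
    }
}
```

The same expand method could also be fed the text printed by 'mahout -spark classpath', assuming that output is a single classpath string; a driver would then filter the result for the particular jars it needs.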


> Class dependencies for the spark module are put in a job.jar, which is very inefficient
> ---------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1636
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1636
>             Project: Mahout
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 1.0-snapshot
>            Reporter: Pat Ferrel
>             Fix For: 1.0-snapshot
>
>
> Using a maven plugin and an assembly job.xml, a job.jar is created with all dependencies,
> including transitive ones. This job.jar is in mahout/spark/target and is included in the
> classpath when a Spark job is run. This allows dependency classes to be found at runtime, but
> the job.jar includes a great many things that are not needed and that duplicate classes found
> in the main mrlegacy job.jar. If the job.jar is removed, drivers will not find needed classes.
> A better way needs to be implemented for including class dependencies.
> I'm not sure what that better way is, so I am leaving the assembly alone for now. Whoever
> picks up this Jira will have to remove it after deciding on a better method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
