spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Building Spark packages with SBT or Maven
Date Tue, 15 Mar 2016 13:10:45 GMT
Sounds like the layout is basically the same as the sbt layout, with the sbt
file replaced by pom.xml?
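
For reference, a minimal sketch of that layout, assuming the ImportCSV project name used further down the thread and a POM that sets sourceDirectory to src/main/scala as in Chandeep's sample. The src/test/scala directory is optional and only mirrors the testSourceDirectory in that POM; the pom.xml created here is an empty placeholder to be filled in.

```shell
#!/bin/sh
# Project name is an assumption taken from the thread's example.
PROJECT=ImportCSV

# Same source tree as the sbt project; src/test/scala is optional.
mkdir -p "${PROJECT}/src/main/scala" "${PROJECT}/src/test/scala"

# pom.xml sits at the project root, where ImportCSV.sbt lives in the sbt build.
# This only creates an empty placeholder; paste the real POM into it.
touch "${PROJECT}/pom.xml"

# Show what was created.
find "${PROJECT}" | sort
```

With a real POM in place, running mvn package from the ImportCSV directory writes the jar under target/, so the spark-submit line in the script would need to point there instead of target/scala-2.10/.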



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 15 March 2016 at 13:06, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> Thanks again
>
> Is there any way to set this up without Eclipse, much like what I
> did with sbt?
>
> I need to know the directory structure for a Maven project.
>
> Cheers
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 15 March 2016 at 12:38, Chandeep Singh <cs@chandeep.com> wrote:
>
>> Do you have the Eclipse Maven plugin set up? http://www.eclipse.org/m2e/
>>
>> Once you have it set up, go to File -> New -> Other -> Maven Project -> Next /
>> Finish. You’ll see a default pom.xml which you can modify or replace.
>>
>>
>> Here is some documentation that should help:
>> http://scala-ide.org/docs/tutorials/m2eclipse/
>>
>> I’m using the same Eclipse build as you on my Mac. I mostly build a
>> shaded JAR and SCP it to the cluster.
>>
>> On Mar 15, 2016, at 12:22 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
>> wrote:
>>
>> Great Chandeep. I also have Eclipse Scala IDE below
>>
>> scala IDE build of Eclipse SDK
>> Build id: 4.3.0-vfinal-2015-12-01T15:55:22Z-Typesafe
>>
>> I am no expert on Eclipse, so if I create a project called ImportCSV, where
>> do I need to put the pom file, or how do I reference it? My Eclipse
>> runs on a Linux host, so it can access all the directories that the sbt
>> project accesses. I also believe there will not be any need for external jar
>> files in the build path?
>>
>> Thanks
>>
>> Dr Mich Talebzadeh
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 15 March 2016 at 12:15, Chandeep Singh <cs@chandeep.com> wrote:
>>
>>> Btw, just to add to the confusion ;) I use Maven as well since I moved
>>> from Java to Scala but everyone I talk to has been recommending SBT for
>>> Scala.
>>>
>>> I use the Eclipse Scala IDE to build. http://scala-ide.org/
>>>
>>> Here is my sample POM. You can add dependencies based on your
>>> requirements.
>>>
>>> <project xmlns="http://maven.apache.org/POM/4.0.0"
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
>>> <modelVersion>4.0.0</modelVersion>
>>> <groupId>spark</groupId>
>>> <version>1.0</version>
>>> <name>${project.artifactId}</name>
>>>
>>> <properties>
>>> <maven.compiler.source>1.7</maven.compiler.source>
>>> <maven.compiler.target>1.7</maven.compiler.target>
>>> <encoding>UTF-8</encoding>
>>> <scala.version>2.10.4</scala.version>
>>> <maven-scala-plugin.version>2.15.2</maven-scala-plugin.version>
>>> </properties>
>>>
>>> <repositories>
>>> <repository>
>>> <id>cloudera-repo-releases</id>
>>> <url>https://repository.cloudera.com/artifactory/repo/</url>
>>> </repository>
>>> </repositories>
>>>
>>> <dependencies>
>>> <dependency>
>>> <groupId>org.scala-lang</groupId>
>>> <artifactId>scala-library</artifactId>
>>> <version>${scala.version}</version>
>>> </dependency>
>>> <dependency>
>>> <groupId>org.apache.spark</groupId>
>>> <artifactId>spark-core_2.10</artifactId>
>>> <version>1.5.0-cdh5.5.1</version>
>>> </dependency>
>>> <dependency>
>>> <groupId>org.apache.spark</groupId>
>>> <artifactId>spark-mllib_2.10</artifactId>
>>> <version>1.5.0-cdh5.5.1</version>
>>> </dependency>
>>> <dependency>
>>> <groupId>org.apache.spark</groupId>
>>> <artifactId>spark-hive_2.10</artifactId>
>>> <version>1.5.0</version>
>>> </dependency>
>>>
>>> </dependencies>
>>> <build>
>>> <sourceDirectory>src/main/scala</sourceDirectory>
>>> <testSourceDirectory>src/test/scala</testSourceDirectory>
>>> <plugins>
>>> <plugin>
>>> <groupId>org.scala-tools</groupId>
>>> <artifactId>maven-scala-plugin</artifactId>
>>> <version>${maven-scala-plugin.version}</version>
>>> <executions>
>>> <execution>
>>> <goals>
>>> <goal>compile</goal>
>>> <goal>testCompile</goal>
>>> </goals>
>>> </execution>
>>> </executions>
>>> <configuration>
>>> <jvmArgs>
>>> <jvmArg>-Xms64m</jvmArg>
>>> <jvmArg>-Xmx1024m</jvmArg>
>>> </jvmArgs>
>>> </configuration>
>>> </plugin>
>>> <plugin>
>>> <groupId>org.apache.maven.plugins</groupId>
>>> <artifactId>maven-shade-plugin</artifactId>
>>> <version>1.6</version>
>>> <executions>
>>> <execution>
>>> <phase>package</phase>
>>> <goals>
>>> <goal>shade</goal>
>>> </goals>
>>> <configuration>
>>> <filters>
>>> <filter>
>>> <artifact>*:*</artifact>
>>> <excludes>
>>> <exclude>META-INF/*.SF</exclude>
>>> <exclude>META-INF/*.DSA</exclude>
>>> <exclude>META-INF/*.RSA</exclude>
>>> </excludes>
>>> </filter>
>>> </filters>
>>> <transformers>
>>> <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
>>> <mainClass>com.group.id.Launcher1</mainClass>
>>> </transformer>
>>> </transformers>
>>> </configuration>
>>> </execution>
>>> </executions>
>>> </plugin>
>>> </plugins>
>>> </build>
>>>
>>> <artifactId>scala</artifactId>
>>> </project>
>>>
>>>
>>> On Mar 15, 2016, at 12:09 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
>>> wrote:
>>>
>>> Ok.
>>>
>>> Sounds like opinion is divided :)
>>>
>>> I will try to build a scala app with Maven.
>>>
>>> When I build with SBT I follow this directory structure
>>>
>>> The top-level directory is named after the package, for example
>>>
>>> ImportCSV
>>>
>>> Under ImportCSV I have a directory src and the sbt file ImportCSV.sbt
>>>
>>> Under src I have main, and under main a scala subdirectory. My Scala file
>>> is in
>>>
>>> ImportCSV/src/main/scala
>>>
>>> and is called ImportCSV.scala
>>>
>>> I then have a shell script that runs everything under ImportCSV directory
>>>
>>> cat generic.ksh
>>> #!/bin/ksh
>>>
>>> #--------------------------------------------------------------------------------
>>> #
>>> # Procedure:    generic.ksh
>>> #
>>> # Description:  Compiles and runs a Scala app using sbt and spark-submit
>>> #
>>> # Parameters:   none
>>> #
>>>
>>> #--------------------------------------------------------------------------------
>>> # Vers|  Date  | Who | DA | Description
>>>
>>> #-----+--------+-----+----+-----------------------------------------------------
>>> # 1.0 |04/03/15|  MT |    | Initial Version
>>>
>>> #--------------------------------------------------------------------------------
>>> #
>>> function F_USAGE
>>> {
>>>    echo "USAGE: ${1##*/} -A '<Application>'"
>>>    echo "USAGE: ${1##*/} -H '<HELP>' -h '<HELP>'"
>>>    exit 10
>>> }
>>> #
>>> # Main Section
>>> #
>>> if [[ "${1}" = "-h" || "${1}" = "-H" ]]; then
>>>    F_USAGE $0
>>> fi
>>> ## MAP INPUT TO VARIABLES
>>> while getopts A: opt
>>> do
>>>    case $opt in
>>>    (A) APPLICATION="$OPTARG" ;;
>>>    (*) F_USAGE $0 ;;
>>>    esac
>>> done
>>> [[ -z ${APPLICATION} ]] && print "You must specify an application value" && F_USAGE $0
>>> ENVFILE=/home/hduser/dba/bin/environment.ksh
>>> if [[ -f $ENVFILE ]]
>>> then
>>>         . $ENVFILE
>>>         . ~/spark_1.5.2_bin-hadoop2.6.kshrc
>>> else
>>>         echo "Abort: $0 failed. No environment file ( $ENVFILE ) found"
>>>         exit 1
>>> fi
>>> ##FILE_NAME=`basename $0 .ksh`
>>> FILE_NAME=${APPLICATION}
>>> CLASS=`echo ${FILE_NAME}|tr "[:upper:]" "[:lower:]"`
>>> NOW="`date +%Y%m%d_%H%M`"
>>> LOG_FILE=${LOGDIR}/${FILE_NAME}.log
>>> [ -f ${LOG_FILE} ] && rm -f ${LOG_FILE}
>>> print "\n" `date` ", Started $0" | tee -a ${LOG_FILE}
>>> cd ../${FILE_NAME}
>>> print "Compiling ${FILE_NAME}" | tee -a ${LOG_FILE}
>>> sbt package
>>> print "Submitting the job" | tee -a ${LOG_FILE}
>>>
>>> ${SPARK_HOME}/bin/spark-submit \
>>>                 --packages com.databricks:spark-csv_2.10:1.3.0 \
>>>                 --class "${FILE_NAME}" \
>>>                 --master spark://50.140.197.217:7077 \
>>>                 --executor-memory=12G \
>>>                 --executor-cores=12 \
>>>                 --num-executors=2 \
>>>                 target/scala-2.10/${CLASS}_2.10-1.0.jar
>>> print `date` ", Finished $0" | tee -a ${LOG_FILE}
>>> exit
>>>
>>>
>>> So to run it for ImportCSV all I need is to do
>>>
>>> ./generic.ksh -A ImportCSV
>>>
>>> Now can anyone kindly give me a rough guideline on directory and
>>> location of pom.xml to make this work using maven?
>>>
>>> Thanks
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 15 March 2016 at 10:50, Sean Owen <sowen@cloudera.com> wrote:
>>>
>>>> FWIW, I strongly prefer Maven over SBT even for Scala projects. The
>>>> Spark build of reference is Maven.
>>>>
>>>> On Tue, Mar 15, 2016 at 10:45 AM, Chandeep Singh <cs@chandeep.com>
>>>> wrote:
>>>> > For Scala, SBT is recommended.
>>>> >
>>>> > On Mar 15, 2016, at 10:42 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
>>>> > wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > I build my Spark/Scala packages using SBT, which works fine. I have
>>>> > created generic shell scripts to build and submit them.
>>>> >
>>>> > Yesterday I noticed that some use Maven and Pom for this purpose.
>>>> >
>>>> > Which approach is recommended?
>>>> >
>>>> > Thanks,
>>>> >
>>>> >
>>>> > Dr Mich Talebzadeh
>>>> >
>>>> >
>>>> >
>>>> > LinkedIn
>>>> >
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> >
>>>> >
>>>> >
>>>> > http://talebzadehmich.wordpress.com
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>>
>>
>>
>
