From: Mars Hall
Date: Sat, 10 Mar 2018 01:42:36 +0000
Subject: Re: Issue with loading dependencies and jars
To: user@predictionio.apache.org

Where does the classpath in spark-submit originate? Is compute-classpath.sh not the source?
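
To compare JAR ordering between runs, one option is to pull the --jars value out of a logged "Submission command" line and number its entries. This is only a sketch: the function name list_jars and the shortened sample command are hypothetical, and it assumes --jars appears once and its value contains no whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the --jars entries of a logged spark-submit
# command, one per line, prefixed with their position in the list.
list_jars() {
  local cmd="$1" jars
  # Capture the single whitespace-delimited token that follows "--jars".
  jars="$(printf '%s\n' "$cmd" \
    | sed -n 's/.*--jars[[:space:]]\{1,\}\([^[:space:]]*\).*/\1/p')"
  # Nothing to list if the command has no --jars option.
  [ -n "$jars" ] || return 1
  # Split on commas and number each entry.
  printf '%s\n' "$jars" | tr ',' '\n' | awk '{ printf "%d %s\n", NR, $0 }'
}

# Shortened, made-up command for illustration only.
sample='spark-submit --class X --jars file:/a/pio-data-s3-assembly.jar,file:/a/aws-java-sdk.jar --files file:/c'
list_jars "$sample"
```

Piping the output through grep for aws-java-sdk.jar and pio-data-s3-assembly shows at a glance which of the two comes first in a given run.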

As noted previously, my stable-ordering fix in compute-classpath.sh no longer seems to be effective either.

Looks like some tracing of classpath assembly through the Spark command runner is required:
https://github.com/apache/predictionio/blob/develop/tools/src/main/scala/org/apache/predictionio/tools/Runner.scala#L185

Unless someone with more knowledge of these internals could weigh in… Donald? 😬😊

On Fri, Mar 9, 2018 at 15:44 Shane Johnson wrote:

> One additional item that you mentioned earlier is that we would need to
> remove or skip the aws-java-sdk.jar that is already in the CLASSPATH. Do
> you think this has an impact? I did not write anything to skip or remove
> the existing aws-java-sdk.jar.
>
>> aws-java-sdk.jar is already in the CLASSPATH though, so the script will
>> need to skip or remove it first.
>
> *Shane Johnson | LIFT IQ*
> *Founder | CEO*
>
> *www.liftiq.com* or *shane@liftiq.com*
> mobile: (801) 360-3350
> LinkedIn | Twitter | Facebook
>
> On Fri, Mar 9, 2018 at 4:41 PM, Shane Johnson wrote:
>
>> Now that I am able to deploy I reset the buildpack to
>> ...#debug-custom-dist and redeployed. Here is the build log...URL does
>> point to the correct distribution with the edited compute-classpath.sh
>> file.
>>
>> -----> JVM Common app detected
>> -----> Installing JDK 1.8... done
>> -----> PredictionIO app detected
>> -----> Install core components
>>        + PredictionIO (https://s3-us-west-1.amazonaws.com/predictionio/0.12.0-incubating/apache-predictionio-0.12.0-incubating-bin.tar.gz)
>>        + Spark (spark-2.1.1-bin-hadoop2.7)
>> -----> Install supplemental components
>>        + PostgreSQL (JDBC)
>>        + S3 HDFS (AWS SDK)
>>        + S3 HDFS (Hadoop-AWS)
>>          Writing default 'core-site.xml.erb'
>>        + local Maven repo from buildpack (contents)
>> -----> Configure PredictionIO
>>        Writing default 'pio-env.sh'
>>        Writing default 'spark-defaults.conf.erb'
>>        + Maven repo from buildpack (build.sbt entry)
>>        Set-up environment via '.profile.d/' scripts
>> -----> Install JVM (heroku/jvm-common)
>> -----> PredictionIO engine
>>        Quietly logging. (Set `PIO_VERBOSE=true` for detailed build log.)
>>        [INFO] [Engine$] Using command '/tmp/build_67e7942abed821fccc839c9a79faf0eb/lift-iq-score-e92ed3de9212d04972e0e67e68b5407489e0c8d0/PredictionIO-dist/sbt/sbt' at /tmp/build_67e7942abed821fccc839c9a79faf0eb/lift-iq-score-e92ed3de9212d04972e0e67e68b5407489e0c8d0 to build.
>>        [INFO] [Engine$] If the path above is incorrect, this process will fail.
>>        [INFO] [Engine$] Uber JAR disabled. Making sure lib/pio-assembly-0.12.0-incubating.jar is absent.
>>        [INFO] [Engine$] Going to run: /tmp/build_67e7942abed821fccc839c9a79faf0eb/lift-iq-score-e92ed3de9212d04972e0e67e68b5407489e0c8d0/PredictionIO-dist/sbt/sbt  package assemblyPackageDependency in /tmp/build_67e7942abed821fccc839c9a79faf0eb/lift-iq-score-e92ed3de9212d04972e0e67e68b5407489e0c8d0
>>        [INFO] [Engine$] Compilation finished successfully.
>>        [INFO] [Engine$] Looking for an engine...
>>        [INFO] [Engine$] Found template-scala-parallel-liftscoring_2.11-0.1-SNAPSHOT.jar
>>        [INFO] [Engine$] Found template-scala-parallel-liftscoring-assembly-0.1-SNAPSHOT-deps.jar
>>        [INFO] [Engine$] Build finished successfully.
>>        [INFO] [Pio$] Your engine is ready for training.
>>        Using default Procfile for engine
>> -----> Discovering process types
>>        Procfile declares types -> release, train, web
>> -----> Compressing...
>>        Done: 376.7M
>>
>> The release log is below. I am not seeing
>> */app/PredictionIO-dist/lib/spark/aws-java-sdk.jar* show up at the
>> beginning of the CLASSPATH; this is what we should see, correct? I was
>> also manipulating compute-classpath.sh locally, and I observed that
>> adding a line right before echo "$CLASSPATH" was not changing what was
>> in the logged spark-submit command, as an FYI. This is what I was
>> testing locally...
>>
>> CLASSPATH="/Users/shanejohnson/Desktop/Apps/liftiq_platform/lift-score/PredictionIO-dist/lib/spark/pio-data-s3-assembly-0.12.0-incubating.jar:$CLASSPATH"
>> echo "$CLASSPATH"
>>
>> I did not see any change in the spark-submit command by adding this when
>> building and deploying locally.
>>
>> Release Log with new buildpack ..#debug-custom-dist
>>
>> Running train on release…
>> Picked up JAVA_TOOL_OPTIONS: -Xmx12g -Dfile.encoding=UTF-8
>> [INFO] [Runner$] Submission command: /app/PredictionIO-dist/vendors/spark-hadoop/bin/spark-submit --driver-memory 13g --class org.apache.predictionio.workflow.CreateWorkflow --jars file:/app/PredictionIO-dist/lib/postgresql_jdbc.jar,file:/app/target/scala-2.11/template-scala-parallel-liftscoring-assembly-0.1-SNAPSHOT-deps.jar,file:/app/target/scala-2.11/template-scala-parallel-liftscoring_2.11-0.1-SNAPSHOT.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-hbase-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-localfs-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-s3-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-localfs-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-jdbc-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-elasticsearch-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-elasticsearch-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-hdfs-assembly-0.12.0-incubating.jar,*file:/app/PredictionIO-dist/lib/spark/pio-data-s3-assembly-0.12.0-incubating.jar*,file:/app/PredictionIO-dist/lib/spark/pio-data-hbase-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-jdbc-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/hadoop-aws.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-hdfs-assembly-0.12.0-incubating.jar,*file:/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar* --files file:/app/PredictionIO-dist/conf/log4j.properties,file:/app/PredictionIO-dist/conf/core-site.xml --driver-class-path /app/PredictionIO-dist/conf:/app/PredictionIO-dist/conf:/app/PredictionIO-dist/lib/postgresql_jdbc.jar:/app/PredictionIO-dist/conf --driver-java-options -Dpio.log.dir=/app file:/app/PredictionIO-dist/lib/pio-assembly-0.12.0-incubating.jar --engine-id org.template.liftscoring.LiftScoringEngine --engine-version 0c35eebf403cf91fe77a64921d76aa1ca6411d20 --engine-variant file:/app/engine.json --verbosity 0 --json-extractor Both --env PIO_ENV_LOADED=1,PIO_EVENTSERVER_APP_NAME=classi,PIO_STORAGE_SOURCES_PGSQL_INDEX=enabled,PIO_S3_AWS_ACCESS_KEY_ID=[REDACTED],PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/app/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost,PIO_S3_BUCKET_NAME=lift-model-devmaster,PIO_EVENTSERVER_ACCESS_KEY=[REDACTED],PIO_HOME=/app/PredictionIO-dist,PIO_FS_ENGINESDIR=/app/.pio_store/engines,PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://ec2-52-70-46-243.compute-1.amazonaws.com:5432/dbvbo86hohutvb?sslmode=require,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL,PIO_SPARK_OPTS=--driver-memory 13g ,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_PGSQL_PASSWORD=[REDACTED],PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/app/PredictionIO-dist/vendors/elasticsearch,PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc,PIO_FS_TMPDIR=/app/.pio_store/tmp,PIO_STORAGE_SOURCES_PGSQL_USERNAME=ubefhv668b1s1m,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http,PIO_S3_AWS_SECRET_ACCESS_KEY=[REDACTED],PIO_TRAIN_SPARK_OPTS=--driver-memory 13g ,PIO_STORAGE_SOURCES_PGSQL_CONNECTIONS=8,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL,PIO_CONF_DIR=/app/PredictionIO-dist/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,PIO_STORAGE_SOURCES_PGSQL_PARTITIONS=4,PIO_S3_AWS_REGION=us-east-1
>> Picked up JAVA_TOOL_OPTIONS: -Xmx12g -Dfile.encoding=UTF-8
>> Picked up JAVA_TOOL_OPTIONS: -Xmx12g -Dfile.encoding=UTF-8
>> [INFO] [Engine] Extracting datasource params...
>> [INFO] [Engine] Datasource params: (,DataSourceParams(Some(5)))
>> [INFO] [Engine] Extracting preparator params...
>> [WARN] [WorkflowUtils$] Non-empty parameters supplied to org.template.liftscoring.Preparator, but its constructor does not accept any arguments. Stubbing with empty parameters.
>> [INFO] [Engine] Preparator params: (,Empty)
>>
>> *Shane Johnson | LIFT IQ*
>> *Founder | CEO*
>>
>> *www.liftiq.com* or *shane@liftiq.com*
>> mobile: (801) 360-3350
>> LinkedIn | Twitter | Facebook
>>
>> On Fri, Mar 9, 2018 at 11:17 AM, Mars Hall wrote:
>>
>>> I'm lost as to how such direct manipulation of CLASSPATH is not
>>> appearing in the logged spark-submit command.
>>>
>>> What could cause this!?
>>>
>>> I just pushed a version of the buildpack which should help debug.
>>> Assuming only a single buildpack is assigned to the app, here's how to
>>> set it:
>>>
>>>   heroku buildpacks:set https://github.com/heroku/predictionio-buildpack#debug-custom-dist
>>>
>>> Then redeploy the engine and check the build log for the line:
>>>
>>>       + PredictionIO ($URL)
>>>
>>> Please confirm that it is the URL of your custom PredictionIO dist.
>>>
>>> On Fri, Mar 9, 2018 at 2:47 PM, Shane Johnson wrote:
>>>
>>>> Thanks Donald and Mars,
>>>>
>>>> I created a new distribution
>>>> (https://s3-us-west-1.amazonaws.com/predictionio/0.12.0-incubating/apache-predictionio-0.12.0-incubating-bin.tar.gz)
>>>> with the added CLASSPATH code and pointed to the distribution with
>>>> the PREDICTIONIO_DIST_URL variable within the engine app in Heroku.
>>>>
>>>> CLASSPATH="/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar:$CLASSPATH"
>>>> echo "$CLASSPATH"
>>>>
>>>> It didn't seem to force the aws-java-sdk to load first as I reviewed
>>>> the release logs. Should the aws-java-sdk.jar show up as the first
>>>> file within the --jars section when
>>>> CLASSPATH="/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar:$CLASSPATH"
>>>> is added?
>>>>
>>>> I'm still getting the NoSuchMethodError when the *aws-java-sdk.jar*
>>>> loads after the *pio-data-s3-assembly-0.12.0-incubating.jar*. Do you
>>>> have other suggestions to try? I was also testing locally to change
>>>> the order of the --jars, but changes to compute-classpath.sh didn't
>>>> seem to change the order of the jars in the logs.
>>>>
>>>> Running train on release…
>>>> Picked up JAVA_TOOL_OPTIONS: -Xmx12g -Dfile.encoding=UTF-8
>>>> [INFO] [Runner$] Submission command: /app/PredictionIO-dist/vendors/spark-hadoop/bin/spark-submit --driver-memory 13g --class org.apache.predictionio.workflow.CreateWorkflow --jars file:/app/PredictionIO-dist/lib/postgresql_jdbc.jar,file:/app/target/scala-2.11/template-scala-parallel-liftscoring-assembly-0.1-SNAPSHOT-deps.jar,file:/app/target/scala-2.11/template-scala-parallel-liftscoring_2.11-0.1-SNAPSHOT.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-hdfs-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-localfs-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-elasticsearch-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/hadoop-aws.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-hbase-assembly-0.12.0-incubating.jar,*file:/app/PredictionIO-dist/lib/spark/pio-data-s3-assembly-0.12.0-incubating.jar*,file:/app/PredictionIO-dist/lib/spark/pio-data-jdbc-assembly-0.12.0-incubating.jar,*file:/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar* --files file:/app/PredictionIO-dist/conf/log4j.properties,file:/app/PredictionIO-dist/conf/core-site.xml --driver-class-path /app/PredictionIO-dist/conf:/app/PredictionIO-dist/conf:/app/PredictionIO-dist/lib/postgresql_jdbc.jar:/app/PredictionIO-dist/conf --driver-java-options -Dpio.log.dir=/app file:/app/PredictionIO-dist/lib/pio-assembly-0.12.0-incubating.jar --engine-id org.template.liftscoring.LiftScoringEngine --engine-version 0c35eebf403cf91fe77a64921d76aa1ca6411d20 --engine-variant file:/app/engine.json --verbosity 0 --json-extractor Both --env
>>>>
>>>> Error:
>>>>
>>>> Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
>>>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
>>>>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
>>>>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
>>>>
>>>> *Shane Johnson | LIFT IQ*
>>>> *Founder | CEO*
>>>>
>>>> *www.liftiq.com* or *shane@liftiq.com*
>>>> mobile: (801) 360-3350
>>>> LinkedIn | Twitter | Facebook
>>>>
>>>> On Wed, Mar 7, 2018 at 1:01 PM, Mars Hall wrote:
>>>>
>>>>> Shane,
>>>>>
>>>>> On Wed, Mar 7, 2018 at 4:49 AM, Shane Johnson wrote:
>>>>>
>>>>>> Re: adding a line to ensure a jar is loaded first. Is this what you
>>>>>> are referring to...(line at the bottom in red)?
>>>>>
>>>>> I believe the code would need to look like this to affect the output
>>>>> classpath as intended:
>>>>>
>>>>>> CLASSPATH="/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar:$CLASSPATH"
>>>>>> echo "$CLASSPATH"
>>>>>
>>>>> aws-java-sdk.jar is already in the CLASSPATH though, so the script
>>>>> will need to skip or remove it first.
>>>>>
>>>>> --
>>>>> *Mars Hall
>>>>> 415-818-7039
>>>>> Customer Facing Architect
>>>>> Salesforce Platform / Heroku
>>>>> San Francisco, California
>>>
>>> --
>>> *Mars Hall
>>> 415-818-7039
>>> Customer Facing Architect
>>> Salesforce Platform / Heroku
>>> San Francisco, California

--
*Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California
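
The "skip or remove it first" step from the quoted 7 March message could look roughly like the sketch below. It is illustrative only: prepend_unique is a hypothetical helper, not part of compute-classpath.sh, and the sample CLASSPATH is abbreviated. It also only changes the string the script echoes; as reported in this thread, editing that value did not visibly change the --jars order in the logged spark-submit command, so it may not resolve the NoSuchMethodError by itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move one entry to the front of a colon-separated
# path list, dropping any copies of it that are already present.
prepend_unique() {
  local entry="$1" list="$2" out="" item
  local -a items
  # Split on ':' without triggering glob expansion of entries like lib/*.
  IFS=':' read -r -a items <<< "$list"
  for item in "${items[@]}"; do
    [ -z "$item" ] && continue          # skip empty fields
    [ "$item" = "$entry" ] && continue  # skip existing copies of the entry
    out="${out:+$out:}$item"
  done
  printf '%s\n' "${entry}${out:+:$out}"
}

# Abbreviated sample value for illustration only.
CLASSPATH="/app/PredictionIO-dist/conf:/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar:/app/PredictionIO-dist/lib/postgresql_jdbc.jar"
CLASSPATH="$(prepend_unique /app/PredictionIO-dist/lib/spark/aws-java-sdk.jar "$CLASSPATH")"
echo "$CLASSPATH"
```

Removing the existing entry before prepending keeps the JAR from appearing twice, so its position is unambiguous when the result is inspected.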