predictionio-user mailing list archives

From Florian Krause <florian.kra...@rebelle.com>
Subject Re: Change of handling of env variables in 0.11?
Date Tue, 30 May 2017 14:35:50 GMT
Hi

So I managed to fix this … it took me a while to track down. So in case anyone cares:

...
In my code I had something like this:

def predict(model: ECommModel, query: Query): PredictedResult = {

    val userFeatures = model.userFeatures
    val productModels = model.productModels
…
}

val unavailableItems: Set[String] = try {
      val constr = LEventStore.findByEntity(
        appName = ap.sharedApp,
        entityType = "constraint",
        entityId = "unavailableItems"
…
}

So the idea was that the unavailable items only get populated once during deployment (and
therefore, to my understanding, during instantiation of my ECommAlgorithm class). Pulling the unavailable
products on every incoming request turned out to be too slow …
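The timing difference can be sketched in plain Scala (a minimal, hypothetical model — `fetchUnavailable` stands in for the `LEventStore.findByEntity` call, which needs a running event store; this illustrates the eager-vs-deferred pattern, not necessarily the exact fix used here):

```scala
// Hypothetical sketch: constructor-time vs. deferred initialization.
// If the env vars are not yet in scope when the test runner constructs
// the algorithm, eager code in the constructor body fails.
class EagerAlgo(fetchUnavailable: () => Set[String]) {
  // runs immediately at instantiation — the pattern that broke under test
  val unavailableItems: Set[String] = fetchUnavailable()
}

class LazyAlgo(fetchUnavailable: () => Set[String]) {
  // deferred until first access, e.g. the first predict() call,
  // and computed only once (lazy val memoizes the result)
  lazy val unavailableItems: Set[String] = fetchUnavailable()
}

object Demo extends App {
  var calls = 0
  def fetch(): Set[String] = { calls += 1; Set("sku-1", "sku-2") }

  val algo = new LazyAlgo(() => fetch())
  println(calls)                 // still 0: nothing fetched at construction
  println(algo.unavailableItems) // first access triggers the single fetch
  println(calls)                 // 1, and stays 1 on later accesses
}
```

The point is that a `lazy val` moves the store lookup out of the constructor without paying the cost on every request.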

This worked in 0.10, but in 0.11 I was getting the "env vars not set" errors.

Apparently something changed the scoping of the env vars in the engines during
testing.

Bests

Florian



On 22 May 2017 at 13:58:05, Florian Krause (florian.krause@rebelle.com) wrote:

Hi Chan

thanks a lot for reaching out to me ... 

pio@predict-io:/opt/reco-engine$ /opt/PredictionIO-0.11.0-incubating/bin/pio status
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.11.0-incubating is installed at /opt/PredictionIO-0.11.0-incubating
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /opt/PredictionIO-0.11.0-incubating/vendors/spark-2.1.1-bin-hadoop2.7
[INFO] [Management$] Apache Spark 2.1.1 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: PGSQL)...
[INFO] [Storage$] Verifying Model Data Backend (Source: PGSQL)...
[INFO] [Storage$] Verifying Event Data Backend (Source: PGSQL)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[INFO] [Management$] Your system is all ready to go.

---
pio@predict-io:/opt/reco-engine/MatrixProduct2$ /opt/PredictionIO-0.11.0-incubating/bin/pio
status --verbose
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.11.0-incubating is installed at /opt/PredictionIO-0.11.0-incubating
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /opt/PredictionIO-0.11.0-incubating/vendors/spark-2.1.1-bin-hadoop2.7
[INFO] [Management$] Apache Spark 2.1.1 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: PGSQL)...
[DEBUG] [ConnectionPool$] Registered connection pool : ConnectionPool(url:jdbc:postgresql://localhost/pio,
user:pio) using factory : <default>
[DEBUG] [ConnectionPool$] Registered singleton connection pool : ConnectionPool(url:jdbc:postgresql://localhost/pio,
user:pio)
[DEBUG] [StatementExecutor$$anon$1] SQL execution completed

  [SQL Execution]
   create table if not exists pio_meta_engineinstances ( id varchar(100) not null primary key, status text not null, startTime timestamp DEFAULT CURRENT_TIMESTAMP, endTime timestamp DEFAULT CURRENT_TIMESTAMP, engineId text not null, engineVersion text not null, engineVariant text not null, engineFactory text not null, batch text not null, env text not null, sparkConf text not null, datasourceParams text not null, preparatorParams text not null, algorithmsParams text not null, servingParams text not null); (3 ms)

  [Stack Trace]
    ...
    org.apache.predictionio.data.storage.jdbc.JDBCEngineInstances$$anonfun$1.apply(JDBCEngineInstances.scala:49)
    org.apache.predictionio.data.storage.jdbc.JDBCEngineInstances$$anonfun$1.apply(JDBCEngineInstances.scala:32)
    scalikejdbc.DBConnection$class.autoCommit(DBConnection.scala:222)
    scalikejdbc.DB.autoCommit(DB.scala:60)
    scalikejdbc.DB$$anonfun$autoCommit$1.apply(DB.scala:215)
    scalikejdbc.DB$$anonfun$autoCommit$1.apply(DB.scala:214)
    scalikejdbc.LoanPattern$class.using(LoanPattern.scala:18)
    scalikejdbc.DB$.using(DB.scala:138)
-- 
So this works … building with tests enabled doesn't:

---
/opt/PredictionIO-0.11.0-incubating/bin/pio build --verbose
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Engine$] Using command '/opt/PredictionIO-0.11.0-incubating/sbt/sbt' at /opt/reco-engine/MatrixProduct2
to build.
[INFO] [Engine$] If the path above is incorrect, this process will fail.
[INFO] [Engine$] Uber JAR disabled. Making sure lib/pio-assembly-0.11.0-incubating.jar is
absent.
[INFO] [Engine$] Going to run: /opt/PredictionIO-0.11.0-incubating/sbt/sbt  package assemblyPackageDependency
in /opt/reco-engine/MatrixProduct2
[INFO] [Engine$] [info] Loading project definition from /opt/reco-engine/MatrixProduct2/project
[INFO] [Engine$] [info] Set current project to MatrixProduct2 (in build file:/opt/reco-engine/MatrixProduct2/)
[INFO] [Engine$] [success] Total time: 0 s, completed May 22, 2017 11:52:26 AM
[INFO] [Engine$] [info] Including from cache: shared_2.11.jar
[INFO] [Engine$] [info] Including from cache: snappy-java-1.1.1.7.jar
[INFO] [Engine$] [info] Including from cache: scala-library-2.11.8.jar
[ERROR] [Engine$] log4j:WARN No appenders could be found for logger (org.apache.predictionio.data.storage.Storage$).
[ERROR] [Engine$] log4j:WARN Please initialize the log4j system properly.
[ERROR] [Engine$] log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.
[INFO] [Engine$] org.apache.predictionio.data.storage.StorageClientException: Data source
PGSQL was not properly initialized.
[INFO] [Engine$]        at org.apache.predictionio.data.storage.Storage$$anonfun$10.apply(Storage.scala:285)
[INFO] [Engine$]        at org.apache.predictionio.data.storage.Storage$$anonfun$10.apply(Storage.scala:285)
[INFO] [Engine$]        at scala.Option.getOrElse(Option.scala:121)
[INFO] [Engine$]        at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:284)
[INFO] [Engine$]        at org.apache.predictionio.data.storage.Storage$.getDataObjectFromRepo(Storage.scala:269)
[INFO] [Engine$]        at org.apache.predictionio.data.storage.Storage$.getMetaDataApps(Storage.scala:387)
[INFO] [Engine$]        at org.apache.predictionio.data.store.Common$.appsDb$lzycompute(Common.scala:27)
[INFO] [Engine$]        at org.apache.predictionio.data.store.Common$.appsDb(Common.scala:27)
[INFO] [Engine$]        at org.apache.predictionio.data.store.Common$.appNameToId(Common.scala:32)
[INFO] [Engine$]        at org.apache.predictionio.data.store.LEventStore$.findByEntity(LEventStore.scala:75)
[INFO] [Engine$]        at com.rebelle.MatrixProduct2.ECommAlgorithm.liftedTree1$1(ECommAlgorithm.scala:516)
[INFO] [Engine$]        at com.rebelle.MatrixProduct2.ECommAlgorithm.<init>(ECommAlgorithm.scala:515)
[INFO] [Engine$]        at com.rebelle.MatrixProduct2.ECommAlgorithmTest.<init>(ECommAlgorithmTest.scala:31)
[INFO] [Engine$]        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
[INFO] [Engine$]        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
[INFO] [Engine$]        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[INFO] [Engine$]        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
[INFO] [Engine$]        at java.lang.Class.newInstance(Class.java:442)
[INFO] [Engine$]        at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:641)
[INFO] [Engine$]        at sbt.TestRunner.runTest$1(TestFramework.scala:76)
[INFO] [Engine$]        at sbt.TestRunner.run(TestFramework.scala:85)

I am using the EventStore in my recommender (to pull in products no longer available). The
test runner seems to instantiate it, but then barfs because it can't get the configuration
from the env.

Exactly the same engine compiles just fine under 0.10. When I disable the tests with
test in assembly := {}
in the build.sbt file, compile, train and deploy all work fine.

Bests


2017-05-22 12:49 GMT+02:00 Chan Lee <chanlee514@gmail.com>:
Hi Florian,

Can you tell me the output for `pio status`? Does the postgres driver match the argument sent
to spark-submit?

Best,
Chan

On Mon, May 22, 2017 at 1:53 AM, Florian Krause <florian.krause@rebelle.com> wrote:
Hi all

I have been unsuccessful at building my two engines with 0.11. I have described my attempts
here -> https://stackoverflow.com/questions/43941915/predictionio-0-11-building-an-engine-fails-with-java-lang-classnotfoundexceptio

It appears that during the pio build phase, the env vars from pio-env.sh are not set correctly. 

I have managed to get around this by not running the tests; the compiled versions of the engine
work flawlessly, so the database connection itself works.

Now what confuses me a bit is the usage of the --env command line param in the CreateWorkflow
jar.

This is the command pio sends to spark

/opt/PredictionIO-0.11.0-incubating/vendors/spark-2.1.1-bin-hadoop2.7/bin/spark-submit --driver-memory
80G --executor-memory 80G --class org.apache.predictionio.workflow.CreateWorkflow --jars file:/opt/PredictionIO-0.11.0-incubating/lib/postgresql-42.1.1.jar,file:/opt/PredictionIO-0.11.0-incubating/lib/mysql-connector-java-5.1.40-bin.jar,file:/opt/reco-engine/MatrixProduct2/target/scala-2.11/matrixproduct2_2.11-0.1-SNAPSHOT.jar,file:/opt/reco-engine/MatrixProduct2/target/scala-2.11/MatrixProduct2-assembly-0.1-SNAPSHOT-deps.jar,file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-localfs-assembly-0.11.0-incubating.jar,file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar,file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-jdbc-assembly-0.11.0-incubating.jar,file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-elasticsearch-assembly-0.11.0-incubating.jar,file:/opt/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hbase-assembly-0.11.0-incubating.jar
--files file:/opt/PredictionIO-0.11.0-incubating/conf/log4j.properties --driver-class-path
/opt/PredictionIO-0.11.0-incubating/conf:/opt/PredictionIO-0.11.0-incubating/lib/postgresql-42.1.1.jar:/opt/PredictionIO-0.11.0-incubating/lib/mysql-connector-java-5.1.40-bin.jar
--driver-java-options -Dpio.log.dir=/home/pio file:/opt/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar
--engine-id com.rebelle.MatrixProduct2.ECommerceRecommendationEngine --engine-version 23bea44eff1a8e08bc80e290e52dc9dc565d9bb7
--engine-variant file:/opt/reco-engine/MatrixProduct2/engine.json --verbosity 0 --json-extractor
Both --env PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_HOME=/opt/PredictionIO-0.11.0-incubating,PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_PGSQL_PASSWORD=<password>,PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc,PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL,PIO_CONF_DIR=/opt/PredictionIO-0.11.0-incubating/conf


When I try to run this manually from the command line, I get

[ERROR] [Storage$] Error initializing storage client for source
Exception in thread "main" org.apache.predictionio.data.storage.StorageClientException: Data
source  was not properly initialized.
        at org.apache.predictionio.data.storage.Storage$$anonfun$10.apply(Storage.scala:285)
        at org.apache.predictionio.data.storage.Storage$$anonfun$10.apply(Storage.scala:285)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:284)


So even though all the needed params are set in --env, Spark cannot find them. I have to set
them manually via export to make this work. What exactly is supposed to happen when these vars
are set through --env?

Perhaps someone can give me pointers on what might be worth trying.

Bests & thanks

Florian




--

Dr. Florian Krause
Chief Technical Officer
____________________

REBELLE - StyleRemains GmbH
Brooktorkai 4, D-20457 Hamburg

Tel.: +49 40 30 70 19 18
Fax: +49 40 30 70 19 29
E-Mail: florian.krause@rebelle.com
Website: www.rebelle.com
Network: LinkedIn · Xing

Managing directors: Sophie-Cécile Gaulke, Max Laurent Schönemann
Registered in Amtsgericht Hamburg under the No. HRB 126796

This e-mail contains confidential and/or legally protected information. If you are not the
intended recipient or if you have received this e-mail by error please notify the sender
immediately and destroy this e-mail. Any unauthorized review, copying, disclosure or
distribution of the material in this e-mail is strictly forbidden. The contents of this
e-mail is legally binding only if it is confirmed by letter or fax. The sending of e-mails
to us does not have any period-protecting effect.


