predictionio-user mailing list archives

From Vaghawan Ojha <vaghawan...@gmail.com>
Subject Re: Error while training : NegativeArraySizeException
Date Wed, 07 Jun 2017 09:27:55 GMT
Yes, you need to build the app again every time you change something in
engine.json.

Make sure the data belongs to the same app that you specified in
engine.json.
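
A minimal sketch of that check in Python, assuming engine.json has already
been loaded into a dict and that a few events have been fetched or exported
as a list of dicts. The function names config_problems and events_per_name
are hypothetical helpers, not part of PredictionIO or the UR, and the sample
values are taken from this thread.

```python
from collections import Counter


def config_problems(cfg):
    """Return a list of inconsistencies between datasource and algorithm params."""
    problems = []
    ds = cfg["datasource"]["params"]
    for algo in cfg["algorithms"]:
        params = algo["params"]
        if params.get("appName") != ds.get("appName"):
            problems.append("appName differs for algorithm " + algo["name"])
        if params.get("eventNames") != ds.get("eventNames"):
            problems.append("eventNames differ for algorithm " + algo["name"])
    return problems


def events_per_name(events, event_names):
    """Count events per configured name; a count of 0 flags a missing event type."""
    seen = Counter(e["event"] for e in events)
    return {name: seen.get(name, 0) for name in event_names}


# Illustrative data only: a trimmed engine.json and one event per type.
cfg = {
    "datasource": {"params": {"appName": "piourcar",
                              "eventNames": ["facet", "view", "search"]}},
    "algorithms": [{"name": "ur",
                    "params": {"appName": "piourcar",
                               "eventNames": ["facet", "view", "search"]}}],
}
events = [
    {"event": "facet", "targetEntityId": "alfa-romeo-marque"},
    {"event": "view", "targetEntityId": "citroen-marque"},
    {"event": "search", "targetEntityId": "peugeot-marque"},
]

print(config_problems(cfg))  # []
print(events_per_name(events, cfg["datasource"]["params"]["eventNames"]))
# {'facet': 1, 'view': 1, 'search': 1}
```

In practice the dict would come from json.load on your own engine.json. This
only catches mismatched names; it does not explain every
NegativeArraySizeException.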

Yes, you can run the example integration test in UR with the command
./examples/integration-test.

You can find more here: http://actionml.com/docs/ur_quickstart

On Wed, Jun 7, 2017 at 3:07 PM, Bruno LEBON <b.lebon@redfakir.fr> wrote:

> Yes, the three event types that I defined in engine.json exist in my
> dataset. facet is my primary event, and I checked that it exists.
>
> I think it is not necessary to build again after changing something in
> engine.json, as the file is read during the process, but I built it and
> tried again and I still get the same error.
>
> What is this example-intrigration? I don't know about it. Where can I
> find this script?
>
> 2017-06-07 11:11 GMT+02:00 Vaghawan Ojha <vaghawan781@gmail.com>:
>
>> Hi,
>>
>> For me, this problem happened when I had gotten my primary event wrong.
>> The first name in the eventNames array, "eventNames":
>> ["facet","view","search"], is the primary event. Is that event in your data?
>>
>> Did you make sure you built the app again after you changed the
>> eventNames in engine.json?
>>
>> Also, you could verify that everything is fine with UR with
>> ./example-intrigration.
>>
>> Thanks
>>
>> On Wed, Jun 7, 2017 at 2:49 PM, Bruno LEBON <b.lebon@redfakir.fr> wrote:
>>
>>> Thanks for your answer.
>>>
>>> *You could explicitly do *
>>>
>>>
>>> *pio train -- --master spark://localhost:7077 --driver-memory 16G
>>> --executor-memory 24G *
>>>
>>> *and change the spark master url and the memories configuration. And see
>>> if that works. *
>>>
>>> Yes that is the command I use to launch the train, except I am on a
>>> cluster, so Spark is not local. Here is mine:
>>>  pio train -- --master spark://master:7077 --driver-memory 4g
>>> --executor-memory 10g
>>>
>>> The train works with different datasets. It also works with this dataset
>>> when I skip the event type *view*. So my guess is that there is
>>> something about this event type: either in the data, although the data
>>> looks fine to me, or maybe there is a problem when I use more than two
>>> event types (this is the first time I have more than two, but I can't
>>> believe that the problem is related to the number of event types).
>>>
>>> The spelling is the same in the event sent to the eventserver ( *view *)
>>> and in the engine.json ( *view *).
>>>
>>> I am reading the code to figure out where this error comes from.
>>>
>>>
>>>
>>> 2017-06-07 10:17 GMT+02:00 Vaghawan Ojha <vaghawan781@gmail.com>:
>>>
>>>> You could explicitly do
>>>>
>>>> pio train -- --master spark://localhost:7077 --driver-memory 16G
>>>> --executor-memory 24G
>>>>
>>>> and change the spark master url and the memories configuration. And see
>>>> if that works.
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Jun 7, 2017 at 1:55 PM, Bruno LEBON <b.lebon@redfakir.fr>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Using UR with PIO 0.10 I am trying to train my dataset. In return I
>>>>> get the following error:
>>>>>
>>>>> *...*
>>>>> *[INFO] [DataSource] Received events List(facet, view, search)*
>>>>> *[INFO] [DataSource] Number of events List(5, 4, 6)*
>>>>> *[INFO] [Engine$] org.template.TrainingData does not support data
>>>>> sanity check. Skipping check.*
>>>>> *[INFO] [Engine$] org.template.PreparedData does not support data
>>>>> sanity check. Skipping check.*
>>>>> *[INFO] [URAlgorithm] Actions read now creating correlators*
>>>>> *[WARN] [TaskSetManager] Lost task 0.0 in stage 56.0 (TID 50,
>>>>> ip-172-31-40-139.eu-west-1.compute.internal):
>>>>> java.lang.NegativeArraySizeException*
>>>>> *        at
>>>>> org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)*
>>>>> *        at
>>>>> org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:73)*
>>>>> *        at
>>>>> org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:72)*
>>>>> *        at
>>>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)*
>>>>> *        at
>>>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)*
>>>>> *        at
>>>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)*
>>>>> *        at
>>>>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)*
>>>>> *        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)*
>>>>> *        at
>>>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)*
>>>>> *        at org.apache.spark.scheduler.Task.run(Task.scala:89)*
>>>>> *        at
>>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)*
>>>>> *        at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)*
>>>>> *        at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)*
>>>>> *        at java.lang.Thread.run(Thread.java:748)*
>>>>>
>>>>> *[ERROR] [TaskSetManager] Task 0 in stage 56.0 failed 4 times;
>>>>> aborting job*
>>>>> *Exception in thread "main" org.apache.spark.SparkException: Job
>>>>> aborted due to stage failure: Task 0 in stage 56.0 failed 4 times, most
>>>>> recent failure: Lost task 0.3 in stage 56.0 (TID 56,
>>>>> ip-172-1-1-1.eu-west-1.compute.internal):
>>>>> java.lang.NegativeArraySizeException*
>>>>> *        at
>>>>> org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)*
>>>>> *        at
>>>>> org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:73)*
>>>>> *        at
>>>>> org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:72)*
>>>>> *        at
>>>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)*
>>>>> *        at
>>>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)*
>>>>> *        at
>>>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)*
>>>>> *        at
>>>>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)*
>>>>> *        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)*
>>>>> *        at
>>>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)*
>>>>> *        at org.apache.spark.scheduler.Task.run(Task.scala:89)*
>>>>> *        at
>>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)*
>>>>> *        at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)*
>>>>> *        at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)*
>>>>> *        at java.lang.Thread.run(Thread.java:748)*
>>>>>
>>>>> *Driver stacktrace:*
>>>>> *        at
>>>>> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)*
>>>>> *        at
>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)*
>>>>> *        at
>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)*
>>>>> *        at
>>>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)*
>>>>> *        at
>>>>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)*
>>>>> *        at
>>>>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)*
>>>>> *        at
>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)*
>>>>> *        at
>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)*
>>>>> *        at scala.Option.foreach(Option.scala:236)*
>>>>> *        at
>>>>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)*
>>>>> *        at
>>>>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)*
>>>>> *        at
>>>>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)*
>>>>> *        at
>>>>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)*
>>>>> *        at
>>>>> org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)*
>>>>> *        at
>>>>> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)*
>>>>> *        at
>>>>> org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)*
>>>>> *        at
>>>>> org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)*
>>>>> *        at
>>>>> org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)*
>>>>> *        at
>>>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)*
>>>>> *        at
>>>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)*
>>>>> *        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)*
>>>>> *        at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)*
>>>>> *        at
>>>>> org.apache.mahout.sparkbindings.SparkEngine$.numNonZeroElementsPerColumn(SparkEngine.scala:81)*
>>>>> *        at
>>>>> org.apache.mahout.math.drm.CheckpointedOps.numNonZeroElementsPerColumn(CheckpointedOps.scala:36)*
>>>>> *        at
>>>>> org.apache.mahout.math.cf.SimilarityAnalysis$.sampleDownAndBinarize(SimilarityAnalysis.scala:397)*
>>>>> *        at
>>>>> org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$cooccurrences$1.apply(SimilarityAnalysis.scala:101)*
>>>>> *        at
>>>>> org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$cooccurrences$1.apply(SimilarityAnalysis.scala:95)*
>>>>> *        at
>>>>> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)*
>>>>> *        at
>>>>> scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)*
>>>>> *        at
>>>>> org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrences(SimilarityAnalysis.scala:95)*
>>>>> *        at
>>>>> org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrencesIDSs(SimilarityAnalysis.scala:147)*
>>>>> *        at org.template.URAlgorithm.calcAll(URAlgorithm.scala:280)*
>>>>> *        at org.template.URAlgorithm.train(URAlgorithm.scala:251)*
>>>>> *        at org.template.URAlgorithm.train(URAlgorithm.scala:169)*
>>>>> *        at
>>>>> org.apache.predictionio.controller.P2LAlgorithm.trainBase(P2LAlgorithm.scala:49)*
>>>>> *        at
>>>>> org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)*
>>>>> *        at
>>>>> org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)*
>>>>> *        at
>>>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)*
>>>>> *        at
>>>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)*
>>>>> *        at scala.collection.immutable.List.foreach(List.scala:318)*
>>>>> *        at
>>>>> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)*
>>>>> *        at
>>>>> scala.collection.AbstractTraversable.map(Traversable.scala:105)*
>>>>> *        at
>>>>> org.apache.predictionio.controller.Engine$.train(Engine.scala:692)*
>>>>> *        at
>>>>> org.apache.predictionio.controller.Engine.train(Engine.scala:177)*
>>>>> *        at
>>>>> org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)*
>>>>> *        at
>>>>> org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)*
>>>>> *        at
>>>>> org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)*
>>>>> *        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>>> Method)*
>>>>> *        at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)*
>>>>> *        at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
>>>>> *        at java.lang.reflect.Method.invoke(Method.java:498)*
>>>>> *        at
>>>>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)*
>>>>> *        at
>>>>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)*
>>>>> *        at
>>>>> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)*
>>>>> *        at
>>>>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)*
>>>>> *        at
>>>>> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)*
>>>>> *Caused by: java.lang.NegativeArraySizeException*
>>>>> *        at
>>>>> org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)*
>>>>> *        at
>>>>> org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:73)*
>>>>> *        at
>>>>> org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:72)*
>>>>> *        at
>>>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)*
>>>>> *        at
>>>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)*
>>>>> *        at
>>>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)*
>>>>> *        at
>>>>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)*
>>>>> *        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)*
>>>>> *        at
>>>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)*
>>>>> *        at org.apache.spark.scheduler.Task.run(Task.scala:89)*
>>>>> *        at
>>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)*
>>>>> *        at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)*
>>>>> *        at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)*
>>>>> *        at java.lang.Thread.run(Thread.java:748)*
>>>>>
>>>>>
>>>>> Usually, a NegativeArraySizeException tells me that one of the events
>>>>> defined in engine.json doesn't exist in my dataset. However, that is not
>>>>> the case here: all three events are present in my dataset. Here is the
>>>>> proof:
>>>>> http://x.x.x.x:7070/events.json?accessKey=df8ef7dd-0165-4b6f-a008-d1550adbb3df&startTime=2017-06-2T0:0:00.321Z&limit=1&event=facet
>>>>>
>>>>> [{"eventId":"AYDE4TYMjU2dFGWVAYyUYwAAAVx5_afdpSyQHw_eNT0","event":"facet","entityType":"user","entityId":"92ec6a38-9fee-4c99-92a5-46677ad9ca48","targetEntityType":"item","targetEntityId":"alfa-romeo-marque","properties":{},"eventTime":"2017-06-05T20:41:25.725Z","creationTime":"2017-06-05T20:41:25.725Z"}]
>>>>>
>>>>> http://x.x.x.x:7070/events.json?accessKey=df8ef7dd-0165-4b6f-a008-d1550adbb3df&startTime=2017-06-2T0:0:00.321Z&limit=1&event=view
>>>>>
>>>>> [{"eventId":"IjuMNR7h40l_sylo-uqEsAAAAVxoIcPqnumP2B_qWAk","event":"view","entityType":"user","entityId":"bbc5bd25-b1ac-41e0-b771-43fe65a8827e","targetEntityType":"item","targetEntityId":"citroen-marque","properties":{},"eventTime":"2017-06-02T09:27:42.314Z","creationTime":"2017-06-02T09:27:42.314Z"}]
>>>>>
>>>>> http://x.x.x.x:7070/events.json?accessKey=df8ef7dd-0165-4b6f-a008-d1550adbb3df&startTime=2017-06-2T0:0:00.321Z&limit=1&event=search
>>>>>
>>>>> [{"eventId":"AI6NF05NJa3fP2bRpKUxAwAAAVxymnYYjm6nNt3TsGY","event":"search","entityType":"user","entityId":"b2c77901-0824-4583-9999-3cd56c1f34c9","targetEntityType":"item","targetEntityId":"peugeot-marque","properties":{},"eventTime":"2017-06-04T10:15:44.408Z","creationTime":"2017-06-04T10:15:44.408Z"}]
>>>>>
>>>>>
>>>>> I selected only one event per type but there are more.
>>>>>
>>>>>
>>>>> If I keep only the event types *facet* and *search*, then it works: the
>>>>> train succeeds and I have my model. However, as soon as I add the event
>>>>> type *view*, it fails. I tried making *view* the primary event and it
>>>>> doesn't change anything. Not sure why it would change anything, but I
>>>>> tried anyway.
>>>>>
>>>>>
>>>>> Here is my engine.json:
>>>>>
>>>>> *{
>>>>>   "comment":"",
>>>>>   "id": "car",
>>>>>   "description": "settings",
>>>>>   "engineFactory": "org.template.RecommendationEngine",
>>>>>   "datasource": {
>>>>>     "params" : {
>>>>>       "name": "sample-handmade-data.txt",
>>>>>       "appName": "piourcar",
>>>>>       "eventNames": ["facet","view","search"]
>>>>>     }
>>>>>   },
>>>>>   "sparkConf": {
>>>>>     "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
>>>>>     "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
>>>>>     "spark.kryo.referenceTracking": "false",
>>>>>     "spark.kryoserializer.buffer": "300m",
>>>>>     "es.index.auto.create": "true",
>>>>>     "es.nodes":"espionode1:9200,espionode2:9200,espionode3:9200"
>>>>>   },
>>>>>   "algorithms": [
>>>>>     {
>>>>>       "name": "ur",
>>>>>       "params": {
>>>>>         "appName": "piourcar",
>>>>>         "indexName": "urindex_car",
>>>>>         "typeName": "items",
>>>>>         "eventNames": ["facet","view","search"],
>>>>>         "blacklistEvents": [],
>>>>>         "maxEventsPerEventType": 50000,
>>>>>         "maxCorrelatorsPerEventType": 100,
>>>>>         "maxQueryEvents": 10,
>>>>>         "num": 5,
>>>>>         "userBias": 2,
>>>>>         "returnSelf": true
>>>>>       }
>>>>>     }
>>>>>   ]
>>>>> }*
>>>>>
>>>>> Thanks in advance for your help, regards,
>>>>> Bruno
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
