predictionio-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 had a not serializable result
Date Thu, 19 Oct 2017 15:12:17 GMT
This sample dataset is too small, with too few cooccurrences. u1 will never get i1 because of the
blacklist (u1 has already viewed i1, so it will not be recommended again). The blacklist
can be disabled if you want to recommend already-viewed items, but beware that they may dominate
every recommendation set if you turn it off, since viewing is self-fulfilling. Why not i2? I'm not
sure without running the math; the UR looks at things statistically, and with a dataset this small
anomalies can appear since the data is not statistically significant. i1 will show
up in internal intermediate results (A'A for instance), but these are then filtered by a
statistical test called LLR, which requires a certain amount of data to work.
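To see why such small counts get filtered out, here is a rough Python sketch of the LLR test as it
is commonly computed from a 2x2 cooccurrence table (an illustration of the idea, not the Mahout
code itself):

import math

def xlogx(x):
    # x * log(x), with the convention that 0 * log(0) = 0
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    # G^2 log-likelihood ratio for a 2x2 cooccurrence table:
    # k11 = users who did both, k12/k21 = one but not the other, k22 = neither
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row_entropy + col_entropy - mat_entropy))

# i1/i2 in the tiny dataset: 1 user viewed both, 1 viewed only i1,
# 1 viewed only i2, 1 viewed neither
print(llr(1, 1, 1, 1))   # -> 0.0, nothing statistically significant here

For the i1/i2 pair in your data the counts are 1/1/1/1 and the score comes out to exactly 0, which
is consistent with the all-zero scores you are seeing.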

Notice the handmade dataset has many more cooccurrences and produces understandable results.
Also notice that in your dataset i



On Oct 19, 2017, at 1:28 AM, Noelia Osés Fernández <noses@vicomtech.org> wrote:

Pat, this worked!!!!! Thank you very much!!!!

The only odd thing is that all the results I get now are 0s. For example:

Using the dataset:

"u1","i1"
"u2","i1"
"u2","i2"
"u3","i2"
"u3","i3"
"u4","i4"

echo "Recommendations for user: u1"
echo ""
curl -H "Content-Type: application/json" -d '
{
    "user": "u1"
}' http://localhost:8000/queries.json
echo ""

What I get is:

{"itemScores":[{"item":"\"i2\"","score":0.0},{"item":"\"i1\"","score":0.0},{"item":"\"i3\"","score":0.0},{"item":"\"i4\"","score":0.0}]}


If user u1 has viewed i1 and user u2 has viewed i1 and i2, then I think the algorithm should
return a non-zero score for i2 (and possibly i1, too).
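To check my intuition I tabulated the raw cooccurrence counts for this dataset myself (a minimal
numpy sketch of the raw A'A counts only, before any statistical filtering):

import numpy as np

# rows = users u1..u4, columns = items i1..i4, 1 = "viewed"
A = np.array([
    [1, 0, 0, 0],   # u1 viewed i1
    [1, 1, 0, 0],   # u2 viewed i1 and i2
    [0, 1, 1, 0],   # u3 viewed i2 and i3
    [0, 0, 0, 1],   # u4 viewed i4
])

# raw item-item cooccurrence counts; the off-diagonal 1s are the
# i1-i2 and i2-i3 cooccurrences, each backed by a single user
print(A.T @ A)

So the raw i1-i2 cooccurrence is there, backed by a single user; I don't yet see why it doesn't
translate into a non-zero score.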

Even using the bigger dataset with 100 items, I still get all scores of 0.

So now I'm going to spend some time reading the following documentation, unless there is some
other documentation you recommend I read first!

 - [The Universal Recommender](http://actionml.com/docs/ur)
 - [The Correlated Cross-Occurrence Algorithm](http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html)
 - [The Universal Recommender Slide Deck](http://www.slideshare.net/pferrel/unified-recommender-39986309)
 - [Multi-domain predictive AI or how to make one thing predict another](https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/)

Thank you very much for all your patience and help getting me to this point!!!

Best regards,
Noelia


On 18 October 2017 at 18:33, Pat Ferrel <pat@occamsmachete.com> wrote:
It is the UR, so events are taken from the EventStore and converted into a Mahout DistributedRowMatrix
of RandomAccessSparseVectors, both of which are serializable. This path works fine and has for
several years.

This must be a config problem, like not using the MahoutKryoRegistrator, which registers the
serializers for these.

@Noelia, you have left out the sparkConf section of the engine.json. The one used in the integration
test should work:

{
  "comment":" This config file uses default settings for all but the required values see README.md
for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.actionml.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "tiny_app_data.csv",
      "appName": "TinyApp",
      "eventNames": ["view"]
    }
  },
  "sparkConf": { <================= THIS WAS LEFT OUT IN YOUR ENGINE.JSON BELOW IN THIS
THREAD
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io <http://sparkbindings.io/>.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true"
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill,
must add eventsNames",
      "name": "ur",
      "params": {
        "appName": "TinyApp",
        "indexName": "urindex",
        "typeName": "items",
        "comment": "must have data for the first event or the model will not build, other
events are optional",
        "eventNames": ["view"]
      }
    }
  ]
}
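If you want a quick sanity check that the engine.json you are actually building with carries these
settings, something like this will do (a sketch; adjust the path to wherever your engine.json lives):

import json

# point this at the engine.json your template actually builds with
with open("engine.json") as f:
    engine = json.load(f)

spark_conf = engine.get("sparkConf", {})
for key in ("spark.serializer",
            "spark.kryo.registrator",
            "spark.kryo.referenceTracking",
            "spark.kryoserializer.buffer"):
    print(key, "->", spark_conf.get(key, "MISSING"))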


On Oct 18, 2017, at 8:49 AM, Donald Szeto <donald@apache.org> wrote:

Chiming in a bit. Looking at the serialization error, it looks like we are just one little
step away from getting this to work.

Noelia, what does your synthesized data look like? All data that is processed by Spark needs
to be serializable. At some point, the non-serializable vector object shown in the stack is
created out of your synthesized data. It would be great to know what your input events look
like and to see where in the code path this happens.
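For comparison, this is roughly what a single "view" event should look like on the wire when sent
to the EventServer (a sketch using the requests library; the URL and access key are placeholders):

import requests  # assumes the requests library is installed

# one "view" event as it goes to the EventServer
event = {
    "event": "view",
    "entityType": "user",
    "entityId": "u1",
    "targetEntityType": "item",
    "targetEntityId": "i1",
}
resp = requests.post(
    "http://localhost:7070/events.json",
    params={"accessKey": "YOUR_ACCESS_KEY"},   # placeholder access key
    json=event,
)
print(resp.status_code, resp.text)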

Regards,
Donald

On Tue, Oct 17, 2017 at 12:14 AM Noelia Osés Fernández <noses@vicomtech.org> wrote:
Pat, you mentioned the problem could be that the data I was using was too small. So now I'm
using the attached data file as the data (4 users and 100 items). But I'm still getting the
same error. I'm sorry I forgot to mention I had increased the dataset.

The reason why I want to make it work with a very small dataset is because I want to be able
to follow the calculations. I want to understand what the UR is doing and understand the impact
of changing this or that, here or there... I find that easier to achieve with a small example
in which I know exactly what's happening. I want to build trust in my understanding of
the UR before I move on to applying it to a real problem. If I'm not confident that I know
how to use it, how can I tell my client that the results I'm getting are good with any degree
of confidence?





On 16 October 2017 at 20:44, Pat Ferrel <pat@occamsmachete.com> wrote:
So all setup is the same for the integration-test and your modified test *except the data*?

The error looks like a setup problem because the serialization should happen with either test.
But if the only difference really is the data, then toss it and use either real data or the
integration test data; why synthesize fake data if it causes the error?

BTW the data you include below in this thread would never create internal IDs as high as 94
in the vector. You must have switched to a new dataset???

I would get a dump of your data using `pio export` and make sure it's what you thought it
was. You claim to have only 4 user ids and 4 item ids, but the serialized vector thinks you
have at least 94 user or item ids. Something doesn't add up.
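Something like this will count what is actually in the export (a sketch; it assumes `pio export`
wrote newline-delimited JSON events and that the path below points at one of the part files):

import json

users, items = set(), set()
# adjust the path to wherever pio export wrote its output
with open("exported_events/part-00000") as f:
    for line in f:
        e = json.loads(line)
        if e.get("event") == "view":
            users.add(e["entityId"])
            items.add(e["targetEntityId"])

print(len(users), "distinct users:", sorted(users))
print(len(items), "distinct items:", sorted(items))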


On Oct 16, 2017, at 4:43 AM, Noelia Osés Fernández <noses@vicomtech.org> wrote:

Pat, you are absolutely right! I increased the sleep time and now the integration test for
handmade works perfectly.

However, the integration test adapted to run with my tiny app runs into the same problem I've
been having with this app: 

[ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value:
{66:1.0,29:1.0,70:1.0,91:1.0,58:1.0,37:1.0,13:1.0,8:1.0,94:1.0,30:1.0,57:1.0,22:1.0,20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,72:1.0,3:1.0,34:1.0,77:1.0,46:1.0,81:1.0,86:1.0,43:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (1,{66:1.0,29:1.0,70:1.0,91:1.0,58:1.0,37:1.0,13:1.0,8:1.0,94:1.0,30:1.0,57:1.0,22:1.0,20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,72:1.0,3:1.0,34:1.0,77:1.0,46:1.0,81:1.0,86:1.0,43:1.0}));
not retrying
[ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:

...

Any ideas?

On 15 October 2017 at 19:09, Pat Ferrel <pat@occamsmachete.com> wrote:
This is probably a timing issue in the integration test, which has to wait for `pio deploy`
to finish before the queries can be made. If it doesn't finish, the queries will fail. By
the time the rest of the test quits, the model has been deployed, so you can run queries. In
the integration-test script, increase the delay after `pio deploy…` and see if it passes
then.

This is probably an integration-test script problem, not a problem in the system.
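If you would rather make the test robust than just lengthen the sleep, a polling loop along these
lines works (a sketch; it assumes the requests library and the default port 8000 used in this thread):

import time
import requests  # assumes the requests library is installed

# poll the deployed engine instead of sleeping a fixed 30 seconds
deadline = time.time() + 120
while time.time() < deadline:
    try:
        r = requests.post("http://localhost:8000/queries.json",
                          json={"user": "u1"}, timeout=2)
        if r.ok:
            print("server is up:", r.json())
            break
    except requests.exceptions.RequestException:
        pass
    time.sleep(5)
else:
    raise SystemExit("pio deploy did not come up within 120 seconds")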



On Oct 6, 2017, at 4:21 AM, Noelia Osés Fernández <noses@vicomtech.org> wrote:

Pat,

I have run the integration test for the handmade example out of curiosity. Strangely enough,
things go more or less as expected, apart from the fact that I get a message saying:

...
[INFO] [CoreWorkflow$] Updating engine instance
[INFO] [CoreWorkflow$] Training completed successfully.
Model will remain deployed after this test
Waiting 30 seconds for the server to start
nohup: redirecting stderr to stdout
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed
to connect to localhost port 8000: Connection refused

So the integration test does not manage to get the recommendations even though the model trained
and deployed successfully. However, as soon as the integration test finishes, on the same
terminal, I can get the recommendations by doing the following:

$ curl -H "Content-Type: application/json" -d '
> {
>     "user": "u1"
> }' http://localhost:8000/queries.json
{"itemScores":[{"item":"Nexus","score":0.057719700038433075},{"item":"Surface","score":0.0}]}

Isn't this odd? Can you guess what's going on?

Thank you very much for all your support!
noelia



On 5 October 2017 at 19:22, Pat Ferrel <pat@occamsmachete.com> wrote:
Ok, that config should work. Does the integration test pass?

The data you are using is extremely small and, though it does look like it has cooccurrences,
they may not meet the minimum “big-data” thresholds used by default. Try adding more data,
or use the handmade example data: rename purchase to view and discard the existing view data
if you wish.
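If you go the handmade route, the rename is easy to script; something like this (a sketch; it
assumes the handmade sample file is comma-separated user,event,item lines and writes the result
in the same two-column format as your tiny_app_data.csv):

# paths are assumptions -- adjust to your checkout
src = "data/sample-handmade-data.txt"
dst = "data/tiny_app_data_from_handmade.csv"

with open(src) as fin, open(dst, "w") as fout:
    for line in fin:
        parts = [x.strip() for x in line.strip().split(",")]
        if len(parts) != 3:
            continue
        user, event, item = parts
        if event == "purchase":
            # write in the same two-column format as tiny_app_data.csv
            fout.write('"%s","%s"\n' % (user, item))
        # original view events are simply dropped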

The error is very odd and I’ve never seen it. If the integration test works I can only surmise
it's your data.


On Oct 5, 2017, at 12:02 AM, Noelia Osés Fernández <noses@vicomtech.org> wrote:

SPARK: spark-1.6.3-bin-hadoop2.6

PIO: 0.11.0-incubating

Scala: whatever gets installed when installing PIO 0.11.0-incubating, I haven't installed
Scala separately

UR: ActionML's UR v0.6.0, I suppose, as that's the last version mentioned in the readme file.
I have attached the UR zip file I downloaded from the actionml github account.

Thank you for your help!!

On 4 October 2017 at 17:20, Pat Ferrel <pat@occamsmachete.com> wrote:
What versions of Scala, Spark, PIO, and UR are you using?


On Oct 4, 2017, at 6:10 AM, Noelia Osés Fernández <noses@vicomtech.org> wrote:

Hi all,

I'm still trying to create a very simple app to learn to use PredictionIO and I'm still having
trouble. pio build completes with no problem, but when I do pio train I get a very long error
message related to serialisation (copied below).

pio status reports system is all ready to go.

The app I'm trying to build is very simple, it only has 'view' events. Here's the engine.json:

===========================================================
{
  "comment":" This config file uses default settings for all but the required values see README.md
for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.actionml.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "tiny_app_data.csv",
      "appName": "TinyApp",
      "eventNames": ["view"]
    }
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill,
must add eventsNames",
      "name": "ur",
      "params": {
        "appName": "TinyApp",
        "indexName": "urindex",
        "typeName": "items",
        "comment": "must have data for the first event or the model will not build, other
events are optional",
        "eventNames": ["view"]
      }
    }
  ]
}
===========================================================

The data I'm using is:

"u1","i1"
"u2","i1"
"u2","i2"
"u3","i2"
"u3","i3"
"u4","i4"

meaning user u viewed item i.

The data has been added to the database with the following python code:

===========================================================
"""
Import sample data for recommendation engine
"""

import predictionio
import argparse
import random

RATE_ACTIONS_DELIMITER = ","
SEED = 1


def import_events(client, file):
  random.seed(SEED)
  count = 0
  print "Importing data..."

  items = []
  users = []
  f = open(file, 'r')
  for line in f:
    # note: the surrounding double quotes in the CSV are kept, so entity ids
    # are stored literally as "u1", "i1", etc.
    data = line.rstrip('\r\n').split(RATE_ACTIONS_DELIMITER)
    users.append(data[0])
    items.append(data[1])
    client.create_event(
      event="view",
      entity_type="user",
      entity_id=data[0],
      target_entity_type="item",
      target_entity_id=data[1]
    )
    print "Event: " + "view" + " entity_id: " + data[0] + " target_entity_id: " + data[1]
    count += 1
  f.close()

  users = set(users)
  items = set(items)
  print "All users: " + str(users)
  print "All items: " + str(items)
  for item in items:
    client.create_event(
      event="$set",
      entity_type="item",
      entity_id=item
    )
    count += 1


  print "%s events are imported." % count


if __name__ == '__main__':
  parser = argparse.ArgumentParser(
    description="Import sample data for recommendation engine")
  parser.add_argument('--access_key', default='invald_access_key')
  parser.add_argument('--url', default="http://localhost:7070")
  parser.add_argument('--file', default="./data/tiny_app_data.csv")

  args = parser.parse_args()
  print args

  client = predictionio.EventClient(
    access_key=args.access_key,
    url=args.url,
    threads=5,
    qsize=500)
  import_events(client, args.file)
===========================================================

My pio-env.sh is the following:

===========================================================
#!/usr/bin/env bash
#
# Copy this file as pio-env.sh and edit it for your site's configuration.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# PredictionIO Main Configuration
#
# This section controls core behavior of PredictionIO. It is very likely that
# you need to change these to fit your site.

# SPARK_HOME: Apache Spark is a hard dependency and must be configured.
# SPARK_HOME=$PIO_HOME/vendors/spark-2.0.2-bin-hadoop2.7
SPARK_HOME=$PIO_HOME/vendors/spark-1.6.3-bin-hadoop2.6

POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.1.4.jar
MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar

# ES_CONF_DIR: You must configure this if you have advanced configuration for
#              your Elasticsearch setup.
# ES_CONF_DIR=/opt/elasticsearch
#ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-1.7.6

# HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO
#                  with Hadoop 2.
# HADOOP_CONF_DIR=/opt/hadoop

# HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO
#                 with HBase on a remote cluster.
# HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf

# Filesystem paths where PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

# PredictionIO Storage Configuration
#
# This section controls programs that make use of PredictionIO's built-in
# storage facilities. Default values are shown below.
#
# For more information on storage configuration please refer to
# http://predictionio.incubator.apache.org/system/anotherdatastore/

# Storage Repositories

# Default is to use PostgreSQL
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

# Storage Data Sources

# PostgreSQL Default Settings
# Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL
# Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio

# MySQL Example
# PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
# PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
# PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio
# PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio

# Elasticsearch Example
# PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
# PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
# PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.2.1
# Elasticsearch 1.x Example
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=myprojectES
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.7.6

# Local File System Example
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models

# HBase Example
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6
===========================================================

Error message:

===========================================================
[ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value:
{3:1.0,2:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (2,{3:1.0,2:1.0})); not retrying
[ERROR] [TaskSetManager] Task 3.0 in stage 10.0 (TID 25) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value:
{0:1.0,3:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (3,{0:1.0,3:1.0})); not retrying
[ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value:
{1:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (1,{1:1.0})); not retrying
[ERROR] [TaskSetManager] Task 0.0 in stage 10.0 (TID 22) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value:
{0:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (0,{0:1.0})); not retrying
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure:
Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value:
{3:1.0,2:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (2,{3:1.0,2:1.0}))
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
    at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1088)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.fold(RDD.scala:1082)
    at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.computeNRow(CheckpointedDrmSpark.scala:188)
    at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow$lzycompute(CheckpointedDrmSpark.scala:55)
    at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow(CheckpointedDrmSpark.scala:55)
    at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.newRowCardinality(CheckpointedDrmSpark.scala:219)
    at com.actionml.IndexedDatasetSpark$.apply(Preparator.scala:213)
    at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:71)
    at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:49)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at com.actionml.Preparator.prepare(Preparator.scala:49)
    at com.actionml.Preparator.prepare(Preparator.scala:32)
    at org.apache.predictionio.controller.PPreparator.prepareBase(PPreparator.scala:37)
    at org.apache.predictionio.controller.Engine$.train(Engine.scala:671)
    at org.apache.predictionio.controller.Engine.train(Engine.scala:177)
    at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
    at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
    at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
===========================================================

Thank you all for your help.

Best regards,
noelia











