predictionio-user mailing list archives

From Noelia Osés Fernández <no...@vicomtech.org>
Subject Re: [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 had a not serializable result
Date Fri, 20 Oct 2017 07:17:54 GMT
Thanks for the explanation, Pat!

I think the best course of action is for me to read the documentation and
understand how the algorithm works. Then, try again with a slightly larger
dataset.

Thank you very much!

On 19 October 2017 at 17:15, Pat Ferrel <pat@occamsmachete.com> wrote:

> This sample dataset is too small, with too few cooccurrences. u1 will never
> get i1 due to the blacklist (u1 has already viewed i1, so it will not be
> recommended again). The blacklist can be disabled if you want to
> recommend viewed items again, but beware that they may predominate in every
> recommendation set if you do turn it off, since it is self-fulfilling. As for
> why not i2, I'm not sure without running the math. The UR looks at things
> statistically, and with a dataset this small anomalies can be seen since the
> data is not statistically significant. i1 will show up in internal
> intermediate results (A’A, for instance), but these are then filtered by a
> statistical test called LLR, which requires a certain amount of data to
> work.
>
> Notice the handmade dataset has many more cooccurrences and produces
> understandable results. Also notice that in your dataset i3 and i4 can only
> be recommended by “popularity” since they have no cooccurrence.
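The LLR filtering described above can be illustrated with a small Python port of the entropy-based log-likelihood ratio that Mahout's `LogLikelihood.logLikelihoodRatio` computes over a 2x2 contingency table. This is a sketch for following the arithmetic by hand, not the UR's actual code path, and the function names are illustrative. In the four-user dataset quoted below, the i1/i2 table has exactly one user in every cell: u2 viewed both, u1 viewed only i1, u3 viewed only i2, and u4 viewed neither.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts (Mahout's formulation)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: users who interacted with both items A and B
    k12: users who interacted with A but not B
    k21: users who interacted with B but not A
    k22: users who interacted with neither
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    if row_entropy + column_entropy < matrix_entropy:
        return 0.0  # guard against floating-point round-off
    return 2.0 * (row_entropy + column_entropy - matrix_entropy)


# i1/i2 in the tiny dataset: u2 saw both, u1 only i1, u3 only i2, u4 neither.
print(llr(1, 1, 1, 1))
# A strongly associated pair, for comparison.
print(llr(10, 0, 0, 10))
```

For k11 = k12 = k21 = k22 = 1 the ratio is zero (up to round-off), because the table is exactly what independence would predict, while a table such as (10, 0, 0, 10) scores well above zero. That is consistent with the 0.0 scores reported in this thread, though this sketch alone does not show how the UR turns LLR values into query scores.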
>
>
>
> On Oct 19, 2017, at 1:28 AM, Noelia Osés Fernández <noses@vicomtech.org>
> wrote:
>
> Pat, this worked!!!!! Thank you very much!!!!
>
> The only odd thing now is that all the scores I get are 0. For
> example:
>
> Using the dataset:
>
> "u1","i1"
> "u2","i1"
> "u2","i2"
> "u3","i2"
> "u3","i3"
> "u4","i4"
>
> echo "Recommendations for user: u1"
> echo ""
> curl -H "Content-Type: application/json" -d '
> {
>     "user": "u1"
> }' http://localhost:8000/queries.json
> echo ""
>
> What I get is:
>
> {"itemScores":[{"item":"\"i2\"","score":0.0},{"item":"\"i1\"","score":0.0},{"item":"\"i3\"","score":0.0},{"item":"\"i4\"","score":0.0}]}
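The item IDs in this response come back wrapped in escaped double quotes (`"\"i2\""`), which suggests the quotes from the CSV file ended up inside the stored IDs: splitting a line such as `"u1","i1"` on commas without stripping quotes yields `'"u1"'` and `'"i1"'`. The Python 3 sketch below (standard library only; both helper names are hypothetical) sends the same query as the curl command above and flags such IDs. The URL is the one used in this thread.

```python
import json
import urllib.request


def query_recommendations(user, url="http://localhost:8000/queries.json"):
    """POST {"user": ...} to the engine server, like the curl command above."""
    body = json.dumps({"user": user}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def quoted_item_ids(result):
    """Item IDs in a UR response that contain literal double quotes."""
    return [s["item"] for s in result.get("itemScores", []) if '"' in s["item"]]


# The response reported in this message, as raw strings so the \" escapes survive.
RESPONSE = (r'{"itemScores":[{"item":"\"i2\"","score":0.0},'
            r'{"item":"\"i1\"","score":0.0},{"item":"\"i3\"","score":0.0},'
            r'{"item":"\"i4\"","score":0.0}]}')

print(quoted_item_ids(json.loads(RESPONSE)))
```

Against a live server, `quoted_item_ids(query_recommendations("u1"))` performs the same check on whatever the engine returns.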
>
>
> If user u1 has viewed i1 and user u2 has viewed i1 and i2, then I think the
> algorithm should return a non-zero score for i2 (and possibly i1, too).
>
> Even using the bigger dataset with 100 items, all scores are still 0.
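The intuition above can be checked by hand. The sketch below is a simplification, not how the UR computes scores, and its function names are hypothetical: it counts raw cooccurrences (the off-diagonal part of A'A, where A is the user x item matrix) and then applies a blacklist that removes items the user has already viewed.

```python
from collections import Counter, defaultdict
from itertools import permutations

# The six view events from the tiny dataset in this thread.
EVENTS = [("u1", "i1"), ("u2", "i1"), ("u2", "i2"),
          ("u3", "i2"), ("u3", "i3"), ("u4", "i4")]


def cooccurrence_counts(events):
    """For each ordered item pair (a, b), count how many users viewed both."""
    items_by_user = defaultdict(set)
    for user, item in events:
        items_by_user[user].add(item)
    counts = Counter()
    for items in items_by_user.values():
        for a, b in permutations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts


def candidates(user, events):
    """Raw cooccurrence totals with the user's history, minus viewed items."""
    history = {item for u, item in events if u == user}
    raw = Counter()
    for (a, b), n in cooccurrence_counts(events).items():
        if a in history:
            raw[b] += n
    # Blacklist: drop anything the user has already viewed.
    return {item: n for item, n in raw.items() if item not in history}


print(candidates("u1", EVENTS))
```

For u1 the only candidate is i2, with a raw count of 1. A single shared user is the kind of weak evidence that the LLR test described in Pat's reply filters out on a dataset this small, so a non-zero raw count does not by itself imply a non-zero UR score.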
>
> So now I'm going to spend some time reading the following documentation,
> unless there is some other documentation you recommend I read first!
>
>  - [The Universal Recommender](http://actionml.com/docs/ur)
>  - [The Correlated Cross-Occurrence Algorithm](http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html)
>  - [The Universal Recommender Slide Deck](http://www.slideshare.net/pferrel/unified-recommender-39986309)
>  - [Multi-domain predictive AI or how to make one thing predict another](https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/)
>
> Thank you very much for all your patience and help getting me to this
> point!!!
>
> Best regards,
> Noelia
>
>
> On 18 October 2017 at 18:33, Pat Ferrel <pat@occamsmachete.com> wrote:
>
>> It is the UR so Events are taken from the EventStore and converted into a
>> Mahout DistributedRowMatrix of RandomAccessSparseVectors, which are both
>> serializable. This path works fine and has for several years.
>>
>> This must be a config problem, like not using the MahoutKryoRegistrator,
>> which registers the serializers for these.
>>
>> @Noelia, you have left out the sparkConf section of the engine.json. The
>> one used in the integration test should work:
>>
>> {
>>   "comment":" This config file uses default settings for all but the required values see README.md for docs",
>>   "id": "default",
>>   "description": "Default settings",
>>   "engineFactory": "com.actionml.RecommendationEngine",
>>   "datasource": {
>>     "params" : {
>>       "name": "tiny_app_data.csv",
>>       "appName": "TinyApp",
>>       "eventNames": ["view"]
>>     }
>>   },
>>   "sparkConf": { <================= THIS WAS LEFT OUT IN YOUR ENGINE.JSON BELOW IN THIS THREAD
>>     "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
>>     "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
>>     "spark.kryo.referenceTracking": "false",
>>     "spark.kryoserializer.buffer": "300m",
>>     "es.index.auto.create": "true"
>>   },
>>   "algorithms": [
>>     {
>>       "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
>>       "name": "ur",
>>       "params": {
>>         "appName": "TinyApp",
>>         "indexName": "urindex",
>>         "typeName": "items",
>>         "comment": "must have data for the first event or the model will not build, other events are optional",
>>         "eventNames": ["view"]
>>       }
>>     }
>>   ]
>> }
>>
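Since the failure came from a missing `sparkConf` section, a quick pre-flight check on engine.json could catch it before `pio train`. The helper below is a hypothetical sketch, not part of PredictionIO or the UR; it only checks the two Kryo settings from the config above, and the annotation arrow in that listing would have to be removed for the file to parse as JSON.

```python
import json

# The two Kryo settings from the sparkConf section above.
REQUIRED_SPARK_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator":
        "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
}


def missing_spark_conf(engine):
    """Return required sparkConf entries that are absent or set differently."""
    conf = engine.get("sparkConf") or {}
    return {key: value for key, value in REQUIRED_SPARK_CONF.items()
            if conf.get(key) != value}


def check_engine_file(path):
    """Load an engine.json file; an empty result means both settings are present."""
    with open(path) as f:
        return missing_spark_conf(json.load(f))
```

Run against the engine.json from the original message (which has no `sparkConf`), `missing_spark_conf` would report both entries.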
>>
>> On Oct 18, 2017, at 8:49 AM, Donald Szeto <donald@apache.org> wrote:
>>
>> Chiming in a bit. Looking at the serialization error, it looks like we
>> are just one little step away from getting this to work.
>>
>> Noelia, what does your synthesized data look like? All data that is
>> processed by Spark needs to be serializable. At some point, a
>> non-serializable vector object showing in the stack is created out of your
>> synthesized data. It would be great to know what your input events look
>> like and to see which part of the code path causes this.
>>
>> Regards,
>> Donald
>>
>> On Tue, Oct 17, 2017 at 12:14 AM Noelia Osés Fernández <
>> noses@vicomtech.org> wrote:
>>
>>> Pat, you mentioned the problem could be that the data I was using was
>>> too small. So now I'm using the attached data file as the data (4 users and
>>> 100 items). But I'm still getting the same error. I'm sorry I forgot to
>>> mention I had increased the dataset.
>>>
>>> The reason why I want to make it work with a very small dataset is
>>> because I want to be able to follow the calculations. I want to understand
>>> what the UR is doing and understand the impact of changing this or that,
>>> here or there... I find that easier to achieve with a small example in
>>> which I know exactly what's happening. I want to build my trust on my
>>> understanding of the UR before I move on to applying it to a real problem.
>>> If I'm not confident that I know how to use it, how can I tell my client
>>> that the results I'm getting are good with any degree of confidence?
>>>
>>>
>>>
>>>
>>>
>>> On 16 October 2017 at 20:44, Pat Ferrel <pat@occamsmachete.com> wrote:
>>>
>>>> So all setup is the same for the integration-test and your modified
>>>> test *except the data*?
>>>>
>>>> The error looks like a setup problem because the serialization should
>>>> happen with either test. But if the only difference really is the data,
>>>> then toss it and use either real data or the integration test data, why are
>>>> you trying to synthesize fake data if it causes the error?
>>>>
>>>> BTW the data you include below in this thread would never create
>>>> internal IDs as high as 94 in the vector. You must have switched to a new
>>>> dataset???
>>>>
>>>> I would get a dump of your data using `pio export` and make sure it’s
>>>> what you thought it was. You claim to have only 4 user ids and 4 item ids
>>>> but the serialized vector thinks you have at least 94 of user or item ids.
>>>> Something doesn’t add up.
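A sketch of the sanity check suggested above, assuming `pio export` has written events as JSON with one object per line and PredictionIO's event field names (`event`, `entityId`, `targetEntityId`); the function name and the choice of checks are illustrative. It counts distinct users and items for one event name and lists any IDs that contain literal double quotes.

```python
import json


def summarize_export(lines, event_name="view"):
    """Count distinct users/items for one event name and flag quoted IDs.

    Assumes one JSON event per line with PredictionIO field names.
    """
    users, items = set(), set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("event") != event_name:
            continue  # e.g. skip $set events
        users.add(event["entityId"])
        items.add(event["targetEntityId"])
    quoted = sorted(x for x in users | items if '"' in x)
    return {"users": len(users), "items": len(items), "quoted_ids": quoted}
```

Feeding it the lines of the exported file(s) should report 4 users and 4 items for the six-event dataset; a much larger count, or any quoted IDs, would mean the stored data differs from what was intended.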
>>>>
>>>>
>>>> On Oct 16, 2017, at 4:43 AM, Noelia Osés Fernández <noses@vicomtech.org>
>>>> wrote:
>>>>
>>>> Pat, you are absolutely right! I increased the sleep time and now the
>>>> integration test for handmade works perfectly.
>>>>
>>>> However, the integration test adapted to run with my tiny app runs into
>>>> the same problem I've been having with this app:
>>>>
>>>> [ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>> Serialization stack:
>>>>     - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {66:1.0,29:1.0,70:1.0,91:1.0,58:1.0,37:1.0,13:1.0,8:1.0,94:1.0,30:1.0,57:1.0,22:1.0,20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,72:1.0,3:1.0,34:1.0,77:1.0,46:1.0,81:1.0,86:1.0,43:1.0})
>>>>     - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>>>     - object (class scala.Tuple2, (1,{66:1.0,29:1.0,70:1.0,91:1.0,58:1.0,37:1.0,13:1.0,8:1.0,94:1.0,30:1.0,57:1.0,22:1.0,20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,72:1.0,3:1.0,34:1.0,77:1.0,46:1.0,81:1.0,86:1.0,43:1.0})); not retrying
>>>> [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not
>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>> Serialization stack:
>>>>
>>>> ...
>>>>
>>>> Any ideas?
>>>>
>>>> On 15 October 2017 at 19:09, Pat Ferrel <pat@occamsmachete.com> wrote:
>>>>
>>>>> This is probably a timing issue in the integration test, which has to
>>>>> wait for `pio deploy` to finish before the queries can be made. If it
>>>>> doesn’t finish, the queries will fail. By the time the rest of the test
>>>>> quits, the model has been deployed, so you can run queries. In the
>>>>> integration-test script, increase the delay after `pio deploy…` and see if
>>>>> it passes then.
>>>>>
>>>>> This is probably an integration-test script problem, not a problem in
>>>>> the system.
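Instead of a fixed delay after `pio deploy`, a test could poll until the engine server accepts connections. A minimal Python 3 sketch, assuming the server listens on localhost:8000 as in this thread; `server_is_up` and `wait_until` are hypothetical helpers, not part of the integration-test script.

```python
import time
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:8000/", timeout=2.0):
    """True once anything answers HTTP at url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server answered, even if with an error status
    except OSError:
        return False  # e.g. "Connection refused" while the server is starting


def wait_until(probe, timeout=120.0, interval=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test script would call `wait_until(server_is_up)` after deploying and only send queries if it returns True. Injecting `sleep` and `clock` keeps the polling loop testable without a running server.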
>>>>>
>>>>>
>>>>>
>>>>> On Oct 6, 2017, at 4:21 AM, Noelia Osés Fernández <noses@vicomtech.org>
>>>>> wrote:
>>>>>
>>>>> Pat,
>>>>>
>>>>> I have run the integration test for the handmade example out of
>>>>> curiosity. Strangely enough, things go more or less as expected, apart from
>>>>> the fact that I get a message saying:
>>>>>
>>>>> ...[INFO] [CoreWorkflow$] Updating engine instance
>>>>> [INFO] [CoreWorkflow$] Training completed successfully.
>>>>> Model will remain deployed after this test
>>>>> Waiting 30 seconds for the server to start
>>>>> nohup: redirecting stderr to stdout
>>>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>>>>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
>>>>> curl: (7) Failed to connect to localhost port 8000: Connection refused
>>>>>
>>>>> So the integration test does not manage to get the recommendations,
>>>>> even though the model trained and deployed successfully. However, as soon
>>>>> as the integration test finishes, on the same terminal, I can get the
>>>>> recommendations by doing the following:
>>>>>
>>>>> $ curl -H "Content-Type: application/json" -d '
>>>>> > {
>>>>> >     "user": "u1"
>>>>> > }' http://localhost:8000/queries.json
>>>>> {"itemScores":[{"item":"Nexus","score":0.057719700038433075},{"item":"Surface","score":0.0}]}
>>>>>
>>>>> Isn't this odd? Can you guess what's going on?
>>>>>
>>>>> Thank you very much for all your support!
>>>>> noelia
>>>>>
>>>>>
>>>>>
>>>>> On 5 October 2017 at 19:22, Pat Ferrel <pat@occamsmachete.com> wrote:
>>>>>
>>>>>> Ok, that config should work. Does the integration test pass?
>>>>>>
>>>>>> The data you are using is extremely small, and though it does look
>>>>>> like it has cooccurrences, they may not meet the minimum “big-data” thresholds
>>>>>> used by default. Try adding more data, or use the handmade example data:
>>>>>> rename purchase to view and discard the existing view data if you wish.
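The conversion suggested above can be scripted. This sketch assumes each line of the handmade example data has the form `user,event,item` (worth checking against the actual file before relying on it); the function names are hypothetical. It keeps only purchase rows, renames them to view, and discards the original view rows.

```python
import csv
import io


def purchases_as_views(lines):
    """Keep purchase rows renamed to view; drop existing view rows."""
    rows = []
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        user, event, item = row
        if event == "purchase":
            rows.append((user, "view", item))
        # existing "view" rows (and any other events) are dropped
    return rows


def to_csv(rows):
    """Serialize rows back to comma-separated lines."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Using `csv.reader` rather than a plain `split(",")` also strips any quotes around fields.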
>>>>>>
>>>>>> The error is very odd and I’ve never seen it. If the integration test
>>>>>> works, I can only surmise it's your data.
>>>>>>
>>>>>>
>>>>>> On Oct 5, 2017, at 12:02 AM, Noelia Osés Fernández <
>>>>>> noses@vicomtech.org> wrote:
>>>>>>
>>>>>> SPARK: spark-1.6.3-bin-hadoop2.6
>>>>>>
>>>>>> PIO: 0.11.0-incubating
>>>>>>
>>>>>> Scala: whatever gets installed when installing PIO 0.11.0-incubating,
>>>>>> I haven't installed Scala separately
>>>>>>
>>>>>> UR: ActionML's UR v0.6.0 I suppose as that's the last version
>>>>>> mentioned in the readme file. I have attached the UR zip file I downloaded
>>>>>> from the actionml github account.
>>>>>>
>>>>>> Thank you for your help!!
>>>>>>
>>>>>> On 4 October 2017 at 17:20, Pat Ferrel <pat@occamsmachete.com> wrote:
>>>>>>
>>>>>>> What versions of Scala, Spark, PIO, and UR are you using?
>>>>>>>
>>>>>>>
>>>>>>> On Oct 4, 2017, at 6:10 AM, Noelia Osés Fernández <
>>>>>>> noses@vicomtech.org> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm still trying to create a very simple app to learn to use
>>>>>>> PredictionIO and am still having trouble. pio build ran with no problem,
>>>>>>> but when I run pio train I get a very long error message related to
>>>>>>> serialisation (error message copied below).
>>>>>>>
>>>>>>> pio status reports system is all ready to go.
>>>>>>>
>>>>>>> The app I'm trying to build is very simple, it only has 'view'
>>>>>>> events. Here's the engine.json:
>>>>>>>
>>>>>>> *===========================================================*
>>>>>>> {
>>>>>>>   "comment":" This config file uses default settings for all but the required values see README.md for docs",
>>>>>>>   "id": "default",
>>>>>>>   "description": "Default settings",
>>>>>>>   "engineFactory": "com.actionml.RecommendationEngine",
>>>>>>>   "datasource": {
>>>>>>>     "params" : {
>>>>>>>       "name": "tiny_app_data.csv",
>>>>>>>       "appName": "TinyApp",
>>>>>>>       "eventNames": ["view"]
>>>>>>>     }
>>>>>>>   },
>>>>>>>   "algorithms": [
>>>>>>>     {
>>>>>>>       "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
>>>>>>>       "name": "ur",
>>>>>>>       "params": {
>>>>>>>         "appName": "TinyApp",
>>>>>>>         "indexName": "urindex",
>>>>>>>         "typeName": "items",
>>>>>>>         "comment": "must have data for the first event or the model will not build, other events are optional",
>>>>>>>         "eventNames": ["view"]
>>>>>>>       }
>>>>>>>     }
>>>>>>>   ]
>>>>>>> }
>>>>>>> *===========================================================*
>>>>>>>
>>>>>>> The data I'm using is:
>>>>>>>
>>>>>>> "u1","i1"
>>>>>>> "u2","i1"
>>>>>>> "u2","i2"
>>>>>>> "u3","i2"
>>>>>>> "u3","i3"
>>>>>>> "u4","i4"
>>>>>>>
>>>>>>> meaning user u viewed item i.
>>>>>>>
>>>>>>> The data has been added to the database with the following python
>>>>>>> code:
>>>>>>>
>>>>>>> *===========================================================*
>>>>>>> """
>>>>>>> Import sample data for recommendation engine
>>>>>>> """
>>>>>>>
>>>>>>> from __future__ import print_function
>>>>>>>
>>>>>>> import argparse
>>>>>>> import csv
>>>>>>>
>>>>>>> import predictionio
>>>>>>>
>>>>>>>
>>>>>>> def import_events(client, path):
>>>>>>>   print("Importing data...")
>>>>>>>   count = 0
>>>>>>>   users = set()
>>>>>>>   items = set()
>>>>>>>
>>>>>>>   # csv.reader strips the double quotes around each field, so "u1","i1"
>>>>>>>   # becomes ['u1', 'i1'] rather than ['"u1"', '"i1"'].
>>>>>>>   with open(path) as f:
>>>>>>>     for row in csv.reader(f):
>>>>>>>       if len(row) < 2:
>>>>>>>         continue  # skip blank or malformed lines
>>>>>>>       user, item = row[0], row[1]
>>>>>>>       users.add(user)
>>>>>>>       items.add(item)
>>>>>>>       client.create_event(
>>>>>>>         event="view",
>>>>>>>         entity_type="user",
>>>>>>>         entity_id=user,
>>>>>>>         target_entity_type="item",
>>>>>>>         target_entity_id=item
>>>>>>>       )
>>>>>>>       print("Event: view entity_id: %s target_entity_id: %s" % (user, item))
>>>>>>>       count += 1
>>>>>>>
>>>>>>>   print("All users: " + str(users))
>>>>>>>   print("All items: " + str(items))
>>>>>>>   # One $set event per distinct item.
>>>>>>>   for item in items:
>>>>>>>     client.create_event(
>>>>>>>       event="$set",
>>>>>>>       entity_type="item",
>>>>>>>       entity_id=item
>>>>>>>     )
>>>>>>>     count += 1
>>>>>>>
>>>>>>>   print("%s events are imported." % count)
>>>>>>>
>>>>>>>
>>>>>>> if __name__ == '__main__':
>>>>>>>   parser = argparse.ArgumentParser(
>>>>>>>     description="Import sample data for recommendation engine")
>>>>>>>   parser.add_argument('--access_key', default='invalid_access_key')
>>>>>>>   parser.add_argument('--url', default="http://localhost:7070")
>>>>>>>   parser.add_argument('--file', default="./data/tiny_app_data.csv")
>>>>>>>
>>>>>>>   args = parser.parse_args()
>>>>>>>   print(args)
>>>>>>>
>>>>>>>   client = predictionio.EventClient(
>>>>>>>     access_key=args.access_key,
>>>>>>>     url=args.url,
>>>>>>>     threads=5,
>>>>>>>     qsize=500)
>>>>>>>   import_events(client, args.file)
>>>>>>>   # Wait for pending requests and release the connection.
>>>>>>>   client.close()
>>>>>>> *===========================================================*
>>>>>>>
>>>>>>> My pio_env.sh is the following:
>>>>>>>
>>>>>>> *===========================================================*
>>>>>>> #!/usr/bin/env bash
>>>>>>> #
>>>>>>> # Copy this file as pio-env.sh and edit it for your site's configuration.
>>>>>>> #
>>>>>>> # Licensed to the Apache Software Foundation (ASF) under one or more
>>>>>>> # contributor license agreements.  See the NOTICE file distributed with
>>>>>>> # this work for additional information regarding copyright ownership.
>>>>>>> # The ASF licenses this file to You under the Apache License, Version 2.0
>>>>>>> # (the "License"); you may not use this file except in compliance with
>>>>>>> # the License.  You may obtain a copy of the License at
>>>>>>> #
>>>>>>> #    http://www.apache.org/licenses/LICENSE-2.0
>>>>>>> #
>>>>>>> # Unless required by applicable law or agreed to in writing, software
>>>>>>> # distributed under the License is distributed on an "AS IS" BASIS,
>>>>>>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>>>>> # See the License for the specific language governing permissions and
>>>>>>> # limitations under the License.
>>>>>>> #
>>>>>>>
>>>>>>> # PredictionIO Main Configuration
>>>>>>> #
>>>>>>> # This section controls core behavior of PredictionIO. It is very likely that
>>>>>>> # you need to change these to fit your site.
>>>>>>>
>>>>>>> # SPARK_HOME: Apache Spark is a hard dependency and must be configured.
>>>>>>> # SPARK_HOME=$PIO_HOME/vendors/spark-2.0.2-bin-hadoop2.7
>>>>>>> SPARK_HOME=$PIO_HOME/vendors/spark-1.6.3-bin-hadoop2.6
>>>>>>>
>>>>>>> POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.1.4.jar
>>>>>>> MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar
>>>>>>>
>>>>>>> # ES_CONF_DIR: You must configure this if you have advanced configuration for
>>>>>>> #              your Elasticsearch setup.
>>>>>>> # ES_CONF_DIR=/opt/elasticsearch
>>>>>>> #ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-1.7.6
>>>>>>>
>>>>>>> # HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO
>>>>>>> #                  with Hadoop 2.
>>>>>>> # HADOOP_CONF_DIR=/opt/hadoop
>>>>>>>
>>>>>>> # HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO
>>>>>>> #                 with HBase on a remote cluster.
>>>>>>> # HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf
>>>>>>>
>>>>>>> # Filesystem paths where PredictionIO uses as block storage.
>>>>>>> PIO_FS_BASEDIR=$HOME/.pio_store
>>>>>>> PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
>>>>>>> PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
>>>>>>>
>>>>>>> # PredictionIO Storage Configuration
>>>>>>> #
>>>>>>> # This section controls programs that make use of PredictionIO's built-in
>>>>>>> # storage facilities. Default values are shown below.
>>>>>>> #
>>>>>>> # For more information on storage configuration please refer to
>>>>>>> # http://predictionio.incubator.apache.org/system/anotherdatastore/
>>>>>>>
>>>>>>> # Storage Repositories
>>>>>>>
>>>>>>> # Default is to use PostgreSQL
>>>>>>> PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
>>>>>>> PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
>>>>>>>
>>>>>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
>>>>>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
>>>>>>>
>>>>>>> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
>>>>>>> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
>>>>>>>
>>>>>>> # Storage Data Sources
>>>>>>>
>>>>>>> # PostgreSQL Default Settings
>>>>>>> # Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL
>>>>>>> # Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
>>>>>>> # PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
>>>>>>> PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
>>>>>>> PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
>>>>>>> PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
>>>>>>> PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
>>>>>>>
>>>>>>> # MySQL Example
>>>>>>> # PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
>>>>>>> # PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
>>>>>>> # PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio
>>>>>>> # PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio
>>>>>>>
>>>>>>> # Elasticsearch Example
>>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
>>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
>>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
>>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.2.1
>>>>>>> # Elasticsearch 1.x Example
>>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=myprojectES
>>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
>>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
>>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.7.6
>>>>>>>
>>>>>>> # Local File System Example
>>>>>>> PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
>>>>>>> PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
>>>>>>>
>>>>>>> # HBase Example
>>>>>>> PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
>>>>>>> PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6
>>>>>>>
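In pio-env.sh, each `PIO_STORAGE_REPOSITORIES_<REPO>_SOURCE` names a source that must be configured through `PIO_STORAGE_SOURCES_<SOURCE>_*` variables. The hypothetical Python helpers below check that every referenced source has at least a `_TYPE` entry. They do simple line matching only: commented-out lines are ignored, and shell syntax such as `$PIO_HOME` expansion is not evaluated.

```python
import re

ASSIGNMENT = re.compile(r"^\s*([A-Z0-9_]+)=(.*)$")
REPOSITORY = re.compile(r"^PIO_STORAGE_REPOSITORIES_([A-Z0-9]+)_SOURCE$")


def parse_env(text):
    """Collect KEY=VALUE lines; lines starting with '#' do not match."""
    env = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            env[match.group(1)] = match.group(2).strip()
    return env


def unresolved_repositories(env):
    """Map repository name -> source name for sources with no _TYPE set."""
    problems = {}
    for key, source in env.items():
        match = REPOSITORY.match(key)
        if match and "PIO_STORAGE_SOURCES_%s_TYPE" % source not in env:
            problems[match.group(1)] = source
    return problems
```

For the configuration above, which defines `_TYPE` for ELASTICSEARCH, HBASE and LOCALFS, the result would be an empty dict.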
>>>>>>>
>>>>>>> *===========================================================*
>>>>>>> Error message:
>>>>>>>
>>>>>>> *===========================================================*
>>>>>>> [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>>> Serialization stack:
>>>>>>>     - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {3:1.0,2:1.0})
>>>>>>>     - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>>>>>>     - object (class scala.Tuple2, (2,{3:1.0,2:1.0})); not retrying
>>>>>>> [ERROR] [TaskSetManager] Task 3.0 in stage 10.0 (TID 25) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>>> Serialization stack:
>>>>>>>     - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {0:1.0,3:1.0})
>>>>>>>     - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>>>>>>     - object (class scala.Tuple2, (3,{0:1.0,3:1.0})); not retrying
>>>>>>> [ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>>> Serialization stack:
>>>>>>>     - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {1:1.0})
>>>>>>>     - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>>>>>>     - object (class scala.Tuple2, (1,{1:1.0})); not retrying
>>>>>>> [ERROR] [TaskSetManager] Task 0.0 in stage 10.0 (TID 22) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>>> Serialization stack:
>>>>>>>     - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {0:1.0})
>>>>>>>     - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>>>>>>     - object (class scala.Tuple2, (0,{0:1.0})); not retrying
>>>>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>>> Serialization stack:
>>>>>>>     - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {3:1.0,2:1.0})
>>>>>>>     - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>>>>>>     - object (class scala.Tuple2, (2,{3:1.0,2:1.0}))
>>>>>>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>>>>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>>>>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>>>>>>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>>>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>>>>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>>>>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>>>>>>>     at scala.Option.foreach(Option.scala:236)
>>>>>>>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>>>>>>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>>>>>>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>>>>>>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>>>>>>>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>>>>>>     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>>>>>>>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>>>>>>>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
>>>>>>>     at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1088)
>>>>>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>>>>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>>>>>>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>>>>>>>     at org.apache.spark.rdd.RDD.fold(RDD.scala:1082)
>>>>>>>     at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.computeNRow(CheckpointedDrmSpark.scala:188)
>>>>>>>     at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow$lzycompute(CheckpointedDrmSpark.scala:55)
>>>>>>>     at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow(CheckpointedDrmSpark.scala:55)
>>>>>>>     at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.newRowCardinality(CheckpointedDrmSpark.scala:219)
>>>>>>>     at com.actionml.IndexedDatasetSpark$.apply(Preparator.scala:213)
>>>>>>>     at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:71)
>>>>>>>     at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:49)
>>>>>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>>>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>>>>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>>>>>>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>>>>>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>>>>>>     at com.actionml.Preparator.prepare(Preparator.scala:49)
>>>>>>>     at com.actionml.Preparator.prepare(Preparator.scala:32)
>>>>>>>     at org.apache.predictionio.controller.PPreparator.prepareBase(PPreparator.scala:37)
>>>>>>>     at org.apache.predictionio.controller.Engine$.train(Engine.scala:671)
>>>>>>>     at org.apache.predictionio.controller.Engine.train(Engine.scala:177)
>>>>>>>     at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
>>>>>>>     at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
>>>>>>>     at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>>>>>>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>>>>>>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>>>>>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>>>>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>>>
>>>>>>> *===========================================================*
>>>>>>> Thank you all for your help.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> noelia
>>>>>>>
>>>>>>>
>>>>>>
>
>
>
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "actionml-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to actionml-user+unsubscribe@googlegroups.com.
> To post to this group, send email to actionml-user@googlegroups.com.
> To view this discussion on the web visit https://groups.google.
> com/d/msgid/actionml-user/CAMysefsW%3DeYPjUE1pc67C9D312HL_
> xNMtzmStUwDsUdHCCVU-Q%40mail.gmail.com
> <https://groups.google.com/d/msgid/actionml-user/CAMysefsW%3DeYPjUE1pc67C9D312HL_xNMtzmStUwDsUdHCCVU-Q%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
>
>
>


-- 
<http://www.vicomtech.org>

Noelia Osés Fernández, PhD
Senior Researcher

noses@vicomtech.org
+[34] 943 30 92 30
Data Intelligence for Energy and Industrial Processes


Legal Notice - Privacy policy <http://www.vicomtech.org/en/proteccion-datos>
