predictionio-user mailing list archives

From Noelia Osés Fernández <no...@vicomtech.org>
Subject Re: [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 had a not serializable result
Date Mon, 23 Oct 2017 07:11:29 GMT
Thanks Pat!

On 20 October 2017 at 16:53, Pat Ferrel <pat@occamsmachete.com> wrote:

> There are several algorithm resources.
> A math-heavy one here: https://www.slideshare.net/pferrel/unified-recommender-39986309
> A more result-oriented one here: https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/
>
> The benefit of the CCO algorithm in the UR comes into play when you have
> more than just conversions (buy for ecom, view in your case). For just about
> all other recommenders you really can only use one indicator of user
> preference. Several experiments, including ones I’ve done, show you cannot
> mix buys with detail-views in ecom or your results will be worse; that is
> the case with single-event recommenders like the Spark MLlib recommenders.
> The UR uses multi-modal input, so you can indeed improve results when using
> buys with detail-views. The second post actually shows how dislikes can
> improve results when you want to predict likes.
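For illustration only (nothing in this thread uses these event names), feeding two indicator types into the EventServer with the Python SDK might look like the sketch below; the UR then treats the first name in engine.json's eventNames, e.g. "eventNames": ["buy", "detail-view"], as the primary conversion event.

# Sketch: sending a primary and a secondary indicator with the Python SDK.
# "buy" and "detail-view" are illustrative names; the access key is your app's.
import predictionio

client = predictionio.EventClient(
    access_key="YOUR_ACCESS_KEY",
    url="http://localhost:7070")

client.create_event(
    event="buy",                          # primary (conversion) indicator
    entity_type="user", entity_id="u1",
    target_entity_type="item", target_entity_id="i1")

client.create_event(
    event="detail-view",                  # secondary indicator
    entity_type="user", entity_id="u1",
    target_entity_type="item", target_entity_id="i2")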
>
> In order to do this, the CCO algorithm finds events that are correlated,
> but it uses a statistical method that is suspicious of 100% correlation,
> since this is likely anomalous in the real world (caused by promotions,
> giveaways, or other outside influences). This statistical method is
> called the log likelihood ratio.
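To make the test concrete, here is a minimal Python sketch of the log likelihood ratio on a 2x2 table of counts, following Dunning's formulation (which is what Mahout's LogLikelihood implements); the numbers are made up purely to show that the same proportions score very differently depending on how much data backs them.

# k11 = users who did both events, k12 = only the first, k21 = only the
# second, k22 = neither. A higher LLR means stronger evidence of correlation.
import math

def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def llr(k11, k12, k21, k22):
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    return 2.0 * (row_entropy + col_entropy - mat_entropy)

print(llr(1, 0, 0, 3))      # "100% correlation" on almost no data: low score
print(llr(100, 0, 0, 300))  # same proportions, 100x the data: high score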
>
> On Oct 20, 2017, at 12:17 AM, Noelia Osés Fernández <noses@vicomtech.org>
> wrote:
>
> Thanks for the explanation, Pat!
>
> I think the best course of action is for me to read the documentation and
> understand how the algorithm works. Then, try again with a slightly larger
> dataset.
>
> Thank you very much!
>
> On 19 October 2017 at 17:15, Pat Ferrel <pat@occamsmachete.com> wrote:
>
>> This sample dataset is too small with too few cooccurrences. U1 will
>> never get i1 due to the blacklist (u1 has already viewed i1 so will not be
>> recommended that again). The blacklist can be disabled if you want to
>> recommend viewed items again, but beware that they may dominate every
>> recommendation set if you do turn it off, since it is self-fulfilling. Why
>> not i2? Not sure without running the math; the UR looks at things
>> statistically, and with a dataset this small anomalies can be seen since the
>> data is not statistically significant. i1 will show up in internal
>> intermediate results (A’A for instance) but these are then filtered by a
>> statistical test called LLR, which requires a certain amount of data to
>> work.
>>
>> Notice the handmade dataset has many more cooccurrences and produces
>> understandable results. Also notice that in your dataset i3 and i4 can only
>> be recommended by “popularity” since they have no cooccurrence.
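As a rough sketch (not the UR's actual code path), the intermediate cooccurrence counts referred to above can be reproduced for the six events in this thread; every off-diagonal count comes out as 0 or 1, i.e. a single user, which is far too little evidence for a statistical test like LLR, and i4 cooccurs with nothing.

# Build the user x item matrix A for the tiny dataset and compute A'A.
import numpy as np

users = ["u1", "u2", "u3", "u4"]
items = ["i1", "i2", "i3", "i4"]
views = [("u1", "i1"), ("u2", "i1"), ("u2", "i2"),
         ("u3", "i2"), ("u3", "i3"), ("u4", "i4")]

A = np.zeros((len(users), len(items)))
for u, i in views:
    A[users.index(u), items.index(i)] = 1.0

cooc = A.T.dot(A)   # off-diagonal entry (j, k) = number of users who viewed both
print(cooc)         # row/column order is i1..i4; the i4 row is zero off-diagonal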
>>
>>
>>
>> On Oct 19, 2017, at 1:28 AM, Noelia Osés Fernández <noses@vicomtech.org>
>> wrote:
>>
>> Pat, this worked!!!!! Thank you very much!!!!
>>
>> The only odd thing is that all the results I get now are 0s. For
>> example:
>>
>> Using the dataset:
>>
>> "u1","i1"
>> "u2","i1"
>> "u2","i2"
>> "u3","i2"
>> "u3","i3"
>> "u4","i4"
>>
>> echo "Recommendations for user: u1"
>> echo ""
>> curl -H "Content-Type: application/json" -d '
>> {
>>     "user": "u1"
>> }' http://localhost:8000/queries.json
>> echo ""
>>
>> What I get is:
>>
>> {"itemScores":[{"item":"\"i2\"","score":0.0},{"item":"\"i1\"
>> ","score":0.0},{"item":"\"i3\"","score":0.0},{"item":"\"i4\"
>> ","score":0.0}]}
>>
>>
>> If user u1 has viewed i1 and user u2 has viewed i1 and i2, then I think
>> the algorithm should return a non-zero score for i2 (and possibly i1, too).
>>
>> Even using the bigger dataset with 100 items I still get all scores 0s.
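One way to see why everything comes out flat: if the correlation test is Dunning's LLR on a 2x2 table (as in Mahout), then for the i1/i2 pair over these 4 users the ratio is exactly zero, which under that assumption is consistent with every score coming back as 0.0. A rough worked example:

# LLR for the (i1, i2) pair: i1 viewed by {u1, u2}, i2 viewed by {u2, u3},
# out of 4 users. k11 = both, k12 = i1 only, k21 = i2 only, k22 = neither.
import math

def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def llr(k11, k12, k21, k22):
    return 2.0 * (entropy(k11 + k12, k21 + k22)
                  + entropy(k11 + k21, k12 + k22)
                  - entropy(k11, k12, k21, k22))

print(llr(1, 1, 1, 1))   # -> 0.0: no evidence that viewing i1 predicts i2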
>>
>> So now I'm going to spend some time reading the following documentation,
>> unless there is some other documentation you recommend I read first!
>>
>>  - [The Universal Recommender](http://actionml.com/docs/ur)
>>  - [The Correlated Cross-Occurrence Algorithm](http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html)
>>  - [The Universal Recommender Slide Deck](http://www.slideshare.net/pferrel/unified-recommender-39986309)
>>  - [Multi-domain predictive AI or how to make one thing predict another](https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/)
>>
>> Thank you very much for all your patience and help getting me to this
>> point!!!
>>
>> Best regards,
>> Noelia
>>
>>
>> On 18 October 2017 at 18:33, Pat Ferrel <pat@occamsmachete.com> wrote:
>>
>>> It is the UR, so events are taken from the EventStore and converted into
>>> a Mahout DistributedRowMatrix of RandomAccessSparseVectors, which are both
>>> serializable. This path works fine and has for several years.
>>>
>>> This must be a config problem, like not using the MahoutKryoRegistrator,
>>> which registers the serializers for these.
>>>
>>> @Noelia, you have left out the sparkConf section of the engine.json. The
>>> one used in the integration test should work:
>>>
>>> {
>>>   "comment":" This config file uses default settings for all but the required values see README.md for docs",
>>>   "id": "default",
>>>   "description": "Default settings",
>>>   "engineFactory": "com.actionml.RecommendationEngine",
>>>   "datasource": {
>>>     "params" : {
>>>       "name": "tiny_app_data.csv",
>>>       "appName": "TinyApp",
>>>       "eventNames": ["view"]
>>>     }
>>>   },
>>>   "sparkConf": {  <================= THIS WAS LEFT OUT IN YOUR ENGINE.JSON BELOW IN THIS THREAD
>>>     "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
>>>     "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
>>>     "spark.kryo.referenceTracking": "false",
>>>     "spark.kryoserializer.buffer": "300m",
>>>     "es.index.auto.create": "true"
>>>   },
>>>   "algorithms": [
>>>     {
>>>       "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
>>>       "name": "ur",
>>>       "params": {
>>>         "appName": "TinyApp",
>>>         "indexName": "urindex",
>>>         "typeName": "items",
>>>         "comment": "must have data for the first event or the model will not build, other events are optional",
>>>         "eventNames": ["view"]
>>>       }
>>>     }
>>>   ]
>>> }
>>>
>>>
>>> On Oct 18, 2017, at 8:49 AM, Donald Szeto <donald@apache.org> wrote:
>>>
>>> Chiming in a bit. Looking at the serialization error, it looks like we
>>> are just one little step away from getting this to work.
>>>
>>> Noelia, what does your synthesized data look like? All data that is
>>> processed by Spark needs to be serializable. At some point, a
>>> non-serializable vector object, shown in the stack trace, is created out of
>>> your synthesized data. It would be great to know what your input events
>>> look like and to see where in the code path this is caused.
>>>
>>> Regards,
>>> Donald
>>>
>>> On Tue, Oct 17, 2017 at 12:14 AM Noelia Osés Fernández <
>>> noses@vicomtech.org> wrote:
>>>
>>>> Pat, you mentioned the problem could be that the data I was using was
>>>> too small. So now I'm using the attached data file as the data (4 users and
>>>> 100 items). But I'm still getting the same error. I'm sorry I forgot to
>>>> mention I had increased the dataset.
>>>>
>>>> The reason why I want to make it work with a very small dataset is
>>>> because I want to be able to follow the calculations. I want to understand
>>>> what the UR is doing and understand the impact of changing this or that,
>>>> here or there... I find that easier to achieve with a small example in
>>>> which I know exactly what's happening. I want to build my trust in my
>>>> understanding of the UR before I move on to applying it to a real problem.
>>>> If I'm not confident that I know how to use it, how can I tell my client
>>>> that the results I'm getting are good with any degree of confidence?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 16 October 2017 at 20:44, Pat Ferrel <pat@occamsmachete.com> wrote:
>>>>
>>>>> So all setup is the same for the integration-test and your modified
>>>>> test *except the data*?
>>>>>
>>>>> The error looks like a setup problem because the serialization should
>>>>> happen with either test. But if the only difference really is the data,
>>>>> then toss it and use either real data or the integration test data; why
>>>>> are you trying to synthesize fake data if it causes the error?
>>>>>
>>>>> BTW the data you include below in this thread would never create
>>>>> internal IDs as high as 94 in the vector. You must have switched to a new
>>>>> dataset???
>>>>>
>>>>> I would get a dump of your data using `pio export` and make sure it’s
>>>>> what you thought it was. You claim to have only 4 user ids and 4 item ids,
>>>>> but the serialized vector thinks you have at least 94 user or item ids.
>>>>> Something doesn’t add up.
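A quick way to check the dump, assuming the standard PIO event JSON fields (entityType/entityId and targetEntityType/targetEntityId) and that the export was written as JSON-lines part files; the directory name here is just a placeholder:

# Sketch: count distinct user and item ids in a `pio export` dump.
import glob
import json

users, items = set(), set()
for path in glob.glob("exported_events/part-*"):   # placeholder export dir
    with open(path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("entityType") == "user":
                users.add(event["entityId"])
            elif event.get("entityType") == "item":
                items.add(event["entityId"])
            if event.get("targetEntityType") == "item":
                items.add(event["targetEntityId"])

print("distinct users:", len(users))
print("distinct items:", len(items))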
>>>>>
>>>>>
>>>>> On Oct 16, 2017, at 4:43 AM, Noelia Osés Fernández <
>>>>> noses@vicomtech.org> wrote:
>>>>>
>>>>> Pat, you are absolutely right! I increased the sleep time and now the
>>>>> integration test for handmade works perfectly.
>>>>>
>>>>> However, the integration test adapted to run with my tiny app runs
>>>>> into the same problem I've been having with this app:
>>>>>
>>>>> [ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>> Serialization stack:
>>>>>     - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {66:1.0,29:1.0,70:1.0,91:1.0,58:1.0,37:1.0,13:1.0,8:1.0,94:1.0,30:1.0,57:1.0,22:1.0,20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,72:1.0,3:1.0,34:1.0,77:1.0,46:1.0,81:1.0,86:1.0,43:1.0})
>>>>>     - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>>>>     - object (class scala.Tuple2, (1,{66:1.0,29:1.0,70:1.0,91:1.0,58:1.0,37:1.0,13:1.0,8:1.0,94:1.0,30:1.0,57:1.0,22:1.0,20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,72:1.0,3:1.0,34:1.0,77:1.0,46:1.0,81:1.0,86:1.0,43:1.0})); not retrying
>>>>> [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>> Serialization stack:
>>>>>
>>>>> ...
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> On 15 October 2017 at 19:09, Pat Ferrel <pat@occamsmachete.com> wrote:
>>>>>
>>>>>> This is probably a timing issue in the integration test, which has to
>>>>>> wait for `pio deploy` to finish before the queries can be made. If it
>>>>>> doesn’t finish, the queries will fail. By the time the rest of the test
>>>>>> quits, the model has been deployed so you can run queries. In the
>>>>>> integration-test script increase the delay after `pio deploy…` and see if
>>>>>> it passes then.
>>>>>>
>>>>>> This is probably an integration-test script problem, not a problem in
>>>>>> the system.
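If a longer fixed sleep still feels fragile, the script could instead poll the deployed engine until it answers; a sketch, assuming the query server from this thread at localhost:8000:

# Sketch: poll the deployed engine instead of sleeping a fixed 30 seconds.
import json
import time
import urllib.request

def wait_for_server(url="http://localhost:8000/queries.json", timeout=120):
    deadline = time.time() + timeout
    body = json.dumps({"user": "u1"}).encode("utf-8")
    while time.time() < deadline:
        try:
            req = urllib.request.Request(
                url, data=body, headers={"Content-Type": "application/json"})
            urllib.request.urlopen(req, timeout=5)
            return True               # server answered; queries can start
        except Exception:
            time.sleep(2)             # not up yet; try again
    return False

if __name__ == "__main__":
    print("server ready:", wait_for_server())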
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Oct 6, 2017, at 4:21 AM, Noelia Osés Fernández <
>>>>>> noses@vicomtech.org> wrote:
>>>>>>
>>>>>> Pat,
>>>>>>
>>>>>> I have run the integration test for the handmade example out of
>>>>>> curiosity. Strangely enough things go more or less as expected apart from
>>>>>> the fact that I get a message saying:
>>>>>>
>>>>>> ...
>>>>>> [INFO] [CoreWorkflow$] Updating engine instance
>>>>>> [INFO] [CoreWorkflow$] Training completed successfully.
>>>>>> Model will remain deployed after this test
>>>>>> Waiting 30 seconds for the server to start
>>>>>> nohup: redirecting stderr to stdout
>>>>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>>>>>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
>>>>>> curl: (7) Failed to connect to localhost port 8000: Connection refused
>>>>>> So the integration test does not manage to get the recommendations
>>>>>> even though the model trained and deployed successfully. However, as soon
>>>>>> as the integration test finishes, on the same terminal, I can get the
>>>>>> recommendations by doing the following:
>>>>>>
>>>>>> $ curl -H "Content-Type: application/json" -d '
>>>>>> > {
>>>>>> >     "user": "u1"
>>>>>> > }' http://localhost:8000/queries.json
>>>>>> {"itemScores":[{"item":"Nexus","score":0.057719700038433075}
>>>>>> ,{"item":"Surface","score":0.0}]}
>>>>>>
>>>>>> Isn't this odd? Can you guess what's going on?
>>>>>>
>>>>>> Thank you very much for all your support!
>>>>>> noelia
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 5 October 2017 at 19:22, Pat Ferrel <pat@occamsmachete.com> wrote:
>>>>>>
>>>>>>> Ok, that config should work. Does the integration test pass?
>>>>>>>
>>>>>>> The data you are using is extremely small and, though it does look
>>>>>>> like it has cooccurrences, they may not meet the minimum “big-data”
>>>>>>> thresholds used by default. Try adding more data, or use the handmade
>>>>>>> example data: rename purchase to view and discard the existing view data
>>>>>>> if you wish.
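If you go the handmade-data route, the renaming can be scripted; a sketch that assumes the handmade file uses comma-separated user,event,item lines (check your copy), uses placeholder file names, and writes the two-column user,item format read by the import script further down this thread:

# Keep only purchase events and emit them as plain user,item pairs.
with open("data/sample-handmade-data.txt") as src, \
        open("data/tiny_app_data.csv", "w") as dst:
    for line in src:
        user, event, item = line.rstrip("\r\n").split(",")
        if event == "purchase":            # treat every purchase as a view
            dst.write("%s,%s\n" % (user, item))
        # original view events are discarded, as suggested above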
>>>>>>>
>>>>>>> The error is very odd and I’ve never seen it. If the integration
>>>>>>> test works I can only surmise it's your data.
>>>>>>>
>>>>>>>
>>>>>>> On Oct 5, 2017, at 12:02 AM, Noelia Osés Fernández <
>>>>>>> noses@vicomtech.org> wrote:
>>>>>>>
>>>>>>> SPARK: spark-1.6.3-bin-hadoop2.6
>>>>>>>
>>>>>>> PIO: 0.11.0-incubating
>>>>>>>
>>>>>>> Scala: whatever gets installed when installing PIO
>>>>>>> 0.11.0-incubating, I haven't installed Scala separately
>>>>>>>
>>>>>>> UR: ActionML's UR v0.6.0, I suppose, as that's the last version
>>>>>>> mentioned in the readme file. I have attached the UR zip file I
>>>>>>> downloaded from the actionml github account.
>>>>>>>
>>>>>>> Thank you for your help!!
>>>>>>>
>>>>>>> On 4 October 2017 at 17:20, Pat Ferrel <pat@occamsmachete.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> What version of Scala, Spark, PIO, and UR are you using?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 4, 2017, at 6:10 AM, Noelia Osés Fernández <
>>>>>>>> noses@vicomtech.org> wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'm still trying to create a very simple app to learn to use
>>>>>>>> PredictionIO and am still having trouble. I have done pio build with no
>>>>>>>> problem. But when I do pio train I get a very long error message related
>>>>>>>> to serialisation (error message copied below).
>>>>>>>>
>>>>>>>> pio status reports system is all ready to go.
>>>>>>>>
>>>>>>>> The app I'm trying to build is very simple, it only has 'view'
>>>>>>>> events. Here's the engine.json:
>>>>>>>>
>>>>>>>> *===========================================================*
>>>>>>>> {
>>>>>>>>   "comment":" This config file uses default settings for
all but
>>>>>>>> the required values see README.md for docs",
>>>>>>>>   "id": "default",
>>>>>>>>   "description": "Default settings",
>>>>>>>>   "engineFactory": "com.actionml.RecommendationEngine",
>>>>>>>>   "datasource": {
>>>>>>>>     "params" : {
>>>>>>>>       "name": "tiny_app_data.csv",
>>>>>>>>       "appName": "TinyApp",
>>>>>>>>       "eventNames": ["view"]
>>>>>>>>     }
>>>>>>>>   },
>>>>>>>>   "algorithms": [
>>>>>>>>     {
>>>>>>>>       "comment": "simplest setup where all values are default,
>>>>>>>> popularity based backfill, must add eventsNames",
>>>>>>>>       "name": "ur",
>>>>>>>>       "params": {
>>>>>>>>         "appName": "TinyApp",
>>>>>>>>         "indexName": "urindex",
>>>>>>>>         "typeName": "items",
>>>>>>>>         "comment": "must have data for the first event or
the
>>>>>>>> model will not build, other events are optional",
>>>>>>>>         "eventNames": ["view"]
>>>>>>>>       }
>>>>>>>>     }
>>>>>>>>   ]
>>>>>>>> }
>>>>>>>> *===========================================================*
>>>>>>>>
>>>>>>>> The data I'm using is:
>>>>>>>>
>>>>>>>> "u1","i1"
>>>>>>>> "u2","i1"
>>>>>>>> "u2","i2"
>>>>>>>> "u3","i2"
>>>>>>>> "u3","i3"
>>>>>>>> "u4","i4"
>>>>>>>>
>>>>>>>> meaning user u viewed item i.
>>>>>>>>
>>>>>>>> The data has been added to the database with the following Python
>>>>>>>> code:
>>>>>>>>
>>>>>>>> *===========================================================*
>>>>>>>> """
>>>>>>>> Import sample data for recommendation engine
>>>>>>>> """
>>>>>>>>
>>>>>>>> import predictionio
>>>>>>>> import argparse
>>>>>>>> import random
>>>>>>>>
>>>>>>>> RATE_ACTIONS_DELIMITER = ","
>>>>>>>> SEED = 1
>>>>>>>>
>>>>>>>>
>>>>>>>> def import_events(client, file):
>>>>>>>>   f = open(file, 'r')
>>>>>>>>   random.seed(SEED)
>>>>>>>>   count = 0
>>>>>>>>   print "Importing data..."
>>>>>>>>
>>>>>>>>   items = []
>>>>>>>>   users = []
>>>>>>>>   f = open(file, 'r')
>>>>>>>>   for line in f:
>>>>>>>>     data = line.rstrip('\r\n').split(RATE_ACTIONS_DELIMITER)
>>>>>>>>     users.append(data[0])
>>>>>>>>     items.append(data[1])
>>>>>>>>     client.create_event(
>>>>>>>>       event="view",
>>>>>>>>       entity_type="user",
>>>>>>>>       entity_id=data[0],
>>>>>>>>       target_entity_type="item",
>>>>>>>>       target_entity_id=data[1]
>>>>>>>>     )
>>>>>>>>     print "Event: " + "view" + " entity_id: " + data[0] +
"
>>>>>>>> target_entity_id: " + data[1]
>>>>>>>>     count += 1
>>>>>>>>   f.close()
>>>>>>>>
>>>>>>>>   users = set(users)
>>>>>>>>   items = set(items)
>>>>>>>>   print "All users: " + str(users)
>>>>>>>>   print "All items: " + str(items)
>>>>>>>>   for item in items:
>>>>>>>>     client.create_event(
>>>>>>>>       event="$set",
>>>>>>>>       entity_type="item",
>>>>>>>>       entity_id=item
>>>>>>>>     )
>>>>>>>>     count += 1
>>>>>>>>
>>>>>>>>
>>>>>>>>   print "%s events are imported." % count
>>>>>>>>
>>>>>>>>
>>>>>>>> if __name__ == '__main__':
>>>>>>>>   parser = argparse.ArgumentParser(
>>>>>>>>     description="Import sample data for recommendation engine")
>>>>>>>>   parser.add_argument('--access_key', default='invald_access_key')
>>>>>>>>   parser.add_argument('--url', default="http://localhost:7070")
>>>>>>>>   parser.add_argument('--file', default="./data/tiny_app_data.csv")
>>>>>>>>
>>>>>>>>   args = parser.parse_args()
>>>>>>>>   print args
>>>>>>>>
>>>>>>>>   client = predictionio.EventClient(
>>>>>>>>     access_key=args.access_key,
>>>>>>>>     url=args.url,
>>>>>>>>     threads=5,
>>>>>>>>     qsize=500)
>>>>>>>>   import_events(client, args.file)
>>>>>>>> *===========================================================*
>>>>>>>>
>>>>>>>> My pio_env.sh is the following:
>>>>>>>>
>>>>>>>> *===========================================================*
>>>>>>>> #!/usr/bin/env bash
>>>>>>>> #
>>>>>>>> # Copy this file as pio-env.sh and edit it for your site's
>>>>>>>> configuration.
>>>>>>>> #
>>>>>>>> # Licensed to the Apache Software Foundation (ASF) under one or more
>>>>>>>> # contributor license agreements.  See the NOTICE file distributed
>>>>>>>> with
>>>>>>>> # this work for additional information regarding copyright
>>>>>>>> ownership.
>>>>>>>> # The ASF licenses this file to You under the Apache License,
>>>>>>>> Version 2.0
>>>>>>>> # (the "License"); you may not use this file except in compliance
>>>>>>>> with
>>>>>>>> # the License.  You may obtain a copy of the License at
>>>>>>>> #
>>>>>>>> #    http://www.apache.org/licenses/LICENSE-2.0
>>>>>>>> #
>>>>>>>> # Unless required by applicable law or agreed to in writing, software
>>>>>>>> # distributed under the License is distributed on an "AS IS" BASIS,
>>>>>>>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>>>>>> # See the License for the specific language governing permissions
>>>>>>>> and
>>>>>>>> # limitations under the License.
>>>>>>>> #
>>>>>>>>
>>>>>>>> # PredictionIO Main Configuration
>>>>>>>> #
>>>>>>>> # This section controls core behavior of PredictionIO. It is very likely that
>>>>>>>> # you need to change these to fit your site.
>>>>>>>>
>>>>>>>> # SPARK_HOME: Apache Spark is a hard dependency and must be configured.
>>>>>>>> # SPARK_HOME=$PIO_HOME/vendors/spark-2.0.2-bin-hadoop2.7
>>>>>>>> SPARK_HOME=$PIO_HOME/vendors/spark-1.6.3-bin-hadoop2.6
>>>>>>>>
>>>>>>>> POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.1.4.jar
>>>>>>>> MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar
>>>>>>>>
>>>>>>>> # ES_CONF_DIR: You must configure this if you have advanced
>>>>>>>> configuration for
>>>>>>>> #              your Elasticsearch setup.
>>>>>>>> # ES_CONF_DIR=/opt/elasticsearch
>>>>>>>> #ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-1.7.6
>>>>>>>>
>>>>>>>> # HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO
>>>>>>>> #                  with Hadoop 2.
>>>>>>>> # HADOOP_CONF_DIR=/opt/hadoop
>>>>>>>>
>>>>>>>> # HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO
>>>>>>>> #                 with HBase on a remote cluster.
>>>>>>>> # HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf
>>>>>>>>
>>>>>>>> # Filesystem paths where PredictionIO uses as block storage.
>>>>>>>> PIO_FS_BASEDIR=$HOME/.pio_store
>>>>>>>> PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
>>>>>>>> PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
>>>>>>>>
>>>>>>>> # PredictionIO Storage Configuration
>>>>>>>> #
>>>>>>>> # This section controls programs that make use of PredictionIO's
>>>>>>>> built-in
>>>>>>>> # storage facilities. Default values are shown below.
>>>>>>>> #
>>>>>>>> # For more information on storage configuration please refer to
>>>>>>>> # http://predictionio.incubator.apache.org/system/anotherdatastore/
>>>>>>>>
>>>>>>>> # Storage Repositories
>>>>>>>>
>>>>>>>> # Default is to use PostgreSQL
>>>>>>>> PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
>>>>>>>> PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
>>>>>>>>
>>>>>>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
>>>>>>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
>>>>>>>>
>>>>>>>> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
>>>>>>>> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
>>>>>>>>
>>>>>>>> # Storage Data Sources
>>>>>>>>
>>>>>>>> # PostgreSQL Default Settings
>>>>>>>> # Please change "pio" to your database name in
>>>>>>>> PIO_STORAGE_SOURCES_PGSQL_URL
>>>>>>>> # Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
>>>>>>>> # PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
>>>>>>>> PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
>>>>>>>> PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
>>>>>>>> PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
>>>>>>>> PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
>>>>>>>>
>>>>>>>> # MySQL Example
>>>>>>>> # PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
>>>>>>>> # PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
>>>>>>>> # PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio
>>>>>>>> # PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio
>>>>>>>>
>>>>>>>> # Elasticsearch Example
>>>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>>>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
>>>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
>>>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
>>>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.2.1
>>>>>>>> # Elasticsearch 1.x Example
>>>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>>>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=myprojectES
>>>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
>>>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
>>>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.7.6
>>>>>>>>
>>>>>>>> # Local File System Example
>>>>>>>> PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
>>>>>>>> PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
>>>>>>>>
>>>>>>>> # HBase Example
>>>>>>>> PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
>>>>>>>> PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6
>>>>>>>>
>>>>>>>>
>>>>>>>> *===========================================================*
>>>>>>>> *Error message:*
>>>>>>>> *===========================================================*
>>>>>>>> [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>>>> Serialization stack:
>>>>>>>>     - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {3:1.0,2:1.0})
>>>>>>>>     - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>>>>>>>     - object (class scala.Tuple2, (2,{3:1.0,2:1.0})); not retrying
>>>>>>>> [ERROR] [TaskSetManager] Task 3.0 in stage 10.0 (TID 25) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>>>> Serialization stack:
>>>>>>>>     - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {0:1.0,3:1.0})
>>>>>>>>     - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>>>>>>>     - object (class scala.Tuple2, (3,{0:1.0,3:1.0})); not retrying
>>>>>>>> [ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>>>> Serialization stack:
>>>>>>>>     - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {1:1.0})
>>>>>>>>     - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>>>>>>>     - object (class scala.Tuple2, (1,{1:1.0})); not retrying
>>>>>>>> [ERROR] [TaskSetManager] Task 0.0 in stage 10.0 (TID 22) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>>>> Serialization stack:
>>>>>>>>     - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {0:1.0})
>>>>>>>>     - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>>>>>>>     - object (class scala.Tuple2, (0,{0:1.0})); not retrying
>>>>>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>>>> Serialization stack:
>>>>>>>>     - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {3:1.0,2:1.0})
>>>>>>>>     - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>>>>>>>     - object (class scala.Tuple2, (2,{3:1.0,2:1.0}))
>>>>>>>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>>>>>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>>>>>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>>>>>>>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>>>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>>>>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>>>>>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>>>>>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>>>>>>>>     at scala.Option.foreach(Option.scala:236)
>>>>>>>>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>>>>>>>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>>>>>>>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>>>>>>>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>>>>>>>>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>>>>>>>     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>>>>>>>>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>>>>>>>>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
>>>>>>>>     at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1088)
>>>>>>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>>>>>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>>>>>>>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>>>>>>>>     at org.apache.spark.rdd.RDD.fold(RDD.scala:1082)
>>>>>>>>     at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.computeNRow(CheckpointedDrmSpark.scala:188)
>>>>>>>>     at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow$lzycompute(CheckpointedDrmSpark.scala:55)
>>>>>>>>     at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow(CheckpointedDrmSpark.scala:55)
>>>>>>>>     at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.newRowCardinality(CheckpointedDrmSpark.scala:219)
>>>>>>>>     at com.actionml.IndexedDatasetSpark$.apply(Preparator.scala:213)
>>>>>>>>     at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:71)
>>>>>>>>     at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:49)
>>>>>>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>>>>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>>>>>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>>>>>>>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>>>>>>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>>>>>>>     at com.actionml.Preparator.prepare(Preparator.scala:49)
>>>>>>>>     at com.actionml.Preparator.prepare(Preparator.scala:32)
>>>>>>>>     at org.apache.predictionio.controller.PPreparator.prepareBase(PPreparator.scala:37)
>>>>>>>>     at org.apache.predictionio.controller.Engine$.train(Engine.scala:671)
>>>>>>>>     at org.apache.predictionio.controller.Engine.train(Engine.scala:177)
>>>>>>>>     at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
>>>>>>>>     at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
>>>>>>>>     at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>>>>>>>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>>>>>>>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>>>>>>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>>>>>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>>>>
>>>>>>>> *===========================================================*
>>>>>>>> Thank you all for your help.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> noelia
>>>>>>>>
>>>>>>>>
>>>>>>>
>>
>
>
>
>


-- 
<http://www.vicomtech.org>

Noelia Osés Fernández, PhD
Senior Researcher |
Investigadora Senior

noses@vicomtech.org
+[34] 943 30 92 30
Data Intelligence for Energy and
Industrial Processes | Inteligencia
de Datos para Energía y Procesos
Industriales

<https://www.linkedin.com/company/vicomtech>
<https://www.youtube.com/user/VICOMTech>
<https://twitter.com/@Vicomtech_IK4>

member of:  <http://www.graphicsmedia.net/>     <http://www.ik4.es>

Legal Notice - Privacy policy <http://www.vicomtech.org/en/proteccion-datos>
