predictionio-user mailing list archives

From Shane Johnson <shanewaldenjohn...@gmail.com>
Subject Re: How to increase maxResultSize with Spark 2.1.1
Date Fri, 23 Feb 2018 19:00:08 GMT
Thanks Daniel, this example is very helpful. That looks very
straightforward and promising. This will be great to have in place instead
of forcing everything into the driver through a collectAsMap.

*Shane Johnson | 801.360.3350*
LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook
<https://www.facebook.com/shane.johnson.71653>

2018-02-23 7:24 GMT-10:00 Daniel O' Shaughnessy <danieljamesdavid@gmail.com>:

> The RDD will just be attached to the model so you simply reference it.
>
> Here's an example where I also transform the RDD into a DataFrame in the
> predict method.
>
> val sqlContext = SQLContext.getOrCreate(model.sc)
> import sqlContext.implicits._
> val dfWithSchema = sqlContext.createDataFrame(model.dcRDD).toDF("col1",
> "col2", "colN")
>
> On Fri, 23 Feb 2018 at 17:12 Shane Johnson <shanewaldenjohnson@gmail.com>
> wrote:
>
>> Thanks Daniel,
>>
>> This is the one I am using now. I was hoping to see an example of how the
>> RDD was used in the predict() method as this example stops at the train().
>> I will follow the steps here and see if I am able to pull in the saved RDD
>> from the train() to the predict() method.
>>
>> Best,
>>
>> Shane
>>
>> *Shane Johnson | 801.360.3350*
>> LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook
>> <https://www.facebook.com/shane.johnson.71653>
>>
>> 2018-02-23 7:09 GMT-10:00 Daniel O' Shaughnessy <
>> danieljamesdavid@gmail.com>:
>>
>>> Hi Shane,
>>>
>>> There's an example of PAlgorithm here :
>>>
>>> https://predictionio.apache.org/templates/vanilla/dase/
>>>
>>>
>>>
>>> On Fri, 23 Feb 2018 at 16:10 Shane Johnson <shanewaldenjohnson@gmail.com>
>>> wrote:
>>>
>>>> Thanks Donald. I used this command but am still getting this error. It
>>>> doesn't seem to be adjusting the configuration. Do you see a problem in how
>>>> I used the spark-submit options? The train function ran, but the error makes
>>>> me think spark.driver.maxResultSize was not adjusted.
>>>>
>>>> command:
>>>>
>>>> bin/pio build --verbose; bin/pio train -- --driver-memory 14G -- --conf spark.driver.maxResultSize=4g; bin/pio deploy
>>>>
>>>> error:
>>>>
>>>> Job aborted due to stage failure: Total size of serialized results of 8
>>>> tasks (1236.7 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
>>>>
>>>> Regarding the PAlgorithm, what I am trying to do is save a Map in the
>>>> train method to reuse in the predict method. Because of the error above I
>>>> am not able to convert my RDD to a map as the collectAsMap tries to bring
>>>> it to the driver. If I use the PAlgorithm, I should be able to just save
>>>> the RDD in the Model class and then use it in the predict method. I am
>>>> going down that path now. *Do you know of any templates that are using
>>>> the PAlgorithm?* The docs say that "Similar Product" uses it but it
>>>> looks like it uses the P2LAlgorithm.
>>>>
>>>> Thank you for your help.
>>>>
>>>> *Shane Johnson | 801.360.3350*
>>>> LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook
>>>> <https://www.facebook.com/shane.johnson.71653>
>>>>
>>>> 2018-02-22 8:16 GMT-10:00 Donald Szeto <donald@apache.org>:
>>>>
>>>>> Hi Shane,
>>>>>
>>>>> I think you can set the max result size on the driver by passing a
>>>>> spark-submit argument that looks something like this:
>>>>>
>>>>> pio train ... -- --conf spark.driver.maxResultSize=4g ...
>>>>>
>>>>> Regarding PAlgorithm, the predict() method does not actually have the
>>>>> SparkContext in it
>>>>> (http://predictionio.apache.org/api/current/#org.apache.predictionio.controller.PAlgorithm).
>>>>> The "model" argument, unlike P2LAlgorithm, can contain RDDs. In
>>>>> PAlgorithm.predict(), you would be able to perform RDD operations directly
>>>>> on the model argument. If the SparkContext is needed, the context() method
>>>>> can be used on the model RDD.
>>>>>
>>>>> Hope these help.
>>>>>
>>>>> Regards,
>>>>> Donald
>>>>>
>>>>> On Wed, Feb 21, 2018 at 12:08 PM Shane Johnson <
>>>>> shanewaldenjohnson@gmail.com> wrote:
>>>>>
>>>>>> Hi team,
>>>>>>
>>>>>> We have a specific use case where we are trying to save off a map
>>>>>> from the train function and reuse it in the predict function to improve
>>>>>> our predict function response time. I know that collect() forces
>>>>>> everything to the driver. We are collecting the RDD to a map because we
>>>>>> don't have a Spark context in the predict function.
>>>>>>
>>>>>> I am getting this error and am looking for a way to adjust the
>>>>>> parameter from 1G to 4G+. I can see a way to do it in Spark 1.6, but we
>>>>>> are using Spark 2.1.1 and I have not seen the ability to set this. *Has
>>>>>> anyone been able to adjust the maxResultSize to something more than 1G?*
>>>>>>
>>>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>>>>>> due to stage failure: Total size of serialized results of 7 tasks
>>>>>> (1156.3 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
>>>>>>
>>>>>>
>>>>>> I have tried to set this parameter but get this as a result with
>>>>>> Spark 2.1.1
>>>>>>
>>>>>> Error: Unrecognized option: --driver-maxResultSize
>>>>>>
>>>>>> Our other option is to do the work to obtain a Spark context in the
>>>>>> predict function so we can pass the RDD through from the train to the
>>>>>> predict function. The PredictionIO documentation was a little unclear to
>>>>>> me. *Is this the right place to learn how to get a Spark context in the
>>>>>> predict function?*
>>>>>> https://predictionio.incubator.apache.org/templates/vanilla/dase/
>>>>>>
>>>>>> Also, I am not seeing in this documentation how to get the Spark
>>>>>> context into the predict function; it looks like it is only used in the
>>>>>> train function.
>>>>>>
>>>>>> Thanks in advance for your expertise.
>>>>>>
>>>>>> *Shane Johnson | 801.360.3350*
>>>>>> LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook
>>>>>> <https://www.facebook.com/shane.johnson.71653>
>>>>>>
>>>>>
>>>>
>>
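
One way to tell whether the --conf spark.driver.maxResultSize=4g option
discussed in this thread actually reached the driver is to read the value back
from the running SparkContext, for example at the start of train(). The sketch
below is only an illustration: the object and method names are hypothetical,
and the local context built in main() stands in for the one that pio train
would normally create. The fallback "1g" is Spark's documented default for
this setting.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper, not part of PredictionIO or Spark.
object MaxResultSizeCheck {
  // Returns the value the running context sees, or "1g" (Spark's default)
  // when the setting was never passed through.
  def effectiveMaxResultSize(sc: SparkContext): String =
    sc.getConf.get("spark.driver.maxResultSize", "1g")

  def main(args: Array[String]): Unit = {
    // Local context for illustration only; it sets the same key that the
    // spark-submit --conf option would set.
    val conf = new SparkConf()
      .setAppName("max-result-size-check")
      .setMaster("local[*]")
      .set("spark.driver.maxResultSize", "4g")
    val sc = new SparkContext(conf)
    try println(effectiveMaxResultSize(sc))
    finally sc.stop()
  }
}
```

If the value logged during training is still the default, the option most
likely never reached spark-submit, which would match the 1024.0 MB limit
reported in the error above.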

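The approach described in the thread, keeping the RDD on the model and
operating on it in predict() rather than calling collectAsMap, can be sketched
with plain Spark as follows. This is a minimal illustration under stated
assumptions, not the PredictionIO PAlgorithm API itself: LookupModel,
LookupSketch, and the sample data are hypothetical, and train() and predict()
here are stand-ins for the corresponding algorithm methods.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical model: holds the pair RDD built during training instead of a
// Map collected onto the driver.
case class LookupModel(scores: RDD[(String, Double)])

object LookupSketch {
  // Stand-in for train(): builds and caches the RDD; nothing is collected.
  def train(sc: SparkContext): LookupModel = {
    val scores = sc.parallelize(Seq("a" -> 1.0, "b" -> 2.0, "c" -> 3.0)).cache()
    LookupModel(scores)
  }

  // Stand-in for predict(): lookup() brings back only the values stored
  // under the requested key, not the whole data set.
  def predict(model: LookupModel, key: String): Option[Double] =
    model.scores.lookup(key).headOption

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("lookup-sketch").setMaster("local[*]"))
    try {
      val model = train(sc)
      println(predict(model, "b"))
      // The SparkContext is reachable from the RDD itself when it is needed.
      println(model.scores.context eq sc)
    } finally sc.stop()
  }
}
```

Each lookup() call runs a Spark job, so per-query latency will generally be
higher than with an in-memory Map; the trade-off is that the full result set
never has to fit within spark.driver.maxResultSize.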