predictionio-user mailing list archives

From "Daniel O' Shaughnessy" <danieljamesda...@gmail.com>
Subject Re: How to increase maxResultSize with Spark 2.1.1
Date Fri, 23 Feb 2018 17:09:51 GMT
Hi Shane,

There's an example of a PAlgorithm here:

https://predictionio.apache.org/templates/vanilla/dase/



On Fri, 23 Feb 2018 at 16:10 Shane Johnson <shanewaldenjohnson@gmail.com>
wrote:

> Thanks Donald. I used this command but am still getting this error. It
> doesn't seem to be adjusting the configuration. Do you see a problem in how
> I used the spark-submit options? The train function ran, but the error makes
> me think spark.driver.maxResultSize was not adjusted.
>
> command:
>
> bin/pio build --verbose; bin/pio train -- --driver-memory 14G -- --conf spark.driver.maxResultSize=4g; bin/pio deploy
>
> error:
>
> Job aborted due to stage failure: Total size of serialized results of 8 tasks (1236.7 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
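The command above puts a second `--` between `--driver-memory` and `--conf`. Assuming, as Donald's reply describes, that `pio train` hands the arguments after the first `--` to spark-submit (and that a second `--` may start a separate argument group in some pio versions, which is worth checking against the version in use), a minimal sketch that keeps both options after a single `--` could look like this. The `SPARK_ARGS` array and the dry-run `echo` branch are hypothetical additions so the script can be tried without a PredictionIO install.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Options intended for spark-submit. Both go after a single "--" on the
# pio train command line; --conf takes one key=value pair.
SPARK_ARGS=(--driver-memory 14G --conf spark.driver.maxResultSize=4g)

if [[ -x bin/pio ]]; then
  bin/pio build --verbose
  bin/pio train -- "${SPARK_ARGS[@]}"
  bin/pio deploy
else
  # Dry run (hypothetical): no PredictionIO install here, so just print
  # the train command that would be executed.
  echo "bin/pio train -- ${SPARK_ARGS[*]}"
fi
```

Using `set -e` instead of chaining with `;` also stops the script before `pio deploy` if training fails.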
>
> Regarding the PAlgorithm, what I am trying to do is save a Map in the
> train method to reuse in the predict method. Because of the error above, I
> cannot convert my RDD to a Map, since collectAsMap() brings the whole RDD to
> the driver. If I use the PAlgorithm, I should be able to save the RDD in
> the Model class and then use it in the predict method. I am going down that
> path now. *Do you know of any templates that use the PAlgorithm?* The docs
> say that "Similar Product" uses it, but it looks like it uses the
> P2LAlgorithm.
>
> Thank you for your help.
>
> *Shane Johnson | 801.360.3350*
> LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook
> <https://www.facebook.com/shane.johnson.71653>
>
> 2018-02-22 8:16 GMT-10:00 Donald Szeto <donald@apache.org>:
>
>> Hi Shane,
>>
>> I think what you are looking for is a way to set the maximum result size
>> on the driver. You can do that by passing a spark-submit argument to pio
>> train, something like this:
>>
>> pio train ... -- --conf spark.driver.maxResultSize=4g ...
>>
>> Regarding PAlgorithm, the predict() method does not actually receive the
>> SparkContext (see
>> http://predictionio.apache.org/api/current/#org.apache.predictionio.controller.PAlgorithm).
>> Unlike with P2LAlgorithm, the "model" argument can contain RDDs, so in
>> PAlgorithm.predict() you can perform RDD operations directly on the model
>> argument. If the SparkContext is needed, you can call the context() method
>> on an RDD in the model.
>>
>> Hope these help.
>>
>> Regards,
>> Donald
>>
>> On Wed, Feb 21, 2018 at 12:08 PM Shane Johnson <
>> shanewaldenjohnson@gmail.com> wrote:
>>
>>> Hi team,
>>>
>>> We have a specific use case where we are trying to save a map from the
>>> train function and reuse it in the predict function to reduce our predict
>>> response time. I know that collect() brings everything to the driver. We
>>> are collecting the RDD into a map because we don't have a SparkContext in
>>> the predict function.
>>>
>>> I am getting this error and am looking for a way to raise
>>> spark.driver.maxResultSize from 1G to 4G or more. I can see a way to do it
>>> in Spark 1.6, but we are using Spark 2.1.1 and I have not found a way to
>>> set it there. *Has anyone been able to raise maxResultSize above 1G?*
>>>
>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 7 tasks (1156.3 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
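The numbers in these errors can be checked by hand. Spark reports sizes in binary units, so the default `1g` limit shows up as 1024.0 MB, while a `4g` limit is 4096 MB, enough to hold 1156.3 MB (or the 1236.7 MB reported on Feb 23). The sketch below encodes that comparison in two hypothetical shell helpers, `size_to_mib` and `fits_in_max_result_size`. They are illustrations, not part of Spark or PredictionIO, and they only understand plain integer byte counts and the `k`, `m`, `g`, `t` suffixes. Per the Spark configuration docs, a limit of `0` means unlimited.

```shell
#!/usr/bin/env bash

# Hypothetical helper: convert a Spark size string ("512m", "1g", "4G")
# to MiB. Spark treats these suffixes as binary units (1g = 1024 MiB).
size_to_mib() {
  local value number
  value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')  # "4G" -> "4g"
  number=${value%[kmgt]}                                  # strip one unit suffix
  case "$value" in
    *k) awk -v n="$number" 'BEGIN { print n / 1024 }' ;;
    *m) echo "$number" ;;
    *g) echo $((number * 1024)) ;;
    *t) echo $((number * 1024 * 1024)) ;;
    *[0-9]) awk -v n="$number" 'BEGIN { print n / 1048576 }' ;;  # plain bytes
    *) echo "unsupported size: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: succeed (exit 0) when a serialized result total,
# in MB as printed by Spark, stays within the limit. Spark aborts the job
# only when the total is bigger than the limit; a limit of 0 is unlimited.
fits_in_max_result_size() {
  local total_mb=$1 limit_mib
  limit_mib=$(size_to_mib "$2") || return 1
  awk -v t="$total_mb" -v l="$limit_mib" 'BEGIN { exit !(l == 0 || t <= l) }'
}

for limit in 1g 4g; do
  if fits_in_max_result_size 1156.3 "$limit"; then
    echo "1156.3 MB with maxResultSize=$limit: fits"
  else
    echo "1156.3 MB with maxResultSize=$limit: job aborted"
  fi
done
```

The loop reports an abort at `1g` and a fit at `4g`. The same check passes for 1236.7 MB at `4g`, so an abort against a 1024.0 MB limit in the Feb 23 run suggests the `4g` setting never reached Spark.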
>>>
>>>
>>> I have tried to set this parameter, but with Spark 2.1.1 I get this
>>> result:
>>>
>>> Error: Unrecognized option: --driver-maxResultSize
>>>
>>> Our other option is to obtain a SparkContext in the predict function so
>>> we can pass the RDD from the train function to the predict function. The
>>> PredictionIO documentation was a little unclear to me. *Is this the right
>>> place to learn how to get a SparkContext in the predict function?*
>>> https://predictionio.incubator.apache.org/templates/vanilla/dase/
>>>
>>> Also, I don't see in this documentation how to get the SparkContext into
>>> the predict function; it looks like it is only used in the train
>>> function.
>>>
>>> Thanks in advance for your expertise.
>>>
>>>
>>
>
