spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Xiangrui Meng <men...@gmail.com>
Subject Re: MatrixFactorizationModel predict(Int, Int) API
Date Fri, 07 Nov 2014 01:07:21 GMT
There is a JIRA for it: https://issues.apache.org/jira/browse/SPARK-3066

The easiest case is when one side is small. If both sides are large,
this is a super-expensive operation. We can do block-wise cross
product and then find top-k for each user.

Best,
Xiangrui

On Thu, Nov 6, 2014 at 4:51 PM, Debasish Das <debasish.das83@gmail.com> wrote:
> model.recommendProducts can only be called from the master then ? I have a
> set of 20% users on whom I am performing the test...the 20% users are in a
> RDD...if I have to collect them all to master node and then call
> model.recommendProducts, that's a issue...
>
> Any idea how to optimize this so that we can calculate MAP statistics on
> large samples of data ?
>
>
> On Thu, Nov 6, 2014 at 4:41 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
>>
>> ALS model contains RDDs. So you cannot put `model.recommendProducts`
>> inside a RDD closure `userProductsRDD.map`. -Xiangrui
>>
>> On Thu, Nov 6, 2014 at 4:39 PM, Debasish Das <debasish.das83@gmail.com>
>> wrote:
>> > I reproduced the problem in mllib tests ALSSuite.scala using the
>> > following
>> > functions:
>> >
>> >         val arrayPredict = userProductsRDD.map{case(user,product) =>
>> >
>> >          val recommendedProducts = model.recommendProducts(user,
>> > products)
>> >
>> >          val productScore = recommendedProducts.find{x=>x.product ==
>> > product}
>> >
>> >           require(productScore != None)
>> >
>> >           productScore.get
>> >
>> >         }.collect
>> >
>> >         arrayPredict.foreach { elem =>
>> >
>> >           if (allRatings.get(elem.user, elem.product) != elem.rating)
>> >
>> >           fail("Prediction APIs don't match")
>> >
>> >         }
>> >
>> > If the usage of model.recommendProducts is correct, the test fails with
>> > the
>> > same error I sent before...
>> >
>> > org.apache.spark.SparkException: Job aborted due to stage failure: Task
>> > 0 in
>> > stage 316.0 failed 1 times, most recent failure: Lost task 0.0 in stage
>> > 316.0 (TID 79, localhost): scala.MatchError: null
>> >
>> > org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:825)
>> >
>> > org.apache.spark.mllib.recommendation.MatrixFactorizationModel.recommendProducts(MatrixFactorizationModel.scala:81)
>> >
>> > It is a blocker for me and I am debugging it. I will open up a JIRA if
>> > this
>> > is indeed a bug...
>> >
>> > Do I have to cache the models to make userFeatures.lookup(user).head to
>> > work
>> > ?
>> >
>> >
>> > On Mon, Nov 3, 2014 at 9:24 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
>> >>
>> >> Was "user" presented in training? We can put a check there and return
>> >> NaN if the user is not included in the model. -Xiangrui
>> >>
>> >> On Mon, Nov 3, 2014 at 5:25 PM, Debasish Das <debasish.das83@gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > I am testing MatrixFactorizationModel.predict(user: Int, product:
>> >> > Int)
>> >> > but
>> >> > the code fails on userFeatures.lookup(user).head
>> >> >
>> >> > In computeRmse MatrixFactorizationModel.predict(RDD[(Int, Int)]) has
>> >> > been
>> >> > called and in all the test-cases that API has been used...
>> >> >
>> >> > I can perhaps refactor my code to do the same but I was wondering
>> >> > whether
>> >> > people test the lookup(user) version of the code..
>> >> >
>> >> > Do I need to cache the model to make it work ? I think right now
>> >> > default
>> >> > is
>> >> > STORAGE_AND_DISK...
>> >> >
>> >> > Thanks.
>> >> > Deb
>> >
>> >
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Mime
View raw message