mahout-user mailing list archives

From Sékine Coulibaly <scoulib...@gmail.com>
Subject Re: How to start the factorize-movielens.sh from a PHP script ?
Date Wed, 23 Jan 2013 17:59:34 GMT
Hi Sebastian,
The use case is as follows: I have a catalog of around 100,000 items, and
I want to set up a web application that provides recommendations to users.

When a user subscribes, I ask him to rate a random subset of the catalog,
for example 20 random items. I store his userId, the itemId and the score
in a file (or db).
Once he has finished rating the 20 items, I was planning to suggest a list
of recommended items to him. I would like the recommendations to be
computed with minimal delay (so that the user does not wait too long after
his initial subscription before a recommendation is displayed).
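
As a minimal sketch of the storage step described above, assuming the ratings go into a plain file: the comma-separated userID,itemID,preference layout is the one Mahout's file-based data model reads, so it is a reasonable choice here. The class name RatingLog, its method names and the temporary file location are hypothetical and only serve the illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: appends ratings to a CSV file, one rating per line.
public class RatingLog {

    /** Formats one rating as a "userId,itemId,score" CSV line (no newline). */
    public static String formatLine(long userId, long itemId, double score) {
        return userId + "," + itemId + "," + score;
    }

    /** Appends one rating to the file, creating the file if it does not exist. */
    public static void append(Path file, long userId, long itemId, double score)
            throws IOException {
        String line = formatLine(userId, itemId, score) + System.lineSeparator();
        Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temporary file stands in for the real ratings file.
        Path file = Files.createTempFile("ratings", ".csv");
        append(file, 42L, 557L, 4.5);
        append(file, 42L, 578L, 3.0);
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```

Appending one line per rating keeps the write path trivial; concurrent writers from several web requests would need locking, which this sketch leaves out.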

I'm looking for the simplest solution (no application server, no weblayers
such as kornakapi/myrrix, and if possible no database). Writing Java is not
a real problem, but I really want the simplest thing possible, with no
dependencies.

Using the examples, I have not been able to:
- issue recommendations for one specific userId
- return a recommendation efficiently, with a latency under 6 minutes
on a laptop VM.
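
One way to serve a single userId without rerunning a job is to look the user up in recommendations that were precomputed in batch, as suggested earlier in this thread. The sketch below assumes each line of the batch output has the form userID, a tab, then [itemID:score,itemID:score,...]; that layout is an assumption and should be checked against the actual output files. The class and method names are hypothetical. The result is a JSON array of the shape mentioned earlier in the thread, with quoted keys so that it is valid JSON.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: finds one user's line in precomputed batch output
// and renders it as a JSON array.
public class SingleUserLookup {

    /** Returns the JSON array for userId, or empty if no line starts with that id. */
    public static Optional<String> recommendationsAsJson(List<String> lines, long userId) {
        String prefix = userId + "\t";
        for (String line : lines) {
            if (line.startsWith(prefix)) {
                return Optional.of(toJson(line.substring(prefix.length()).trim()));
            }
        }
        return Optional.empty();
    }

    /** Converts "[itemID:score,...]" (assumed layout) into a JSON array string. */
    public static String toJson(String bracketed) {
        if (!bracketed.startsWith("[") || !bracketed.endsWith("]")) {
            throw new IllegalArgumentException("Unexpected format: " + bracketed);
        }
        String body = bracketed.substring(1, bracketed.length() - 1);
        List<String> entries = new ArrayList<>();
        if (!body.isEmpty()) {
            for (String pair : body.split(",")) {
                String[] parts = pair.split(":");
                long itemId = Long.parseLong(parts[0].trim());
                double value = Double.parseDouble(parts[1].trim());
                entries.add("{\"itemID\":" + itemId + ",\"value\":" + value + "}");
            }
        }
        return "[" + String.join(",", entries) + "]";
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed layout.
        List<String> lines = Arrays.asList(
                "1\t[557:4.5,578:4.0]",
                "2\t[1149:3.5]");
        System.out.println(recommendationsAsJson(lines, 2L).orElse("[]"));
    }
}
```

A linear scan is enough for a sketch; with many users, loading the lines into a map keyed by userId (or a database table) would avoid rereading the file on every request. A brand-new user will have no line until the next batch run, so this addresses the specific-userId point but not the delay right after subscription.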

I hope this is clearer now. Thanks again for your time.

Sekine


2013/1/17 Sebastian Schelter <ssc@apache.org>

> Hi Sekine,
>
> I'm not sure I understand your problem correctly. What exactly is your
> use case, and how many users and items do you have?
>
> The mahout commandline tools only offer Hadoop-based recommenders that
> are designed to recommend in batch for millions of users and will
> usually take minutes to hours to run.
>
> Mahout also offers a Java framework that allows flexible, online
> recommendation for single users. For people who don't want to dive into
> the framework, there are simple, easy to use weblayers available like
> kornakapi [1] or myrrix [2].
>
> Did you look at those? You don't need to write a single line of Java
> code for using them and both offer a very convenient way to use an
> ALS-based recommender. Also they are shipped with an easy to use
> webservice that should be callable from PHP with minimal effort.
> Furthermore they should respond to requests concerning single user
> recommendations in a few milliseconds.
>
>
> Best,
> Sebastian
>
> [1] https://github.com/plista/kornakapi
> [2] http://myrrix.com/
>
>
>
> On 17.01.2013 17:55, Sékine Coulibaly wrote:
> > Sebastian,
> >
> > This sounds reasonable. However, I observe that running the
> > factorize-movielens script computes recommendations for *all* users. Is
> > there a way to compute the recommendation for only one user?
> >
> > The recommenditembased recommender accepts an external file
> > containing the user id, but that algorithm is very slow compared to the
> > factorize script (6 minutes to run, compared to 6 minutes for thousands
> > of recommendations). I didn't find such an option in the factorize
> > script (besides, it seems that some of the ALS results are precomputed
> > and cached, so that the recommendation job is quicker).
> >
> > Thank you !
> >
> >
> >> 2013/1/14 Sebastian Schelter <ssc@apache.org>
> >>
> >>> Then I would suggest that you modify the shell script to periodically
> >>> precompute the recommendations and put them into a database afterwards
> >>> which you can query via PHP.
> >>>
> >>> It makes no sense IMO to call a webservice that starts a Hadoop job and
> >>> wait for the results.
> >>>
> >>> /s
> >>>
> >>>
> >>>
> >>>
> >>> On 14.01.2013 10:12, Sékine Coulibaly wrote:
> >>>> Ibrahim, Sebastian,
> >>>>
> >>>> I am trying to create a PHP webservice to deliver recommendations.
> >>>>
> >>>> On a webpage, I would call that webservice, and I was imagining having
> >>>> that webservice call the factorize-movielens script itself and
> >>>> transform the script's output into something like
> >>>>
> >>>> [{itemID:557,value:5.988698},{itemID:578,value:5.0461025},{itemID:1149,value:4.9268165},{itemID:572,value:4.9265957},{itemID:3245,value:4.8139095}],
> >>>>
> >>>> a JSON I could easily parse in my front-end.
> >>>>
> >>>> I don't want (if possible) to involve any Java application or HTTP
> >>>> server as suggested (kornakapi, myrrix), although I understand these
> >>>> would be a sensible way to go (I'm interested in learning Mahout, so
> >>>> hiding that part is something I'd like to avoid).
> >>>>
> >>>> Regards
> >>>>
> >>>>
> >>>> 2013/1/14 Sebastian Schelter <ssc@apache.org>
> >>>>
> >>>>> This blog post might be useful for you:
> >>>>>
> >>>>> http://ssc.io/a-recommendation-webservice-in-10-minutes/
> >>>>>
> >>>>> On 14.01.2013 09:31, Sékine Coulibaly wrote:
> >>>>>> Hi Ibrahim,
> >>>>>>
> >>>>>> Actually, for now, I would like to use it locally, in other words
> >>>>>> without using the Hadoop framework. I have successfully launched:
> >>>>>> factorize-movielens-1M.sh ratings.dat
> >>>>>>
> >>>>>> I would like to launch that very same command from PHP. The Apache
> >>>>>> user is indeed www-data. The /tmp/mahout-work-www-data directory is
> >>>>>> created but only contains the ratings.csv file.
> >>>>>>
> >>>>>> Regards
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2013/1/14 Ibrahim Yakti <iyakti@souq.com>
> >>>>>>
> >>>>>>> Your PHP scripts run as the Apache user, which most probably
> >>>>>>> doesn't have the HADOOP_HOME, HADOOP_CONF_DIR, etc. variables
> >>>>>>> defined. Please try to define them in the PHP script before
> >>>>>>> making the call.
> >>>>>>>
> >>>>>>> I hope it works.
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Ibrahim
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sun, Jan 13, 2013 at 11:38 PM, Sékine Coulibaly
> >>>>>>> <scoulibaly@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hi there,
> >>>>>>>>
> >>>>>>>> I've been able to start the factorize-movielens script locally.
> >>>>>>>> What I'd like to do is basically create a PHP webservice able to
> >>>>>>>> start that very same script and return the recommendations.
> >>>>>>>>
> >>>>>>>> I'm using Apache2, and I use PHP's shell_exec to start the script
> >>>>>>>> as follows:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> putenv('JAVA_HOME=/usr/local/jvm/jdk1.7.0_05');
> >>>>>>>> // Directory holding both the example script and the ratings file
> >>>>>>>> $bin = '/home/scoulibaly/Téléchargements/mahout-distribution-0.6/examples/bin';
> >>>>>>>> $output = shell_exec("$bin/factorize-movielens-1M.sh $bin/ratings.dat");
> >>>>>>>> echo $output;
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Unfortunately the output I get is as follows :
> >>>>>>>>
> >>>>>>>> creating work directory at /tmp/mahout-work-www-data
> >>>>>>>>
> >>>>>>>> Converting ratings...
> >>>>>>>>
> >>>>>>>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> >>>>>>>> no HADOOP_HOME set, running locally
> >>>>>>>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> >>>>>>>> no HADOOP_HOME set, running locally
> >>>>>>>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> >>>>>>>> no HADOOP_HOME set, running locally
> >>>>>>>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> >>>>>>>> no HADOOP_HOME set, running locally
> >>>>>>>>
> >>>>>>>> RMSE is:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Sample recommendations:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> removing work directory
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I know this is not strictly a Mahout issue, but if someone could
> >>>>>>>> point me to a way to start Mahout jobs from a PHP script, I'd be
> >>>>>>>> very grateful!
> >>>>>>>>
> >>>>>>>> Thank you
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
>
>
