mahout-user mailing list archives

From Iker Huerga <iker.hue...@gmail.com>
Subject Re: Setting up a recommender
Date Sun, 21 Jul 2013 15:10:10 GMT
Hi,

First of all, Ted, very inspiring video. I really enjoyed the concept of
cross-occurrences.

Secondly, I'd be very interested in collaborating on this project, and here
is why. I've recently been working for my employer on a very similar
project, which is currently deployed in our production environment.

We built a recommender system that takes as input the ontology instances
identified in documents by an NLP process, and generates document
recommendations as output. We used a big training set with positive and
false positive matches to improve the accuracy of the output. All these
documents are indexed in Solr, for which we built a recommender
RequestHandler that makes use of a RecommenderQParsePlugin we also built.

With this we can provide recommendations to a user who is reading a
document, but in the next iterations we are working towards providing
recommendations based on multiple kinds of inputs, not only annotations.

That said, I would like to collaborate with you guys on the development
part of this project. Just let me know how and where we can organize the
user stories and tasks.

I think a conference call, maybe a hangout, to kick off the project would
be useful. Who should schedule it?

Thanks
Iker




2013/7/20 Ted Dunning <ted.dunning@gmail.com>

> To kick this off, I have created a design document that is open for
> comments.  Much detail is needed here.  I will create a JIRA as well, but
> the google doc is much easier for collating lots of input into a coherent
> document.
>
> The directory that the document is stored in is accessible at
>
> http://bit.ly/18vbbaT
>
> Once we get going, we can talk about how to coordinate tasks between
> hangouts.  One option is a public Trello project: https://trello.com/ or
> we can use JIRA sub-tasks.
>
>
> On Sat, Jul 20, 2013 at 11:25 AM, Andrew Psaltis <
> Andrew.Psaltis@webtrends.com> wrote:
>
> > I am very interested in collaborating on the off-line to Solr part. Just
> > let me know how we want to get going.
> >
> > Thanks,
> > Andrew
> >
> >
> >
> >
> >
> > On 7/19/13 4:45 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
> >
> > >OK.  I think the crux here is the off-line to Solr part so let's see who
> > >else pops up.
> > >
> > >Having a solr maven could be very helpful.
> > >
> > >
> > >On Fri, Jul 19, 2013 at 3:39 PM, Luis Carlos Guerrero Covo <
> > >lcguerrerocovo@gmail.com> wrote:
> > >
> > >> I'm currently working for a portal that has a similar use case, and I
> > >> was thinking of implementing this in a similar way. I'm generating
> > >> recommendations with Python scripts based on similarity measures
> > >> (content-based recommendation), using only Euclidean distance and some
> > >> weights for each attribute. I want to use Mahout's
> > >> GenericItemBasedRecommender to generate these same recommendations
> > >> without user data (there is no tracking of user-to-item relationships
> > >> right now). I was thinking of pushing the generated recommendations to
> > >> Solr using atomic updates, since all my fields are currently stored.
> > >> Since this is very similar to what I'm trying to accomplish, I would
> > >> sign up to collaborate in any way I can, as I'm fairly familiar with
> > >> Solr and I'm starting to learn my way around Mahout.
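
To make Luis's pipeline concrete, here is a minimal sketch of scoring items
by weighted Euclidean distance and building Solr atomic-update documents
from the result. The attribute names, the weights and the
`recommended_ids` field are hypothetical, and the sketch only builds the
update payload; it does not talk to a Solr server.

```python
import math

def weighted_distance(x, y, weights):
    """Weighted Euclidean distance over the attributes named in `weights`."""
    return math.sqrt(sum(w * (x[k] - y[k]) ** 2 for k, w in weights.items()))

def nearest_items(item_id, items, weights, top_k=3):
    """Return the ids of the `top_k` items closest to `item_id`."""
    target = items[item_id]
    scored = [(weighted_distance(target, attrs, weights), other)
              for other, attrs in items.items() if other != item_id]
    return [other for _, other in sorted(scored)[:top_k]]

def atomic_update(item_id, recommended, field="recommended_ids"):
    """Build one atomic-update document: only `field` is replaced ("set")."""
    return {"id": item_id, field: {"set": recommended}}

# Toy catalogue; attribute names and values are made up for illustration.
items = {
    "p1": {"price": 1.0, "size": 2.0},
    "p2": {"price": 1.5, "size": 2.0},
    "p3": {"price": 9.0, "size": 7.0},
}
weights = {"price": 2.0, "size": 1.0}

payload = [atomic_update(i, nearest_items(i, items, weights, top_k=1))
           for i in sorted(items)]
```

Serialized as JSON, a list like `payload` is the shape of body one would
post to the collection's update handler. The "set" modifier replaces just
that one field, which is why atomic updates need the other fields to be
stored.
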
> > >>
> > >>
> > >> On Fri, Jul 19, 2013 at 5:12 PM, Sebastian Schelter <ssc@apache.org>
> > >> wrote:
> > >>
> > >> > I would also be willing to provide guidance and advice for anyone
> > >> > taking this on. I can especially help with the offline analysis part.
> > >> >
> > >> > --sebastian
> > >> >
> > >> >
> > >> > 2013/7/19 Ted Dunning <ted.dunning@gmail.com>
> > >> >
> > >> > > I would be happy to supervise a project to implement a demo of
> > >> > > this if anybody is willing to do the grunt work of gluing things
> > >> > > together.
> > >> > >
> > >> > > Sooo, if you would like to work on this, here is a suggested
> > >> > > project.
> > >> > >
> > >> > > This project would entail:
> > >> > >
> > >> > > a) build a synthetic data source
> > >> > >
> > >> > > b) write scripts to do the off-line analysis
> > >> > >
> > >> > > c) write scripts to export to Solr
> > >> > >
> > >> > > d) write a very quick web facade over Solr to make it look like a
> > >> > > recommendation engine.  This would include
> > >> > >
> > >> > >   d.1) a "most popular page" that does combined popularity rise
> > >> > >        and recommendation
> > >> > >
> > >> > >   d.2) a "personal recommendation page" that does just
> > >> > >        recommendation with dithering
> > >> > >   d.3) item pages with "related items" at the bottom
> > >> > >
> > >> > > e) work with others to provide high quality system walk-through
> > >> > > and install directions
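
Dithering (item d.2) is not defined anywhere in this thread. As a rough
sketch, assuming it means re-ranking a result list by log rank plus
Gaussian noise so that the top results stay mostly stable while deeper
results occasionally surface, it could look like the following. The
`epsilon` parameter and its default are assumptions, not something
specified above.

```python
import math
import random

def dither(ranked_items, epsilon=2.0, rng=None):
    """Re-rank by log(rank) + N(0, log(epsilon)); epsilon=1.0 adds no noise."""
    rng = rng or random.Random()
    sigma = math.log(epsilon)
    scored = [(math.log(rank) + rng.gauss(0.0, sigma), item)
              for rank, item in enumerate(ranked_items, start=1)]
    # Lower synthetic score means shown earlier.
    return [item for _, item in sorted(scored, key=lambda pair: pair[0])]

recs = ["a", "b", "c", "d", "e", "f"]
page = dither(recs, epsilon=2.0, rng=random.Random(0))
```

Because the noise is added to the logarithm of the rank, swaps near the
top of the list are rarer than swaps further down, so each page view looks
a little different without burying the best results.
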
> > >> > >
> > >> > > If you want to bite on this, we should arrange a weekly video
> > >> > > hangout.  I am willing to commit to guiding and providing detailed
> > >> > > technical approaches.  You should be willing to commit to actually
> > >> > > doing stuff.
> > >> > >
> > >> > > The goal would be to provide a fully worked out scaffolding of a
> > >> > > practical recommendation system that presumably would become an
> > >> > > example module in Mahout.
> > >> > >
> > >> > >
> > >> > > On Fri, Jul 19, 2013 at 1:08 PM, B Lyon <bradflyon@gmail.com> wrote:
> > >> > >
> > >> > > > +1 as well.  Sounds fun.
> > >> > > >
> > >> > > > On Fri, Jul 19, 2013 at 4:06 PM, Dominik Hübner
> > >> > > > <contact@dhuebner.com> wrote:
> > >> > > >
> > >> > > > > +1 for getting something like that in a future release of Mahout
> > >> > > > >
> > >> > > > > On Jul 19, 2013, at 10:02 PM, Sebastian Schelter <ssc@apache.org>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > It would be awesome if we could get a nice, easily deployable
> > >> > > > > > implementation of that approach into Mahout before 1.0
> > >> > > > > >
> > >> > > > > > 2013/7/19 Ted Dunning <ted.dunning@gmail.com>
> > >> > > > > >
> > >> > > > > >> My current advice is to use Hadoop (if necessary) to build a
> > >> > > > > >> sparse item-item matrix based on each kind of behavior you
> > >> > > > > >> have and then drop those similarities into a search engine
> > >> > > > > >> to deliver the actual recommendations.  This allows lots of
> > >> > > > > >> flexibility in terms of which kinds of inputs you use for
> > >> > > > > >> the recommendation and lets you blend recommendations with
> > >> > > > > >> search and geo-location.
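
A small in-memory sketch of the off-line step Ted describes, under some
loud assumptions: raw co-occurrence counts with a threshold stand in for
whatever similarity measure is actually chosen, the `indicators` field
name and the `min_count` and `top_k` parameters are hypothetical, and
Hadoop would only be needed once the data no longer fits on one machine.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(histories):
    """Count how often each pair of items shares a user's history."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def indicator_docs(counts, min_count=2, top_k=50):
    """Turn the sparse item-item matrix into one search document per item."""
    docs = []
    for item, row in sorted(counts.items()):
        related = [(other, n) for other, n in row.items() if n >= min_count]
        related.sort(key=lambda pair: (-pair[1], pair[0]))
        docs.append({"id": item,
                     "indicators": [other for other, _ in related[:top_k]]})
    return docs

# One history per user for a single kind of behavior (e.g. page views).
histories = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b"],
    "u3": ["b", "c"],
    "u4": ["a", "d"],
}
docs = indicator_docs(cooccurrence(histories))
```

Once documents like these are indexed, a recommendation is just a search:
the query is the user's recent item ids matched against the `indicators`
field, and it can be combined with ordinary text or geo filters in the
same request. A second kind of behavior would get its own matrix and its
own field.
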
> > >> > > > > >>
> > >> > > > > >>
> > >> > > > > >> On Fri, Jul 19, 2013 at 12:33 PM, Helder Martins
> > >> > > > > >> <helder.garay@corp.terra.com.br> wrote:
> > >> > > > > >>
> > >> > > > > >>> Hi,
> > >> > > > > >>> I'm a dev working for a web portal in Brazil and I'm
> > >> > > > > >>> particularly interested in building an item-based
> > >> > > > > >>> collaborative filtering recommender for our database of
> > >> > > > > >>> news articles.
> > >> > > > > >>> After some coding, I was able to get some recommendations
> > >> > > > > >>> using a GenericItemBasedRecommender, a CassandraDataModel
> > >> > > > > >>> and some custom classes that store item similarities and
> > >> > > > > >>> migrated item IDs into Cassandra. But now I'm unsure what
> > >> > > > > >>> is normally done with this recommender: Should I run it as
> > >> > > > > >>> a daemon, cache the recommendations in memory and set up a
> > >> > > > > >>> web service to query it online? Should I preprocess these
> > >> > > > > >>> recommendations for each recent user and store them
> > >> > > > > >>> somewhere? My first idea was to store all these recs back
> > >> > > > > >>> into Cassandra, but looking at some classes it seems to me
> > >> > > > > >>> that the norm is to always read the input data and store
> > >> > > > > >>> the output using files. Is this a common practice that
> > >> > > > > >>> benefits from HDFS?
> > >> > > > > >>> My use case here is something around 70k recommendation
> > >> > > > > >>> requests per second.
> > >> > > > > >>>
> > >> > > > > >>> Thanks in advance,
> > >> > > > > >>>
> > >> > > > > >>> --
> > >> > > > > >>>
> > >> > > > > >>> Regards,
> > >> > > > > >>> Helder Martins
> > >> > > > > >>> Portal Architecture and Backend Systems
> > >> > > > > >>> +55 (51) 3284-4475
> > >> > > > > >>> Terra
> > >> > > > > >>>
> > >> > > > > >>>
> > >> > > > > >>
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > BF Lyon
> > >> > > > http://www.nowherenearithaca.com
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Luis Carlos Guerrero Covo
> > >> M.S. Computer Engineering
> > >> (57) 3183542047
> > >>
> >
> >
>



-- 
Iker Huerga
http://www.ikerhuerga.com/
