mahout-user mailing list archives

From arun abraham <arunabraham...@gmail.com>
Subject Re: Reg:-Integrating Mahout with Solr
Date Sat, 08 Apr 2017 21:39:54 GMT
Hi Pat,

Thanks for the reply.

I am not trying to build personalized search, but rather a recommendation
feature within the application.

Solr takes care of search, and the Mahout APIs take care of user
recommendations.

To record user interaction with the documents, the application has two
tables in which each user's document interactions are stored.

*Table 1* - document ratings
user_id, doc_id, doc_rating, timestamp

*Table 2* - document view counts
user_id, document_id, doc_view, timestamp

The application has more documents than users, so I preferred user-based
recommendation.

I have been experimenting with the data from *Table 1*, which I converted
to *CSV*.

Please find the code below:


    // Load the ratings exported from Table 1.
    DataModel dataModel = new FileDataModel(new File("data/ratings.csv"));

    // Compare users by the Euclidean distance between their ratings.
    UserSimilarity similarity = new EuclideanDistanceSimilarity(dataModel);

    // Use the 100 most similar users as the neighborhood.
    UserNeighborhood neighborhood =
            new NearestNUserNeighborhood(100, similarity, dataModel);

    Recommender recommender =
            new GenericUserBasedRecommender(dataModel, neighborhood, similarity);

    // Ask for the top 5 recommendations for user 1.
    List<RecommendedItem> recommendations = recommender.recommend(1, 5);

    for (RecommendedItem recommendedItem : recommendations) {
        System.out.println(recommendedItem);
    }


I have also tried different similarity measures.

The code above gives fairly good results (I have run evaluations). However, I
have been struggling to find a way to make the user interaction data from
*Table 2* part of the data model, which currently contains only the *Table 1*
document ratings.

How can I modify the code to take more preferences into account?
How can I combine the *Table 2* data (once converted to CSV) with the
ratings file into a single data model?

For example, the ratings CSV file has only the rating as a preference. How
can I make the API take additional preferences (such as the doc_view
attribute of Table 2 in my case) into account when computing similarity?
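
One possible way to approach this, as a minimal sketch rather than a
Mahout-prescribed method: blend both tables into a single preference value
per (user, document) pair before loading the data, so that the existing
user-based pipeline sees one CSV. The class name PreferenceBlender, the
weights, and the view cap below are all hypothetical and would need tuning
through evaluation. The sketch uses only the Java standard library and
assumes ratings on a 0-5 scale and CSV exports without header rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of Mahout): blends ratings and view counts. */
class PreferenceBlender {
    // Illustrative assumptions, not recommended values.
    static final double VIEW_CAP = 10.0;     // views above this count as 10
    static final double VIEW_BONUS = 1.0;    // max bonus added to an explicit rating
    static final double IMPLICIT_MAX = 3.0;  // max preference for a view-only pair
    static final double MAX_RATING = 5.0;    // assumed top of the rating scale

    /**
     * ratingLines: "user_id,doc_id,doc_rating,timestamp" (Table 1 export).
     * viewLines:   "user_id,document_id,doc_view,timestamp" (Table 2 export).
     * Returns lines of the form "userID,itemID,preference".
     */
    static List<String> blend(List<String> ratingLines, List<String> viewLines) {
        // Key "user,doc" maps to {rating (NaN if unrated), total views}.
        Map<String, double[]> merged = new LinkedHashMap<>();
        for (String line : ratingLines) {
            String[] f = line.split(",");
            String key = f[0].trim() + "," + f[1].trim();
            merged.put(key, new double[] {Double.parseDouble(f[2].trim()), 0.0});
        }
        for (String line : viewLines) {
            String[] f = line.split(",");
            String key = f[0].trim() + "," + f[1].trim();
            double[] v = merged.computeIfAbsent(key, k -> new double[] {Double.NaN, 0.0});
            v[1] += Double.parseDouble(f[2].trim());
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, double[]> e : merged.entrySet()) {
            double rating = e.getValue()[0];
            // Scale the capped view count into the range 0..1.
            double viewScore = Math.min(e.getValue()[1], VIEW_CAP) / VIEW_CAP;
            double pref = Double.isNaN(rating)
                    ? IMPLICIT_MAX * viewScore                               // viewed, never rated
                    : Math.min(MAX_RATING, rating + VIEW_BONUS * viewScore); // rated, views add a bonus
            out.add(String.format(Locale.ROOT, "%s,%.2f", e.getKey(), pref));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ratings = List.of("1,10,4.0,1491600000", "2,10,5.0,1491600100");
        List<String> views = List.of("1,10,3,1491600200", "1,11,20,1491600300");
        blend(ratings, views).forEach(System.out::println);
    }
}
```

Writing the returned lines to a file such as data/blended.csv (a
hypothetical name) and passing that file to FileDataModel, which reads
userID,itemID,preference lines, would leave the rest of the code above
unchanged. Whether view counts should nudge the ratings like this or be
modeled separately is a design choice; the sketch only illustrates the
former.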

Thanks and Regards,
Arun

On 7 April 2017 at 04:54, Pat Ferrel <pat@occamsmachete.com> wrote:

> How is this different than content based search?
>
> What data other than words in content do you want to use?
>
> The document ratings? If this is all you are using then all you need is to
> boost results by the rating value or some simple thing like that. It might
> be better to get help from the Solr user group on implementing this.
>
> What I was describing uses individual user behavior to personalize search
> so different people might get slightly different results. But this requires
> more complex augmentation of the Solr documents and a more complex query.
>
>
> On Apr 5, 2017, at 11:37 PM, arun abraham <arunabraham100@gmail.com>
> wrote:
>
> Hi Pat,
>
> Thanks a lot for the detailed reply; it guided me to read more about the
> recommendation features provided by Mahout.
>
> I have been trying to find a recommendation approach for my application.
> Please find below the details of the approach I have planned for item
> recommendation, and kindly correct me if I have gone wrong anywhere.
>
> I am trying to do an item-based search, which depends on the keyword the
> user provides to search within the Solr index.
> Solr returns the following for each document: id, file name, content of the
> document (Tika-extracted), and author.
>
> As a first step, I am thinking of implementing not a complex recommender
> but rather an item-based or user-based one, which I believe requires less
> complexity. I have been experimenting with the sample Mahout API examples
> to generate item-based recommendations from a sample data set.
>
> My application (a search tool) helps find the appropriate documents on the
> LAN (the documents are indexed using Solr). Given a search input, the
> application returns documents with their respective details. I would like
> to have recommendations displayed for each document.
>
> We have a rating feature where a user can rate a document (0-5); for this we
> have a table in MySQL with id, user_id, doc name, rating, timestamp.
> We also have a table where the document interaction details are stored:
> id, user_id, doc_id, number of views, number of searches for each document.
>
> I would like to combine the data from the rating and document interaction
> tables to create an item recommendation, which I believe would provide
> better recommendations.
> Can I accomplish this without integrating Solr? Is it possible to use only
> the Mahout APIs and data from MySQL to create an efficient recommendation
> feature?
> It would be helpful to get your comments on the scope I described above
> and also on the implementation.
>
>
> Kindly guide me on the same.
>
>
> Thanks and Regards,
> Arun
>
> On Apr 3, 2017 12:40 AM, "Pat Ferrel" <pat@occamsmachete.com> wrote:
>
> > Ted’s cautions still apply regarding interactions per item and per user.
> > Do not ignore this advice.
> >
> > Also doing behavioral boosting in search is very different from
> > item-based recommendations. Behavioral boosting will give you only a
> > small amount of lift vs creating a recommender. Intuitively think of the
> > fact that you may have many items to recommend to the user but the added
> > restriction of containing the search terms means you will throw away most
> > of the recommendations you might make only to meet this requirement.
> > Item-based recs are the ones you show at the bottom of a product page or
> > item being read that are “similar” to the item the user is looking at in
> > terms of other interactions users make. Here there are no restrictions
> > about what terms these recommendations must contain. Therefore a
> > recommender is better than behavioral boosting as a general rule but since
> > they can be used together, it is a good thing to implement as a second
> > step if you have the right kind of data.
> >
> > If you or anyone else reading this still needs behavioral search boosting
> > read on...
> >
> > As to integrating “behavioral boosting” with search, you will need to
> > create indicators by recording interactions. What are your conversion
> > events? Read, Buy? This will be your primary interaction, the one you
> > want to see happen more often. Then record secondary interactions, if you
> > have them. For an E-Commerce app the primary / conversion interaction is
> > a “buy”, one possible secondary would be a “product detail view” but
> > there are several other things you might record.
> >
> > Do you plan to write Scala code or use the Mahout CLI drivers? The
> > driver is not the ideal production tool but does work. You feed in a csv
> > for each interaction type recorded, with the primary csv recording the
> > conversion interactions you want to favor. You will get out a series of
> > csvs that have data you can put into your Solr index, since the key of
> > each row of the csv will be the item and the value will be a list of
> > indicators you should attach to the item in your index as a new field of
> > type String Array. So we are talking about the index you already have for
> > items and augmenting it with these behavioral indicators. If the
> > indicator is “buy” then your index will now have item “documents” with
> > fields for your content, maybe title, body, etc. Then you will add the
> > behavioral indicators for “buy”, “detail-view” etc. The use of the Mahout
> > CLI driver for “spark-itemsimilarity” is here:
> > http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
> >
> > When you query, construct a query that must match some of the search
> > terms, but ask Solr to boost any items that also match the user’s history
> > if it can. This will cause items that the user is likely to favor to be
> > boosted in ranking. This also shows how search terms limit what can be
> > done to “recommend” items. Users expect that the words they use in search
> > must be somewhere in the content so we are limited to re-ranking
> > term-based search. This is not as strong as a recommender but should
> > still be in your bag of tricks as it is with the “big guys” like Amazon.
> >
> > Send me a private email if you are looking for hands-on help with this.
> >
> >
> > On Apr 1, 2017, at 6:21 PM, arun abraham <arunabraham100@gmail.com> wrote:
> >
> > Hi,
> >
> > Thanks Pat for the reply.
> >
> > I am trying to implement item-based recommendation as the first step. When
> > the user searches with a keyword (using Solr), it should return not only
> > keyword-matching results (already implemented along with other Solr search
> > features) but also related (recommended) documents.
> >
> > I believe implementing item-based recommendation will be a good learning
> > step towards implementing user-based or behavioral recommendation. As a
> > first step I am trying to recommend a minimum of two documents (as my
> > Solr document index is ~100 docs).
> >
> > I understand that in the above scenario, the first step is to provide the
> > Solr index to Mahout, which will read it and generate a vector file from it.
> > It would be helpful to get guidance on the integration steps to follow for
> > this.
> >
> > Thanks and Regards,
> > Arun
> >
> >
> > On 1 April 2017 at 23:46, Pat Ferrel <pat@occamsmachete.com> wrote:
> >
> >> You want to create “Behavioral Search”? This is where you boost items
> >> that have the search terms in them more likely to be favored by the
> >> individual user?
> >>
> >> You want to use the CCO algorithm in Mahout. You need to collect
> >> behavioral information like conversions, detailed page views, etc. Run
> >> each event through CCO and you get a collection of “indicators” as item
> >> attributes. Augment the Solr index with fields (indicators) attached to
> >> item documents. Then at query time supply the search terms as a “must
> >> match” and use user history as the query segment against the
> >> corresponding indicator fields as a “should match” with some boosting
> >> factor.
> >>
> >> CCO is here:
> >> http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
> >> and a post on Personalizing Search here:
> >> http://www.actionml.com/blog/personalized_search
> >>
> >> BTW Do you have a recommender running? If not that is likely to generate
> >> almost an order of magnitude better results than Behavioral Search. From
> >> Industry wisdom and experience, implement a recommender first, then
> >> augment search. On E-Commerce data we have reported results of 10-30%
> >> conversion lift from recommendations and ~3% for Behavioral Search. 3% is
> >> significant but requires you to gather the same info that it takes to do
> >> a recommender so why not do a recommender first.
> >>
> >> There is an almost turnkey recommender that uses CCO here:
> >> http://actionml.com/ur It uses Elasticsearch but is standalone, not
> >> integrated into any search tech you use elsewhere.
> >>
> >>
> >> On Mar 31, 2017, at 9:30 PM, arun abraham <arunabraham100@gmail.com>
> >> wrote:
> >>
> >> Hi All,
> >>
> >> I am trying to integrate Apache Mahout with Solr. I have created a search
> >> application using Solr which has spellcheck and type-ahead suggestion
> >> features. I have a new requirement to display recommendations (from an
> >> index which has ~100 docs) for a specific keyword-based search. Is it
> >> possible to recommend docs or links from the web together with the indexed
> >> data?
> >> Kindly guide me on the possibilities for this and also on the
> >> integration part.
> >>
> >> Thanks and Regards,
> >> Arun
> >>
> >>
> >
> >
>
>
