predictionio-user mailing list archives

From Pat Ferrel <>
Subject Re: Customers clustering
Date Fri, 07 Jul 2017 19:41:01 GMT
You can weight each term in the index and in the query. But be careful of norms and TF-IDF, which
may dampen frequency differences. In many cases the number of occurrences may be misleading.
For instance, category 10 may be small consumable items and category 
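As a minimal sketch of per-term weighting on the query side, assuming the cluster docs store their categories in a keyword field hypothetically named `categories`, the user's purchase counts can become `boost` values on individual `term` clauses. Keyword fields have norms disabled by default, which keeps field length out of the score; IDF still applies, so this is not a pure count-weighted similarity.

```python
import json
from collections import Counter

def weighted_category_query(purchased_categories, field="categories"):
    """Build a bool/should query with one term clause per purchased category,
    boosted by how many times the user bought from that category.

    `field` is a hypothetical keyword field holding each cluster's categories.
    """
    counts = Counter(purchased_categories)
    should = [
        # Raw counts used as boosts; a dampened weight such as 1 + log(n) is a
        # common alternative when raw counts can be misleading.
        {"term": {field: {"value": category, "boost": float(n)}}}
        for category, n in sorted(counts.items())
    ]
    return {"query": {"bool": {"should": should}}}

# 8 purchases from category 10, one each from category 1 and category 2
history = ["category 10"] * 8 + ["category 1", "category 2"]
print(json.dumps(weighted_category_query(history), indent=2))
```

Whether raw counts, dampened counts, or plain presence work best depends on the catalog, which is exactly the caveat about consumables above.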

On Jul 7, 2017, at 9:31 AM, Luciano Vandi <> wrote:

OK, got it. But this way, if a customer bought 8 items from category 10 and 1 each from category 1
and category 2, they would rank high for cluster_1, even though they are more interested in category 10.
Am I wrong?
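To check the concern in this question, here is a small self-contained sketch that scores this customer against the two clusters from the example with plain cosine similarity, once with binary (bought / not bought) vectors and once with purchase counts. It uses only the two categories shown for cluster_2 (the elided ones are ignored) and is an in-memory illustration, not how Lucene actually scores documents.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Cluster definitions from the thread (cluster_2 limited to the two categories shown)
clusters = {
    "cluster_1": {"category 1": 1.0, "category 2": 1.0},
    "cluster_2": {"category 5": 1.0, "category 10": 1.0},
}

purchases = ["category 10"] * 8 + ["category 1", "category 2"]
binary = {c: 1.0 for c in set(purchases)}                      # presence only
counted = {c: float(n) for c, n in Counter(purchases).items()}  # raw counts

for name, vec in clusters.items():
    print(name, round(cosine(binary, vec), 3), round(cosine(counted, vec), 3))
# cluster_1 0.816 0.174
# cluster_2 0.408 0.696
```

With binary vectors cluster_1 wins; once counts are used, the eight category 10 purchases push cluster_2 ahead, which is why the choice of term weighting matters.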

2017-07-07 17:48 GMT+02:00 Pat Ferrel <>:
You'll have to work out the ES query JSON yourself; store the categories as arrays of un-analyzed strings.

ES docs indexed:
  cluster_1: ["category 1", "category 2"]
  cluster_2: ["category 5", "category 10", …]

  user_purchase_history: ["category 1", "category 2"]

So the query would be ["category 1", "category 2"], and it would return the clusters
with cluster_1 ranked highest.
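A minimal sketch of the index mapping, the cluster docs, and the query described above, written as Python dicts that serialize to ES JSON. The field name `categories` is a hypothetical choice, and cluster_2 is limited to the two categories shown. `keyword` is the un-analyzed string type in Elasticsearch 5 and later; the typeless mapping assumes Elasticsearch 7 or later (older versions also need a mapping type name). The query puts one `term` clause per category inside a `bool`/`should`, so a cluster doc scores higher the more of the user's categories it contains.

```python
import json

# Index mapping: "categories" (hypothetical name) holds un-analyzed strings.
mapping = {"mappings": {"properties": {"categories": {"type": "keyword"}}}}

# One document per cluster, keyed by cluster id.
cluster_docs = {
    "cluster_1": {"categories": ["category 1", "category 2"]},
    "cluster_2": {"categories": ["category 5", "category 10"]},
}

def cluster_query(user_history, field="categories"):
    """Use the user's purchase history as the query: one should clause per
    distinct category, so clusters sharing more categories score higher."""
    return {
        "query": {
            "bool": {
                "should": [{"term": {field: c}} for c in sorted(set(user_history))]
            }
        }
    }

print(json.dumps(cluster_query(["category 1", "category 2"]), indent=2))
```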

As you can see, the terms in the user history can be used as a query to return the cluster-id
that is most similar. This is called K-Nearest Neighbors (KNN) and is done using cosine similarity.
ES (and Solr, both based on Lucene) are great KNN engines for sparse data.
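To make the KNN idea concrete without a running cluster, here is an in-memory sketch that ranks clusters by cosine similarity between binary category sets. It illustrates the technique only; it is not Lucene's scoring, which also weights terms by IDF and, depending on the version, uses BM25 or classic TF-IDF. Unlike the ES query, which only returns clusters matching at least one term, this sketch scores every cluster.

```python
import math

def cosine_sets(a, b):
    """Cosine similarity of two binary (set-of-terms) vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def rank_clusters(history, clusters, k=None):
    """Return cluster ids sorted from most to least similar to the user
    history; ties are broken by cluster id. Optionally keep only the top k."""
    user = set(history)
    scored = [(cosine_sets(user, set(cats)), cid) for cid, cats in clusters.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    ranked = [cid for _, cid in scored]
    return ranked if k is None else ranked[:k]

clusters = {
    "cluster_1": ["category 1", "category 2"],
    "cluster_2": ["category 5", "category 10"],
}
print(rank_clusters(["category 1", "category 2"], clusters))
# ['cluster_1', 'cluster_2']
```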

On Jul 7, 2017, at 4:30 AM, Luciano Vandi <> wrote:

Thanks Pat, you're right. This is what I'm trying to do.

It's not clear to me how to query Elasticsearch with a user's history of bought item categories.
Can you give an example?

2017-07-06 23:13 GMT+02:00 Pat Ferrel <>:
Actually, it sounds like you already have clusters that are made up of categories, and you want
to know which cluster definition is most similar to what the user has bought? If so, you don't
need clustering but similarity. This is pretty easy to do: put each cluster into Elasticsearch
as a doc with a list of categories (so 6 or so docs), then use the user's history of bought
item categories as the query. You'll get all clusters ranked from most similar (to the user's
history) to least.
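Loading the 6 or so cluster docs takes a single bulk request. As a sketch, assuming a hypothetical index named `clusters` and the same hypothetical `categories` field, this builds the newline-delimited body for the `_bulk` endpoint; Elasticsearch 5.x and 6.x also expect a `_type` in each action line.

```python
import json

def bulk_payload(clusters, index="clusters"):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON):
    one action line plus one source line per cluster document."""
    lines = []
    for cluster_id, categories in clusters.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": cluster_id}}))
        lines.append(json.dumps({"categories": categories}))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline

print(bulk_payload({
    "cluster_1": ["category 1", "category 2"],
    "cluster_2": ["category 5", "category 10"],
}))
```

The body would be sent with `POST /_bulk` and a `Content-Type: application/x-ndjson` header.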

You would have to store user history on your own.

This could be put into a simple template, but if you already have user history, it may be overkill.
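Since the user history has to live outside ES, one minimal way to keep it is to fold exported purchase rows into per-customer category lists. The `(customer_id, category)` row shape and the customer ids are assumptions about the export, not something the thread specifies; duplicates are kept so counts remain available if you later decide to weight terms.

```python
from collections import defaultdict

def build_histories(order_lines):
    """Group (customer_id, category) purchase rows into per-customer category
    lists, preserving order and duplicates."""
    histories = defaultdict(list)
    for customer_id, category in order_lines:
        histories[customer_id].append(category)
    return dict(histories)

rows = [
    ("c1", "category 1"),
    ("c2", "category 10"),
    ("c1", "category 2"),
    ("c1", "category 1"),
]
print(build_histories(rows))
# {'c1': ['category 1', 'category 2', 'category 1'], 'c2': ['category 10']}
```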

On Jul 6, 2017, at 1:39 PM, Pat Ferrel <> wrote:

There are two clustering templates, but it looks like they both need to be moved from
<> to Apache PIO, which should be easy. See the template gallery
here: <>

On Jul 6, 2017, at 12:35 PM, Luciano Vandi <> wrote:

Hi there, I'm new to the mailing list. Thanks to the people at <>,
ActionML, and everyone in the community!

I have a question about a project I'm working on. From a database of customers/orders,
I would like to export buy/view events in order to assign each customer to one or more of
6 predefined clusters. Each cluster reflects the macro-category associated with the bought/viewed items.

Then I would like to query a service to get all customers within a cluster, or all clusters
a customer belongs to.
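Both lookups asked for here can be served from two maps. As a sketch, assuming each customer is assigned to every cluster whose cosine similarity (over binary category sets) reaches a hypothetical `threshold` of 0.5, this builds a customer-to-clusters map and the inverse cluster-to-customers index. The customer names, cluster contents, and threshold are made up for illustration.

```python
import math
from collections import defaultdict

def cosine_sets(a, b):
    """Cosine similarity of two binary (set-of-terms) vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def assign(histories, clusters, threshold=0.5):
    """Assign each customer to every cluster with similarity >= threshold
    (so zero, one, or several clusters) and build the inverse index."""
    customer_to_clusters = {}
    cluster_to_customers = defaultdict(set)
    for customer, cats in histories.items():
        user = set(cats)
        matched = sorted(
            cid for cid, ccats in clusters.items()
            if cosine_sets(user, set(ccats)) >= threshold
        )
        customer_to_clusters[customer] = matched
        for cid in matched:
            cluster_to_customers[cid].add(customer)
    return customer_to_clusters, dict(cluster_to_customers)

clusters = {
    "cluster_1": ["category 1", "category 2"],
    "cluster_2": ["category 5", "category 10"],
}
histories = {
    "anna": ["category 1", "category 2"],
    "bruno": ["category 10", "category 5", "category 1"],
    "carla": ["category 7"],
    "dario": ["category 1", "category 2", "category 5", "category 10"],
}
by_customer, by_cluster = assign(histories, clusters)
print(by_customer["dario"])  # ['cluster_1', 'cluster_2']
```

In production the per-customer scores would come from the ES query rather than this in-memory cosine, but the two lookup maps stay the same.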

Is there a PIO template I should start exploring, or should I ask the ActionML team
for consulting?

Have a nice day!



  PaaS and SaaS Solutions for E-Commerce
  Email: <>
  Mobile: (+39) 340 90 21 354

