predictionio-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Items blacklisted in the query made to Elasticsearch by UR
Date Wed, 03 May 2017 16:05:34 GMT
Sorry, after all this I’m a bit unclear as to what the problem is. Is this the case:

1) You are trying to NOT filter out items the user has seen?
2) You have set “blacklistEvent”: [], in engine.json?
3) You are making user based queries with a user-id only?

I’ll try this with a small dataset used for integration tests and, if it fails, I’ll add
it to the tests and fix it. Please verify that I have described the problem correctly.


On May 3, 2017, at 12:48 AM, brunolebon@gmail.com wrote:

Hi,

Back to the problem we had: we opened a JIRA as you suggested, and in the meantime we decided
to modify the code to fit our needs.

The current version has this piece of code (from https://github.com/actionml/universal-recommender/blob/master/src/main/scala/URAlgorithm.scala):
    val blacklistedItems = userEvents.filter { event =>
      // either a list or an empty list of filtering events so honor them
      blacklistEvents match {
        case Nil => modelEventNames.head equals event.event
        case _   => blacklistEvents contains event.event
      }
    }.map(_.targetEntityId.getOrElse("")) ++ query.blacklistItems.getOrEmpty.distinct

So there are only two options: an empty list or a list of events. In both cases something is
blacklisted (either the items of the primary event or the items of the listed events).

In our case we don't want anything to be blacklisted, so we modified the code as follows:
    val blacklistedItems = userEvents.filter { event =>
      // either a list or an empty list of filtering events so honor them
      blacklistEvents match {
        case Nil => false
        case _   => blacklistEvents contains event.event
      }
    }.map(_.targetEntityId.getOrElse("")) ++ query.blacklistItems.getOrEmpty.distinct

Now it works just fine.
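For comparison, here is a minimal, self-contained Scala sketch of the two filters quoted above, run on a toy history. The `Event` case class is a hypothetical stand-in for PIO's event type (only `event` and `targetEntityId` are modelled), and the query's own blacklist is a plain `Seq[String]` instead of UR's `getOrEmpty` helper; it restates the quoted logic rather than reproducing the UR source.

```scala
// Hypothetical, simplified stand-in for PIO's Event: only the fields the filter reads.
final case class Event(event: String, targetEntityId: Option[String])

object BlacklistFilters {
  // Shared tail: map kept events to item ids, append the query's own blacklist, dedupe.
  private def toItems(kept: Seq[Event], queryBlacklist: Seq[String]): Seq[String] =
    (kept.map(_.targetEntityId.getOrElse("")) ++ queryBlacklist).distinct

  // Behaviour as quoted from v0.5.0: an empty list falls back to the primary (first) event.
  def current(userEvents: Seq[Event], blacklistEvents: List[String],
              modelEventNames: List[String], queryBlacklist: Seq[String]): Seq[String] =
    toItems(userEvents.filter { e =>
      blacklistEvents match {
        case Nil => modelEventNames.head == e.event
        case _   => blacklistEvents.contains(e.event)
      }
    }, queryBlacklist)

  // Patched behaviour: an empty list excludes nothing from the user's history.
  def patched(userEvents: Seq[Event], blacklistEvents: List[String],
              queryBlacklist: Seq[String]): Seq[String] =
    toItems(userEvents.filter { e =>
      blacklistEvents match {
        case Nil => false
        case _   => blacklistEvents.contains(e.event)
      }
    }, queryBlacklist)

  def main(args: Array[String]): Unit = {
    val history = Seq(
      Event("facet", Some("body")),
      Event("facet", Some("string")),
      Event("view", Some("coque"))
    )
    val eventNames = List("facet", "view")
    println(current(history, Nil, eventNames, Nil)) // List(body, string)
    println(patched(history, Nil, Nil))             // List()
  }
}
```

With an empty `blacklistEvents`, `current` still excludes every item the user touched through the primary event (`facet` here), which is what shows up in the `must_not` clause; `patched` excludes nothing.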

FYI, we found an old piece of code here: https://github.com/PredictionIO/template-scala-parallel-universal-recommendation/blob/master/src/main/scala/URAlgorithm.scala,
which says:
    val blacklistedItems = userEvents.filter { event =>
      if (ap.blacklistEvents.nonEmpty) {
        // either a list or an empty list of filtering events so honor them
        if (ap.blacklistEvents.get == List.empty[String]) false // no filtering events so all are allowed
        else ap.blacklistEvents.get.contains(event.event) // if it's filtered remove it, else allow
      } else ap.eventNames(0).equals(event.event) // remove the primary event if nothing specified
    }.map(_.targetEntityId.getOrElse("")) ++ query.blacklistItems.getOrElse(List.empty[String])
      .distinct
This one has three possibilities: it filters the primary event (when blacklistEvents is not
set), the events specified in the list, or nothing (when the list is empty). So it seems it was
possible at some point in PIO not to blacklist any event, and that this is no longer the case.
We were wondering why.
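The older code can express three cases because `blacklistEvents` is an `Option` there, so "not set" and "set to an empty list" are different values. A minimal sketch of that idea with pattern matching, using a hypothetical `Event` stand-in (this is not code from either repo):

```scala
// Hypothetical, simplified stand-in for PIO's Event.
final case class Event(event: String, targetEntityId: Option[String])

object OptionBlacklist {
  // None        -> parameter not set: exclude items from the primary (first) event
  // Some(Nil)   -> explicitly empty: exclude nothing
  // Some(names) -> exclude items from the listed events only
  def shouldExclude(e: Event, blacklistEvents: Option[List[String]],
                    eventNames: List[String]): Boolean =
    blacklistEvents match {
      case None        => eventNames.head == e.event
      case Some(Nil)   => false
      case Some(names) => names.contains(e.event)
    }

  def blacklistedItems(userEvents: Seq[Event], blacklistEvents: Option[List[String]],
                       eventNames: List[String]): Seq[String] =
    userEvents
      .filter(shouldExclude(_, blacklistEvents, eventNames))
      .map(_.targetEntityId.getOrElse(""))
      .distinct

  def main(args: Array[String]): Unit = {
    val history = Seq(Event("facet", Some("body")), Event("view", Some("coque")))
    val eventNames = List("facet", "view")
    println(blacklistedItems(history, None, eventNames))               // List(body)
    println(blacklistedItems(history, Some(Nil), eventNames))          // List()
    println(blacklistedItems(history, Some(List("view")), eventNames)) // List(coque)
  }
}
```

If the parameter is instead read into a plain `List`, as the `case Nil` match in the newer code suggests, "not set" and `[]` can both end up as `Nil`, and the primary-event fallback can no longer be switched off from engine.json.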





On Thursday, April 6, 2017 at 09:31:13 UTC+2, Bruno LEBON wrote:
BTW I assume "user": "069bbbbd-8661-453f-8c89-ac50aea0c0d8" has those items in their “facet”
history? Otherwise I’m not sure where they’d come from.

Yes, I confirm that this user has those items in his facet history.

2017-04-05 18:18 GMT+02:00 Pat Ferrel <p...@occamsmachete.com>:
Ok thanks for ruling out a couple things, I’ll take a look at this.

BTW I assume "user": "069bbbbd-8661-453f-8c89-ac50aea0c0d8" has those items in their “facet”
history? Otherwise I’m not sure where they’d come from.


On Apr 5, 2017, at 2:31 AM, Bruno LEBON <b.l...@redfakir.fr> wrote:

Yes, we have PIO 0.10.0 and UR v0.5.0, and blacklistEvents is disabled.

We sent this kind of query to PIO:
{ "user": "069bbbbd-8661-453f-8c89-ac50aea0c0d8", "num": 11 }

The JSON generated for Elasticsearch is:

{"size":11,"query":{"bool":{"should":[{"terms":{"facet":["estag_begin-couleur-noir-estag_end","cocooning","sexy","charme","estag_begin-taille-105h-estag_end","estag_begin-taille-4-estag_end","estag_begin-primadonna-estag_end","transparent","estag_begin-aubade-estag_end","estag_begin-couleur-rouge-estag_end","une-piece","estag_begin-simone-perele-estag_end","maintien","moins-de-20-euros-intervalle-de-prix","estag_begin-taille-taille-unique-estag_end","estag_begin-moins-50-pour-cent-estag_end","elasthanne","blouse","body","coque","string","slip","estag_begin-taille-95a-estag_end"]}},{"terms":{"view":[]}},{"constant_score":{"filter":{"match_all":{}},"boost":0}}],"must":[],"must_not":{"ids":{"values":["estag_begin-taille-95a-estag_end","string","estag_begin-aubade-estag_end","slip","elasthanne","coque","body","blouse","estag_begin-moins-50-pour-cent-estag_end","estag_begin-primadonna-estag_end","estag_begin-taille-taille-unique-estag_end","moins-de-20-euros-intervalle-de-prix","maintien","estag_begin-simone-perele-estag_end","une-piece","estag_begin-couleur-rouge-estag_end","transparent","sexy","estag_begin-taille-4-estag_end","estag_begin-taille-105h-estag_end","charme","cocooning","estag_begin-couleur-noir-estag_end"],"boost":0}},"minimum_should_match":1}},"sort":[{"_score":{"order":"desc"}},{"popRank":{"unmapped_type":"double","order":"desc"}}]}

We also set "returnSelf": true as we want every item to be recommended to the user.



2017-04-04 17:30 GMT+02:00 Pat Ferrel <p...@occamsmachete.com>:
Ok, so you are using PIO 0.10.0 and UR v0.5.0, and have disabled blacklistEvents as shown
below?

Then when you query for a user you are not getting all items returned? 

Can you share an example of the query you send to pio and the JSON that is created for Elasticsearch?


On Apr 4, 2017, at 7:13 AM, Bruno LEBON <b.l...@redfakir.fr> wrote:

Hi,

Sorry, my bad: I searched for the piece of code on the internet and your repository came up first.
I had the right repo in prod (https://github.com/actionml/universal-recommender);
I double-checked. Sorry for the misunderstanding.

We still have the same problem. 

Here is the engine.json we use:
{
  "comment":"",
  "id": "default",
  "description": "settings",
  "engineFactory": "org.template.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "sample-handmade-data.txt",
      "appName": "piourcluster",
      "eventNames": ["facet","view"]
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true",
    "es.nodes":"espionode1:9200,espionode2:9200,espionode3:9200"
  },
  "algorithms": [
    {
      "name": "ur",
      "params": {
        "appName": "piourcluster",
        "indexName": "urindex",
        "typeName": "items",
        "eventNames": ["facet", "view"],
        "blacklistEvents": [],
        "maxEventsPerEventType": 50000,
        "maxCorrelatorsPerEventType": 50,
        "maxQueryEvents": 100,
        "num": 11,
        "rankings": [
          {
            "name": "popRank",
            "type": "popular"
          }
        ],
        "returnSelf": true
      }
    }
  ]
}

We don't have a blacklist in our query. The query is basic: we use the Java API, giving it
the user id and the number of recommendations we want back.


2017-03-31 21:55 GMT+02:00 Pat Ferrel <p...@occamsmachete.com>:
You should not be using code from that repo. See the PIO template gallery; it points to the
correct template. My personal version is for experimental branches.

The repo is here: https://github.com/actionml/universal-recommender

The function is here: https://github.com/actionml/universal-recommender/blob/master/src/main/scala/URAlgorithm.scala#L634
and looks like it is doing the right thing.

Try it with UR v0.5.0 from the correct repo and, if it doesn’t work, I’ll take a look.
Please send along the engine.json you used, just to be sure we are on the same page. BTW, are
you also using a blacklist in your query? Please give an example query.


On Mar 31, 2017, at 6:45 AM, Bruno LEBON <b.l...@redfakir.fr> wrote:

Hi,

Thanks for your answer. We already tried that, but it doesn't change anything: we still have
blacklisted items (mainly or only from the primary event, from what I see).

I think the piece of code in charge of blacklisting is this one (from https://github.com/pferrel/template-scala-parallel-universal-recommendation/blob/master/src/main/scala/URAlgorithm.scala):


  /** Create a list of item ids that the user has interacted with or that are not to be included in recommendations */
  def getExcludedItems(userEvents: Seq[Event], query: Query): Seq[String] = {

    val blacklistedItems = userEvents.filter { event =>
      // either a list or an empty list of filtering events so honor them
      blacklistEvents match {
        case Nil => modelEventNames.head equals event.event
        case _   => blacklistEvents contains event.event
      }
    }.map(_.targetEntityId.getOrElse("")) ++ query.blacklistItems.getOrEmpty.distinct

    // Now conditionally add the query item itself
    val includeSelf = query.returnSelf.getOrElse(returnSelf)
    val allExcludedItems = if (!includeSelf && query.item.nonEmpty) {
      blacklistedItems :+ query.item.get // add the query item to be excluded
    } else {
      blacklistedItems
    }
    allExcludedItems.distinct
  }

But my knowledge of Scala is very limited, so I don't understand the details. Does it say that
if the parameter blacklistEvents is empty, i.e. [], then no events are to be excluded (apart from
the includeSelf option)?
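On the includeSelf part of the question: in the second half of the function, `returnSelf` only decides whether the query's own item (`query.item`, present in item-based queries) is added to the exclusions. A minimal sketch of that step, with a hypothetical `Query` case class holding just the two fields it reads:

```scala
object ExcludeSelf {
  // Hypothetical minimal query: only the two fields this step reads.
  final case class Query(item: Option[String], returnSelf: Option[Boolean])

  // Same decision as the quoted code: append the query item only when the query has one
  // and returnSelf (from the query, else the engine.json default) is false.
  def allExcludedItems(blacklistedItems: Seq[String], query: Query,
                       defaultReturnSelf: Boolean): Seq[String] = {
    val includeSelf = query.returnSelf.getOrElse(defaultReturnSelf)
    val excluded = query.item match {
      case Some(item) if !includeSelf => blacklistedItems :+ item
      case _                          => blacklistedItems
    }
    excluded.distinct
  }

  def main(args: Array[String]): Unit = {
    // User-only query with returnSelf = true: history exclusions are untouched.
    println(allExcludedItems(Seq("body", "string"), Query(None, Some(true)), defaultReturnSelf = false)) // List(body, string)
    // Item-based query with returnSelf = false: the query item is excluded too.
    println(allExcludedItems(Seq("body"), Query(Some("slip"), Some(false)), defaultReturnSelf = true)) // List(body, slip)
  }
}
```

For a user-only query such as `{ "user": ..., "num": 11 }`, `query.item` is empty, so `returnSelf` has no effect and the exclusions come entirely from the `blacklistEvents` filter.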

Do I have the right version of UR? (https://github.com/pferrel/template-scala-parallel-universal-recommendation)

2017-03-30 20:00 GMT+02:00 Pat Ferrel <p...@occamsmachete.com>:
"blacklistEvents": [[]], should be "blacklistEvents": [],

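`[[]]` is not an empty array: it is a one-element array whose only element is an empty array, so it cannot pass an "empty list" check. The Scala analogue below shows the distinction; how the UR's JSON reader actually maps `[[]]` onto `blacklistEvents` is not shown here and is not assumed.

```scala
object NestedEmptyList {
  val intended: List[String] = List()               // "blacklistEvents": []
  val configured: List[List[String]] = List(List()) // "blacklistEvents": [[]]

  // `case Nil` in the blacklist filter only matches a list with no elements at all.
  def matchesNil(xs: List[_]): Boolean = xs match {
    case Nil => true
    case _   => false
  }

  def main(args: Array[String]): Unit = {
    println(matchesNil(intended))   // true
    println(matchesNil(configured)) // false: one element, which happens to be empty
  }
}
```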

On Mar 30, 2017, at 8:57 AM, Bruno LEBON <b.l...@redfakir.fr> wrote:

Hello,

We are testing the Universal Recommender on a cluster set up following the ActionML tutorial.
Once build/train/deploy is done, we send PIO a request to get recommendations.
For example:
curl -H "Content-Type: application/json" \
  -d '{ "user": "4e810ef4-977a-4f04-b585-cf2c2996ec93", "num": 11 }' \
  http://localhost:8001/queries.json

In pio.log we see the requests made to Elasticsearch. They look like this:
{"size":11,"query":{"bool":{"should":[{"terms":{"facet":["estag_begin-couleur-noir-estag_end","cocooning","sexy","charme","estag_begin-taille-105h-estag_end","estag_begin-taille-4-estag_end","estag_begin-primadonna-estag_end","transparent","estag_begin-aubade-estag_end","estag_begin-couleur-rouge-estag_end","une-piece","estag_begin-simone-perele-estag_end","maintien","moins-de-20-euros-intervalle-de-prix","estag_begin-taille-taille-unique-estag_end","estag_begin-moins-50-pour-cent-estag_end","elasthanne","blouse","body","coque","string","slip","estag_begin-taille-95a-estag_end"]}},{"terms":{"view":[]}},{"constant_score":{"filter":{"match_all":{}},"boost":0}}],"must":[],"must_not":{"ids":{"values":["estag_begin-taille-95a-estag_end","string","estag_begin-aubade-estag_end","slip","elasthanne","coque","body","blouse","estag_begin-moins-50-pour-cent-estag_end","estag_begin-primadonna-estag_end","estag_begin-taille-taille-unique-estag_end","moins-de-20-euros-intervalle-de-prix","maintien","estag_begin-simone-perele-estag_end","une-piece","estag_begin-couleur-rouge-estag_end","transparent","sexy","estag_begin-taille-4-estag_end","estag_begin-taille-105h-estag_end","charme","cocooning","estag_begin-couleur-noir-estag_end"],"boost":0}},"minimum_should_match":1}},"sort":[{"_score":{"order":"desc"}},{"popRank":{"unmapped_type":"double","order":"desc"}}]}

The important part is that the must_not clause is not empty. We want it to be empty; we
have the following engine.json:
{
  "comment":"",
  "id": "default",
  "description": "settings",
  "engineFactory": "org.template.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "sample-handmade-data.txt",
      "appName": "piourcluster",
      "eventNames": ["facet","view"]
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true",
    "es.nodes":"espionode1:9200,espionode2:9200,espionode3:9200"
  },
  "algorithms": [
    {
      "name": "ur",
      "params": {
        "appName": "piourcluster",
        "indexName": "urindex",
        "typeName": "items",
        "eventNames": ["facet", "view"],
        "blacklistEvents": [[]],
        "maxEventsPerEventType": 50000,
        "maxCorrelatorsPerEventType": 50,
        "maxQueryEvents": 100,
        "num": 11,
        "rankings": [
          {
            "name": "popRank",
            "type": "popular"
          }
        ],
        "returnSelf": true
      }
    }
  ]
}

From what we understand, setting blacklistEvents to an array containing an empty array tells
UR that we don't want any event to be blacklisted, not even the primary one.
We also added the parameter "returnSelf": true to ask UR not to blacklist any items that are
part of the query.

So why do we have blacklisted events in our query (i.e. the must_not part of it)?

(Note that when we change engine.json and run a deploy, we see some parameter values appear
in the log, so we know we are modifying the right engine.json file.)

Regards
Bruno










