Subject: Re: Disable hbase user history queries
From: Martin Fernandez <martingfernandez@gmail.com>
Date: Thu, 1 Jun 2017 15:19:35 -0300
To: user@predictionio.incubator.apache.org
Cc: actionml-user
Thanks Pat for your reply. I am doing Video on Demand e-commerce, in which realtime queries would be very helpful, but I want to minimize the risks of HDFS synchronization latency between datacenters. Do you have experience running PredictionIO + Universal Recommender in multiple DCs that you can share? Did you face any latency issues with the HBase cluster?

Thanks in advance

On Thu, Jun 1, 2017 at 2:53 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
First, I'm not sure this is a good idea. You lose the realtime nature of recommendations based on up-to-the-second recording of user behavior. You get this with live user event input even without re-calculating the model in realtime.

Second, no, you can't disable queries for user history; it is the single most important key to personalized recommendations.

I'd have to know more about your application, but the first line of cost cutting for us in custom installations (I work for ActionML, the maintainer of the UR Template) is to make the Spark cluster temporary, since it is not needed to serve queries and only needs to run during training. We start it up, train, then shut it down.
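A rough sketch of that train-only workflow, assuming a small Python wrapper around the `pio` CLI; the cluster start/stop helpers are hypothetical placeholders for whatever provisioning mechanism you actually use:

import subprocess

def start_spark_cluster():
    # Hypothetical placeholder: provision a temporary Spark cluster
    # (cloud API, Terraform, etc.) and return its master URL.
    return "spark://spark-master:7077"

def stop_spark_cluster():
    # Hypothetical placeholder: tear the temporary cluster back down.
    pass

def train_with_temporary_cluster():
    master_url = start_spark_cluster()
    try:
        # Arguments after "--" are passed through to spark-submit,
        # so training runs on the temporary cluster.
        subprocess.run(["pio", "train", "--", "--master", master_url], check=True)
    finally:
        stop_spark_cluster()

if __name__ == "__main__":
    train_with_temporary_cluster()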

If you really want to shut the entire system down and don't want realtime user behavior, you can query for all users and put the results in your DB or an in-memory cache like a hashmap, then just serve from that DB or cache. This takes you back to the days of the old Mahout MapReduce recommenders (pre-2014), but maybe it fits your app.
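A minimal sketch of that batch approach, assuming a deployed engine listening on the default PredictionIO query endpoint (http://localhost:8000/queries.json); the user list and the in-memory dict are stand-ins for your own user store and cache:

import requests

ENGINE_URL = "http://localhost:8000/queries.json"  # assumed host/port of the deployed engine

def precompute_recommendations(user_ids, num=10):
    # Query the deployed engine once per user and keep the results
    # in an in-memory hashmap (swap for your DB or Redis in production).
    cache = {}
    for user_id in user_ids:
        resp = requests.post(ENGINE_URL, json={"user": user_id, "num": num})
        resp.raise_for_status()
        cache[user_id] = resp.json().get("itemScores", [])
    return cache

# Usage: run this as a batch job after training, then serve requests
# from the cache instead of hitting the HBase-backed live system.
recs = precompute_recommendations(["u1", "u2", "u3"])
print(recs.get("u1", []))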

If you are doing e-commerce, think about a user's shopping behavior. They shop, browse, then buy. Once they buy, that old shopping behavior is no longer indicative of realtime intent. If you miss using that behavior, you may miss the shopping session altogether. But again, your needs may vary.


On Jun 1, 2017, at 6:19 AM, Martin Fernandez <martingfernandez@gmail.com> wrote:

Hello guys,

we are trying to deploy Universal Recommender + PredictionIO in our infrastructure, but we don't want to distribute HBase across datacenters because of the latency. So the idea is to build and train the engine offline and then copy the model and Elasticsearch data to PIO replicas. I noticed that when I deploy the engine, it always tries to connect to the HBase server, since it is used to query user history. Is there any way to disable those user history queries and avoid the connection to HBase?

Thanks

Martin




--
Saludos / Best Regards,

Martin Gustavo Fernandez
Mobile: +5491132837292
