predictionio-user mailing list archives

From Pat Ferrel <>
Subject Re: Batch recommender
Date Fri, 28 Jul 2017 17:30:21 GMT
As it happens, I just looked into the concurrency issue: how many connections can be made to
the Prediction Server? The answer is that Spray HTTP, the predecessor of what is now merged
into akka-http, uses TCP back-pressure to optimize the number of connections allowed and the
number of simultaneous responses that can be worked on in one connection. This takes into
account how many cores, and therefore threads, are available on the machine. The upshot is
that Spray self-optimizes for the maximum the machine can handle.

This means that the number of connections can be raised by adding cores. But if you are not
at 100% CPU usage on the Prediction Server, some other part of the system is probably the
bottleneck, and batch queries will not help. Increasing connections or batching queries will
only help if you see 100% CPU usage on the Prediction Server.

Remember that for evaluation you are putting the worst-case load on the Prediction Server,
worse than any real-world scenario is likely to produce. It is almost certain that you will
be able to overload the system, since sending a query is much easier than processing it.
So processing the query is the most likely limit on speed.

Therefore, scale the system to meet the maximum time you can wait for all responses. Start
with the Prediction Server: either add cores or put a load balancer in front so you can run
any number of PS machines. Scale HBase and Elasticsearch in the normal ways, using their
clustering methods, as you see them start to show max CPU usage. Nothing is stored during a
query, but disk may be read, so also watch I/O for bottlenecks. This assumes you have a
clustered deployment and can measure each system's load independently. If everything is on
one machine, good luck, because there are many over-constrained situations where one service
grabs resources another needs, causing artificial bottlenecks.
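
As a rough illustration of sizing to a maximum wait time, the arithmetic might look like the
sketch below. The function names and all numbers are hypothetical, and it assumes that
throughput grows roughly linearly as Prediction Server machines are added behind a load
balancer, which only holds while HBase and Elasticsearch are not the bottleneck.

```python
import math


def required_qps(total_queries, max_wait_seconds):
    """Queries per second needed to finish every query within the wait budget."""
    return total_queries / max_wait_seconds


def servers_needed(total_queries, max_wait_seconds, qps_per_server):
    """Prediction Server count, assuming linear scaling behind a load balancer.

    qps_per_server is something you would have to measure on one machine
    running at (or near) 100% CPU; it is not a number PredictionIO reports.
    """
    needed = required_qps(total_queries, max_wait_seconds)
    return math.ceil(needed / qps_per_server)


# Hypothetical example: 1,000,000 queries, a one-hour budget, and a measured
# 100 queries/second on a single Prediction Server.
print(round(required_qps(1_000_000, 3600), 1))   # 277.8
print(servers_needed(1_000_000, 3600, 100))      # 3
```

If the measured per-server rate was taken while CPU was well below 100%, the estimate says
more about the downstream services than about the Prediction Server itself.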

Assuming a clustered environment, the system is indefinitely scalable because no state is stored
in PIO, only in scalable services like HBase and Elasticsearch. There are two internal queries
for every query you make: one to HBase and one to ES. Both have been used in massive deployments
and so can handle any load you can define.

So by scaling the PS (vertically or with load balancers) and HBase and ES (vertically or
through cluster expansion), you should be able to handle as many queries per second as you need.

On Jul 27, 2017, at 9:39 PM, Mattz <> wrote:

Thanks Mars. Looks like `pio eval` may not work for my needs (according to Pat) since I
am using the UR template.

And querying the REST API in bulk may be limited by the concurrency that the API can handle.
I tested with a reasonably sized machine and this number was not high enough.

On Thu, Jul 27, 2017 at 11:08 PM, Mars Hall <> wrote:
Hi Mattz,

Yes, that batch evaluator using `pio eval` is currently the only documented way to run batch
predictions.

It's also possible to create a custom script that calls the Queries HTTP/REST API to collect
predictions in bulk.
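
A minimal sketch of such a script, using only the Python standard library, might look like
the following. It assumes a deployed engine listening at `http://localhost:8000/queries.json`
(the default local address) and a UR-style query body; the helper names `post_query` and
`collect_predictions` are made up for illustration and are not part of PredictionIO.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: the engine was deployed locally on the default port.
QUERY_URL = "http://localhost:8000/queries.json"


def post_query(query, url=QUERY_URL, timeout=30):
    """POST one JSON query to the Prediction Server and return the decoded reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),  # a request body makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def collect_predictions(queries, send=post_query, max_workers=8):
    """Send every query through a bounded thread pool.

    Results come back in the same order as the input queries. max_workers
    caps the number of requests in flight at once; raising it only helps
    while the server still has spare capacity.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, queries))
```

For example, `collect_predictions([{"user": "u1", "num": 10}, {"user": "u2", "num": 10}])`
would return one decoded response per query, where `u1` and `u2` are placeholder user ids.
The `send` parameter exists so the pool logic can be exercised without a running server.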

My team has had this need come up repeatedly, so I implemented a `pio batchpredict` command for
PredictionIO, but it has not yet been merged and released. See the pull request: <>


> On Jul 27, 2017, at 05:25, Mattz <> wrote:
> Hello,
> I am using the "Universal Recommender" template. Is the below guide current if I want
> to create bulk recommendations in batch?
> <>
> Thanks!
