From: Pat Ferrel
Date: Tue, 24 Apr 2018 10:16:08 -0700
Subject: Re: Info / resources for scaling PIO?
To: user@predictionio.apache.org, Adam Drew

PIO is based on the architecture of Spark, which uses HDFS. HBase also uses HDFS. Scaling these is quite well documented on the web. Scaling PIO is the same as scaling all of its services. It is unlikely you'll need it, but you can also have more than one PIO server behind a load balancer.

Don't use local models; put them in HDFS. Don't mess with NFS; it is not the design point for PIO. Scaling Spark beyond one machine will require HDFS anyway, so use it.

I also advise against using ES for all storage. Four things hit the event storage: incoming events (input); training, where all events are read out at high speed; optionally model storage (depending on the engine); and queries, which usually hit the event storage. This will quickly overload one service, and ES is not built as an object-retrieval DB. The only reason to use ES for all storage is that it is convenient when doing development or experimenting with engines. In production it would be risky to rely on ES for all storage, and you would still need to scale out Spark and therefore HDFS.

There is a little written about various scaling models here: http://actionml.com/docs/pio_by_actionml (see the architecture and workflow tab), and there are a couple of system install docs that cover scaling.


From: Adam Drew <adamrdrew@live.com>
Reply: user@predictionio.apache.org
Date: April 24, 2018 at 7:37:35 AM
To: user@predictionio.apache.org
Subject: Info / resources for scaling PIO?

Hi all!

Is there any info on how to scale PIO to multiple nodes? I've gone through a lot of the docs on the site and haven't found anything. I've tested PIO running with HBase and ES for metadata and events, and with using just ES for both (my preference thus far), and have my models on local storage. Would scaling simply be a matter of deploying clustered ES and then finding some way to share my model storage, such as NFS or HDFS? The question then is what (if anything) has to be done for the nodes to "know" about changes on other nodes. For example, if the model gets trained on node A, does node B automatically know about that?

I hope that makes sense. I'm coming to PIO with no prior experience with the underlying Apache bits (Spark, HBase / HDFS, etc.), so there are likely things I'm not considering. Any help / docs / guidance is appreciated.

Thanks!
Adam
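[Editor's sketch] The split-storage layout recommended above (ES for metadata only, events in HBase, models in HDFS rather than on local disk) maps onto PredictionIO's pio-env.sh storage settings roughly as follows. This is a sketch only: the repository names, hostnames, and the HDFS path are illustrative placeholders, not values taken from this thread.

```shell
# pio-env.sh (fragment): split storage across services, per the advice above.
# Hostnames and paths below are placeholders; adjust for your cluster.

# Metadata in Elasticsearch
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

# Events in HBase (itself backed by HDFS)
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

# Models in HDFS, not on local disk, so every PIO server sees the same model
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS

PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es-node-1,es-node-2
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,9200

PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=/usr/local/hbase

PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs
PIO_STORAGE_SOURCES_HDFS_PATH=/models

# Create the model directory once before the first training run, e.g.:
#   hdfs dfs -mkdir -p /models
```

With models in HDFS, a model trained from one node is readable by every other node, which speaks to the question in the original message: node B does not learn about node A's training automatically, but a deployed engine on any node can pick up the shared model when it is (re)deployed.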