Date: Wed, 28 Mar 2018 16:40:41 -0700
From: Pat Ferrel
To: user@predictionio.apache.org, Dave Novelli
Cc: user@predictionio.apache.org
Subject: Re: Unclear problem with using S3 as a storage data source

Sorry, then I don't understand which part has no access to the file system on the single machine?

Also, a t2 is not going to work with PIO. Spark 2 alone requires something like 2g for a do-nothing empty executor and driver, so a real app will require 16g or so minimum (my laptop has 16g). Running the OS, HBase, ES, and Spark will get you to over 8g, then add data. Spark keeps all data needed at a given phase of the calculation in memory across the cluster; that's where it gets its speed. Welcome to big-data :-)

From: Dave Novelli
Reply: user@predictionio.apache.org
Date: March 28, 2018 at 3:47:35 PM
To: Pat Ferrel
Cc: user@predictionio.apache.org
Subject: Re: Unclear problem with using S3 as a storage data source

I don't *think* I need more spark nodes - I'm just using the one for training on an r4.large instance I spin up and down as needed.

I was hoping to avoid adding any additional computational load to my Event/Prediction/HBase/ES server (all running on a t2.medium) so I am looking for a way to *not* install HDFS on there as well.
S3 seemed like it would be a super convenient way to pass the model files back and forth, but it sounds like it wasn't implemented as a data source for the model repository for UR.

Perhaps that's something I could implement and contribute? I can *kinda* read Scala haha, maybe this would be a fun learning project. Do you think it would be fairly straightforward?

Dave Novelli
Founder/Principal Consultant, Ultraviolet Analytics
www.ultravioletanalytics.com | 919.210.0948 | dave@ultravioletanalytics.com

On Wed, Mar 28, 2018 at 6:01 PM, Pat Ferrel wrote:

So you need to have more Spark nodes and this is the problem?

If so, set up HBase on pseudo-clustered HDFS so you have a master node address even though all storage is on one machine. Then you use that version of HDFS to tell Spark where to look for the model. It gives the model a URI.

I have never used the raw S3 support. HDFS can also be backed by S3, but you still use the HDFS APIs; using S3 is an HDFS config setting.

It is a rather unfortunate side effect of PIO but there are 2 ways to solve this with no extra servers.

Maybe someone else knows how to use S3 natively for the model stub?

From: Dave Novelli
Date: March 28, 2018 at 12:13:12 PM
To: Pat Ferrel
Cc: user@predictionio.apache.org
Subject: Re: Unclear problem with using S3 as a storage data source

Well, it looks like the local file system isn't an option in a multi-server configuration without manually setting up a process to transfer those stub model files.

I trained models on one heavy-weight temporary instance, and then when I went to deploy from the prediction server instance it failed due to missing files. I copied the .pio_store/models directory from the training server over to the prediction server and then was able to deploy.
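
The manual copy of the stub model files described above could be scripted. What follows is only a minimal sketch, assuming SSH access between the two instances and rsync installed on both. The user, host name, and absolute paths are hypothetical placeholders; only the .pio_store/models directory name comes from this thread.

```python
import shlex
import subprocess


def build_rsync_command(local_models_dir, remote_host, remote_models_dir):
    """Build an rsync argv that mirrors the stub model files to another host."""
    # A trailing slash on the source tells rsync to copy the directory's
    # contents rather than the directory itself.
    source = local_models_dir.rstrip("/") + "/"
    destination = f"{remote_host}:{remote_models_dir.rstrip('/')}/"
    # -a recurses and preserves file attributes; -z compresses in transit.
    return ["rsync", "-az", source, destination]


def push_model_stubs(local_models_dir, remote_host, remote_models_dir, dry_run=True):
    """Print the command, and run it only when dry_run is False."""
    command = build_rsync_command(local_models_dir, remote_host, remote_models_dir)
    print(shlex.join(command))
    if not dry_run:
        subprocess.run(command, check=True)
    return command


# Hypothetical values: replace with your own user, host, and paths.
push_model_stubs(
    "/home/pio/.pio_store/models",
    "pio@prediction-host",
    "/home/pio/.pio_store/models",
)
```

With dry_run left at its default the script only prints the command, so it can be inspected before anything is transferred. Running it after each pio train on the training instance would be one way to keep the prediction server's stubs current.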
So, in a dual-instance configuration what's the best way to store the files? I'm using pseudo-distributed HBase with standard file system storage instead of HDFS (my current aim is keeping down cost and complexity for a pilot project). Is S3 back on the table as an option?

On Fri, Mar 23, 2018 at 11:03 AM, Dave Novelli wrote:

Ahhh ok, thanks Pat!

On Fri, Mar 23, 2018 at 8:08 AM, Pat Ferrel wrote:

There is no need to have Universal Recommender models put in S3; they are not used and only exist (in stub form) because PIO requires them. The actual model lives in Elasticsearch and uses special features of ES to perform the last phase of the algorithm and so cannot be replaced.

The stub PIO models have no data and will be tiny. Putting them in HDFS or the local file system is recommended.

From: Dave Novelli
Reply: user@predictionio.apache.org
Date: March 22, 2018 at 6:17:32 PM
To: user@predictionio.apache.org
Subject: Unclear problem with using S3 as a storage data source

Hi all,

I'm using the Universal Recommender template and I'm trying to switch storage data sources from local file to S3 for the model repository. I've read the page at https://predictionio.apache.org/system/anotherdatastore/ to try to understand the configuration requirements, but when I run pio train it's indicating an error and nothing shows up in the s3 bucket:

[ERROR] [S3Models] Failed to insert a model to s3://pio-model/pio_modelAWJPjTYM0wNJe2iKBl0d

I created a new bucket named "pio-model" and granted full public permissions.

Seemingly relevant settings from pio-env.sh:

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=S3
...
PIO_STORAGE_SOURCES_S3_TYPE=s3
PIO_STORAGE_SOURCES_S3_REGION=us-west-2
PIO_STORAGE_SOURCES_S3_BUCKET_NAME=pio-model
# I've tried with and without this
#PIO_STORAGE_SOURCES_S3_ENDPOINT=http://s3.us-west-2.amazonaws.com
# I've tried with and without this
#PIO_STORAGE_SOURCES_S3_BASE_PATH=pio-model

Any suggestions where I can start troubleshooting my configuration?

Thanks,
Dave
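
One cheap first check on settings like the ones quoted above is whether every repository points at a source that is actually defined. The sketch below assumes only the naming pattern visible in the quoted pio-env.sh lines (a repository's ..._SOURCE value should match a PIO_STORAGE_SOURCES_<name>_TYPE entry). It does not contact S3 and would not by itself explain the S3Models error; the helper names are made up for illustration.

```python
import re

REPO_SOURCE = re.compile(r"^PIO_STORAGE_REPOSITORIES_([A-Z0-9]+)_SOURCE$")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def unresolved_repositories(settings):
    """Return repositories whose SOURCE has no matching ..._SOURCES_<name>_TYPE."""
    missing = []
    for key, source in settings.items():
        match = REPO_SOURCE.match(key)
        if match and f"PIO_STORAGE_SOURCES_{source}_TYPE" not in settings:
            missing.append(match.group(1))
    return sorted(missing)


# Settings copied from the message above; commented-out lines are ignored.
sample = """
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=S3
PIO_STORAGE_SOURCES_S3_TYPE=s3
PIO_STORAGE_SOURCES_S3_REGION=us-west-2
PIO_STORAGE_SOURCES_S3_BUCKET_NAME=pio-model
#PIO_STORAGE_SOURCES_S3_BASE_PATH=pio-model
"""

settings = parse_env(sample)
print(unresolved_repositories(settings))
```

For the settings as posted the list comes back empty, which suggests the repository-to-source naming is consistent and the problem lies elsewhere (credentials, bucket access, or the endpoint, for example).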