Date: Tue, 30 Jan 2018 10:33:42 -0800
From: Pat Ferrel
To: user@predictionio.apache.org, Daniel O' Shaughnessy
Subject: Re: Using Dataframe API vs. RDD API?

What template are you using? If it is one of the templates in the Apache repos, you may want to file a bug report. If PIO supports Spark 2.x, the Apache Templates should also IMHO.

From: Daniel O' Shaughnessy
Reply: user@predictionio.apache.org
Date: January 30, 2018 at 9:09:49 AM
To: user@predictionio.apache.org
Subject: Re: Using Dataframe API vs. RDD API?

Hi Shane,

You need to use PAlgorithm instead of P2Algorithm and save/load the spark context accordingly. This way you can use spark context in the predict function.

There are examples of using PAlgorithm on the predictionio Site. It's slightly more complicated but not too bad!

On Tue, 30 Jan 2018 at 17:06, Shane Johnson wrote:

Thanks team! We are close to having our models working with the Dataframe API. One additional roadblock we are hitting is the fundamental difference in the RDD based API vs the Dataframe API. It seems that the old mllib API would allow a simple vector to get predictions where in the new ml API a dataframe is required.
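To make that difference concrete: a trained mllib RandomForestModel scores a single Vector directly with predict, while a spark.ml model such as RandomForestRegressionModel is applied with transform to a DataFrame. One common workaround is to wrap the single query vector in a one-row DataFrame. The sketch below assumes Spark 2.x's spark.ml API and that a SparkSession can be obtained wherever predict runs (for example because the model kept a reference to the context after loading, as Daniel suggests). The names SingleVectorPredict, predictOne and demo, and the toy training data, are hypothetical and only there to make the sketch runnable.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.regression.{RandomForestRegressionModel, RandomForestRegressor}
import org.apache.spark.sql.SparkSession

object SingleVectorPredict {
  // Scores one feature vector with a spark.ml model by wrapping it in a one-row DataFrame.
  // Assumption: this runs on the Spark driver, where getOrCreate() returns the active session.
  def predictOne(model: RandomForestRegressionModel, features: Vector): Double = {
    val spark = SparkSession.builder().getOrCreate()
    // Name the single column after the model's features column so transform can find it.
    val query = spark.createDataFrame(Seq(Tuple1(features))).toDF(model.getFeaturesCol)
    model.transform(query).select(model.getPredictionCol).head().getDouble(0)
  }

  // Hypothetical toy setup: trains a small forest locally, then scores one vector.
  def demo(): Double = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("single-vector-predict")
      .getOrCreate()
    val training = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(1.0, 0.0)),
      (2.0, Vectors.dense(2.0, 1.0)),
      (3.0, Vectors.dense(3.0, 0.0)),
      (4.0, Vectors.dense(4.0, 1.0))
    )).toDF("label", "features")
    val model = new RandomForestRegressor().setNumTrees(5).setSeed(42L).fit(training)
    predictOne(model, Vectors.dense(2.5, 0.0))
  }
}
```

Building a DataFrame for every query likely adds noticeable per-request overhead, which is worth measuring before relying on this in a low-latency prediction server.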
This presents a challenge, as the predict function in PredictionIO does not have a spark context.

Any ideas how to overcome this? Am I thinking through this correctly, or are there other ways to get predictions with the new ml Dataframe API without having a dataframe as input?

Best,
Shane

Shane Johnson | 801.360.3350
LinkedIn | Facebook

2018-01-08 20:37 GMT-10:00 Donald Szeto:

We do have work-in-progress for DataFrame API tracked at https://issues.apache.org/jira/browse/PIO-71.

Chan, it would be nice if you could create a branch on your personal fork if you want to hand it off to someone else. Thanks!

On Fri, Jan 5, 2018 at 2:02 PM, Pat Ferrel wrote:

Yes, and I do not recommend that because the EventServer schema is not a developer contract. It may change at any time. Use the conversion method and go through the PIO API to get the RDD, then convert to DF for now.

I'm not sure what PIO uses to get an RDD from Postgres, but if they do not use something like the lib you mention, a PR would be nice. Also, if you have an interest in adding the DF APIs to the EventServer, contributions are encouraged. I'm sure committers who know more than me on the subject will give some guidance.

If you want to donate some DF code, create a Jira and we'll easily find a mentor to make suggestions. There are many benefits to this, including not having to support a fork of PIO through subsequent versions. Also, others are interested in this too.

On Jan 5, 2018, at 7:39 AM, Daniel O' Shaughnessy wrote:

....Should have mentioned that I used org.apache.spark.rdd.JdbcRDD to read in the RDD from a postgres DB initially.
This way you don't need to use an EventServer!

On Fri, 5 Jan 2018 at 15:37 Daniel O' Shaughnessy wrote:

Hi Shane,

I've successfully used:

import org.apache.spark.ml.classification.{
  RandomForestClassificationModel,
  RandomForestClassifier
}

with pio. You can also access feature importance through the trained RandomForestClassificationModel (featureImportances).

Very simple to convert RDDs to DFs as Pat mentioned, something like:

val RDD_2_DF = sqlContext.createDataFrame(yourRDD).toDF("col1", "col2")

On Thu, 4 Jan 2018 at 23:10 Pat Ferrel wrote:

Actually there are libs that will read DFs from HBase: https://svn.apache.org/repos/asf/hbase/hbase.apache.org/trunk/_chapters/spark.html

This is out of band with PIO and should not be used IMO, because the schema of the EventStore is not guaranteed to remain as-is. The safest way is to translate or get DFs integrated into PIO. I think there is an existing Jira that requests Spark ML support, which assumes DFs.

On Jan 4, 2018, at 12:25 PM, Pat Ferrel wrote:

Funny you should ask this. Yes, we are working on a DF based Universal Recommender, but you have to convert the RDD into a DF since PIO does not read out data in the form of a DF (yet). This is a fairly simple step of maybe one line of code but would be better supported in PIO itself. The issue is that the EventStore uses libs that may not read out DFs, but RDDs. This is certainly the case with Elasticsearch, which provides an RDD lib. I haven't seen one from them that reads out DFs, though it would make a lot of sense for ES especially.

So TLDR; yes, just convert the RDD into a DF for now.

Also please add a feature request as a PIO Jira ticket to look into this. I for one would +1 it.

On Jan 4, 2018, at 11:55 AM, Shane Johnson wrote:

Hello group,

Happy new year! Does anyone have a working example or template using the DataFrame API vs. the RDD based APIs?
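Daniel's one-liner works for an RDD of tuples or case classes. A minimal sketch of the same conversion through the Spark 2.x SparkSession entry point, assuming the DataSource has already produced an RDD of (label, features) pairs; RddToDataFrame, toTrainingDF and demo are hypothetical names, and the two sample rows are made up:

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object RddToDataFrame {
  // Converts an RDD of (label, features) tuples into a DataFrame whose column
  // names match the defaults spark.ml estimators read ("label", "features").
  def toTrainingDF(spark: SparkSession, rdd: RDD[(Double, Vector)]): DataFrame =
    spark.createDataFrame(rdd).toDF("label", "features")

  // Hypothetical local demo with two made-up rows.
  def demo(): DataFrame = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rdd-to-df")
      .getOrCreate()
    val rdd = spark.sparkContext.parallelize(Seq(
      (1.0, Vectors.dense(0.5, 1.0)),
      (0.0, Vectors.dense(1.5, 0.0))
    ))
    toTrainingDF(spark, rdd)
  }
}
```

toDF("label", "features") renames the tuple's default _1/_2 columns. If the RDD still holds the older org.apache.spark.mllib.linalg vectors, mapping each one through .asML (Spark 2.0+) converts it to the spark.ml vector type these estimators expect.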
We are wanting to migrate to using the new DataFrame APIs to take advantage of the Feature Importance function for our Regression Random Forest Models.

We are wanting to move from

import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel
import org.apache.spark.mllib.util.MLUtils

to

import org.apache.spark.ml.regression.{RandomForestRegressionModel, RandomForestRegressor}

Is this something that should be fairly straightforward by adjusting parameters and calling new classes within DASE, or is it much more involved development?

Thank You!

Shane Johnson | 801.360.3350
LinkedIn | Facebook
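On Shane's original question, a sketch of the spark.ml side of the migration, assuming a DataFrame with the default "label" and "features" columns (for example one obtained by converting the DataSource's RDD, as Pat and Daniel suggest). The fitted RandomForestRegressionModel exposes featureImportances directly. The object name, the toy data and the parameter values are illustrative assumptions, not taken from any PredictionIO template.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.regression.{RandomForestRegressionModel, RandomForestRegressor}
import org.apache.spark.sql.{DataFrame, SparkSession}

object RandomForestImportances {
  // Trains a spark.ml random forest regressor on a DataFrame with
  // "label" and "features" columns and returns the fitted model.
  def train(training: DataFrame, numTrees: Int = 10): RandomForestRegressionModel =
    new RandomForestRegressor()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(numTrees)
      .setSeed(42L)
      .fit(training)

  // Hypothetical toy data: the label follows the first feature; the second is noise.
  def demo(): Vector = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rf-importances")
      .getOrCreate()
    val training = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(1.0, 0.3)),
      (2.0, Vectors.dense(2.0, 0.1)),
      (3.0, Vectors.dense(3.0, 0.4)),
      (4.0, Vectors.dense(4.0, 0.2)),
      (5.0, Vectors.dense(5.0, 0.5)),
      (6.0, Vectors.dense(6.0, 0.0))
    )).toDF("label", "features")
    // One non-negative importance per feature (the Spark docs describe them as normalized to sum to 1).
    train(training).featureImportances
  }
}
```

The arguments of mllib's RandomForest.trainRegressor (numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, seed) have same-named setters on RandomForestRegressor. Much of the DASE change is therefore likely in how train receives its data (a DataFrame rather than an RDD[LabeledPoint]) and, as the later replies discuss, in how predict is served.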