predictionio-user mailing list archives

From Marcin Ziemiński <ziem...@gmail.com>
Subject Re: event server - apache nifi & spark data set
Date Mon, 03 Oct 2016 13:32:16 GMT
With Spark 2.0, DataFrames are a special case of Datasets, so every problem
that applies to the latter also applies to the former.
PredictionIO is built around RDDs, but that doesn't stop you from using
DataFrames internally in your engine. By defining custom types in the DASE
architecture of your engine, you should be able to use DataFrames (or
Datasets, with the Spark 2.0 upgrade introduced by the PR mentioned earlier).
However, when you access PEventStore to collect your data you will get
RDDs, which you would have to convert to DataFrames if necessary.
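
To illustrate the conversion step, here is a minimal sketch assuming Spark
2.0, where DataFrame is an alias for Dataset[Row]. The Rating case class, the
RddToDataFrame object and its method names are hypothetical and not part of
PredictionIO; in an engine the input RDD would come from mapping the events
read through PEventStore into such a type.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type (an assumption for this sketch). In an engine it
// would be produced by mapping over the RDD of events read via PEventStore.
case class Rating(user: String, item: String, rating: Double)

object RddToDataFrame {
  // Convert an RDD of case-class records to an untyped DataFrame.
  def toDataFrame(spark: SparkSession, ratings: RDD[Rating]): DataFrame = {
    import spark.implicits._ // brings toDF()/toDS() into scope
    ratings.toDF()
  }

  // Convert the same RDD to a typed Dataset (Spark 2.0).
  def toDataset(spark: SparkSession, ratings: RDD[Rating]): Dataset[Rating] = {
    import spark.implicits._
    ratings.toDS()
  }

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rdd-to-dataframe-sketch")
      .getOrCreate()

    // Stand-in data; a real engine would not build the RDD by hand.
    val rdd = spark.sparkContext.parallelize(Seq(
      Rating("u1", "i1", 4.0),
      Rating("u2", "i1", 3.0)
    ))

    val df = toDataFrame(spark, rdd)
    df.printSchema()
    df.groupBy("item").avg("rating").show()

    spark.stop()
  }
}
```

Inside an engine you would typically not create a new local session; as far
as I know, SparkSession.builder().getOrCreate() picks up the SparkContext
that is already running, but treat that as an assumption to verify against
your Spark version.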

On Sun, 2 Oct 2016 at 11:12, Georg Heiler <georg.kf.heiler@gmail.com>
wrote:

> Thanks.
> After looking around some more I realized that most engines are using RDDs
> and not DataFrames.
> Is there a similar limitation for DataFrames as there is for Datasets?
>
> Regards,
> Georg
>
> On Fri, 30 Sep 2016 at 20:14, Marcin Ziemiński <zieminm@gmail.com>
> wrote:
>
> So this is the mentioned PR:
> https://github.com/apache/incubator-predictionio/pull/295
>
> I am aware this is not enough, but it is a necessary step towards
> bringing in the desired changes.
>
> Best regards,
> Marcin
>
> On Fri, 30 Sep 2016 at 19:50, Georg Heiler <georg.kf.heiler@gmail.com>
> wrote:
>
> Thanks.
> So a simple recompile for Scala 2.11 and an upgrade of the Spark
> dependencies would not be enough.
>
> Would you mind sharing this pull request? I can't seem to find it via
> Google.
> Thanks again.
> Regards Georg
> On Fri, 30 Sep 2016 at 18:05, Marcin Ziemiński <zieminm@gmail.com>
> wrote:
>
> Hi Georg,
>
> There is currently no support for Apache NiFi integration in the project.
> I have personally been looking closer at NiFi recently and it seems like a
> good idea to integrate it with PIO.
> PredictionIO is now in Apache incubation and the releases after 0.10 will
> bring more new functionality. If you have any ideas about how this could
> look, please feel free to share them. This is actually a very good moment
> to bring up such issues.
>
> As far as Datasets are concerned, PIO does not currently support Datasets
> in its API. There is currently a pull request with an update to Spark 2.0,
> so Datasets could be used internally in engines once this is merged, but
> the API doesn't reflect such changes now.
>
> Regards,
> Marcin
>
> On Fri, 30 Sep 2016 at 17:24, Georg Heiler <georg.kf.heiler@gmail.com>
> wrote:
>
> Hi,
>
> Does the event server of PIO integrate with Apache NiFi?
>
> In the examples you use the Spark RDD API. Does PIO support Spark 2.0's
> Datasets as well?
>
> regards,
> Georg
>
>
