predictionio-user mailing list archives

From Kenneth Chan <kenn...@apache.org>
Subject Re: Kafka support as Event Store
Date Sat, 08 Jul 2017 08:10:37 GMT
Thomas, if you always use your own data store backend for events or your
application data, you don't have to use the EventServer. Just implement a PIO
engine that directly reads from your data source.


On Thu, Jul 6, 2017 at 7:32 AM, Pat Ferrel <pat@occamsmachete.com> wrote:

> The EventStore is queryable in flexible ways and so needs to have random
> access and indexing of events. In short, it needs to be backed by a DB.
>
> There are ML algorithms that do not need stored events, like online
> learners in the Kappa style. But existing Templates and the PIO
> architecture do not support Kappa yet, so unless you are developing your
> own Template you will find that Kafka will not fill all the requirements.
>
> It is, however, an excellent way to manage streams, so I tend to see it as
> a source for events and have seen it used as such.
>
> To be more precise, the reason your method may not work is that not all
> events can be aged out. The item property events in PIO ($set, $unset, and
> $delete) are embedded in the stream but are only meaningful in aggregate,
> since they record a set of changes to an object. If you drop one of these
> events, the current state of the object cannot be calculated with certainty.
>
> To truly use a streaming model we have to introduce the idea of watermarks
> that snapshot state in the stream so that any event can be dropped. This is
> how Kappa learners work, but it is often not possible for existing Lambda
> learners. Many of the Lambda learners can be converted, but not easily and
> not all of them.
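Pat's point about property events only making sense in aggregate can be made concrete with a small sketch. This is illustrative Python, not the EventServer's actual storage logic, and the event dictionaries only approximate the PIO event schema:

```python
# Minimal sketch of how PIO-style property events ($set/$unset/$delete)
# are folded, in order, into an object's current state.

def fold_properties(events):
    """Replay property events in order to compute an item's current state."""
    state = {}
    for ev in events:
        if ev["event"] == "$set":
            state.update(ev["properties"])   # merge in the new values
        elif ev["event"] == "$unset":
            for key in ev["properties"]:
                state.pop(key, None)         # remove the listed fields
        elif ev["event"] == "$delete":
            state = {}                       # the whole object is gone
    return state

events = [
    {"event": "$set",   "properties": {"category": "video", "price": 10}},
    {"event": "$set",   "properties": {"price": 12}},
    {"event": "$unset", "properties": {"category": None}},
]

print(fold_properties(events))       # prints {'price': 12}
# Dropping the middle $set (e.g. via Kafka retention) silently yields
# {'price': 10} instead -- the reconstructed state can no longer be trusted.
print(fold_properties(events[::2]))  # prints {'price': 10}
```

This is why aging events out of a plain retention-based stream breaks state reconstruction, as described above.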
>
>
> On Jul 6, 2017, at 1:59 AM, Thomas POCREAU <thomas.pocreau@iadvize.com>
> wrote:
>
> After some internal discussions, it turns out I had misunderstood our needs.
> Indeed, we will use the Kafka HDFS connector
> <http://docs.confluent.io/current/connect/connect-hdfs/docs/index.html>
> to dump expired data.
> So we will basically have Kafka for the fresh events and HDFS for the past
> events.
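For reference, a Kafka Connect HDFS sink of the kind linked above is driven by a small properties file. The topic name, HDFS URL, and flush size below are placeholders, not values from this thread:

```properties
name=pio-events-hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
# placeholder topic holding the raw events
topics=pio-events
hdfs.url=hdfs://namenode:8020
# roll a file to HDFS every 1000 records
flush.size=1000
```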
>
>
> 2017-07-06 7:06 GMT+02:00 Thomas POCREAU <thomas.pocreau@iadvize.com>:
>
>> Hi,
>>
>> Thanks for your responses.
>>
>> Our goal is to use Kafka as our main event store for event sourcing. I'm
>> pretty sure that Kafka can be used with an infinite retention time.
>>
>> We could use KStream and the Java SDK, but I would like to try an
>> implementation of PStore on top of spark-streaming-kafka (
>> https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html).
>>
>> My main concern is the HTTP interface for pushing events. We will
>> probably use KStream or websockets to load events into our Kafka topic
>> used by an app channel.
>>
>> Are you planning on supporting websockets as an alternative to batch
>> import?
>>
>> Regards,
>> Thomas Pocreau
>>
>> On Jul 5, 2017, at 21:36, "Pat Ferrel" <pat@occamsmachete.com> wrote:
>>
>>> No, we try not to fork :-) But it would be nice as you say. It can be
>>> done with a small intermediary app that just reads from a Kafka topic and
>>> sends events to a localhost EventServer, which would allow events to be
>>> custom extracted from, say, log files (typical contents of Kafka). We’ve
>>> done this in non-PIO projects.
>>>
>>> The intermediary app should use Spark Streaming. I may have a snippet of
>>> code around if you need it, but it just saves to micro-batch files. You’d
>>> have to use the PIO Java SDK to send them to the EventServer. A relatively
>>> simple thing.
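A minimal sketch of such an intermediary, posting to the EventServer's standard `/events.json` REST endpoint directly instead of going through the Java SDK. The Kafka record format, the field names (`action`, `user`, `item`), and the access key are assumptions for illustration:

```python
# Sketch of the intermediary app described above: consume records from a
# Kafka topic and forward them to a local EventServer over its REST API.
import json
import urllib.request

# Placeholder access key; /events.json is the standard PIO endpoint.
EVENTSERVER_URL = "http://localhost:7070/events.json?accessKey=YOUR_ACCESS_KEY"

def to_pio_event(record):
    """Map one raw Kafka record (assumed to be JSON) to a PIO event body."""
    raw = json.loads(record)
    return {
        "event": raw["action"],          # e.g. "view", "buy"
        "entityType": "user",
        "entityId": raw["user"],
        "targetEntityType": "item",
        "targetEntityId": raw["item"],
    }

def send_event(body):
    """POST one event to the EventServer (blocking, no retries)."""
    req = urllib.request.Request(
        EVENTSERVER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # the EventServer answers 201 on success

if __name__ == "__main__":
    # In the real app this would loop over a Kafka consumer (e.g.
    # kafka-python's KafkaConsumer); a literal record stands in here.
    record = '{"action": "view", "user": "u1", "item": "i42"}'
    send_event(to_pio_event(record))
```

Mapping and sending are kept separate so the log-to-event extraction Pat mentions can be tested without a running EventServer.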
>>>
>>> Donald, what did you have in mind for deeper integration? I guess we
>>> could cut out the intermediate app and integrate into a new Kafka aware
>>> EventServer endpoint where the raw topic input is stored in the EventStore.
>>> This would force any log filtering onto the Kafka source.
>>>
>>>
>>> On Jul 5, 2017, at 10:20 AM, Donald Szeto <donald@apache.org> wrote:
>>>
>>> Hi Thomas,
>>>
>>> Supporting Kafka is definitely interesting and desirable. Are you
>>> looking to sink your Kafka messages to the event store for batch
>>> processing, or to stream-process directly from Kafka? The latter would
>>> require more work because Apache PIO does not yet support streaming
>>> properly.
>>>
>>> Folks from ActionML might have a flavor of PIO that works with Kafka.
>>>
>>> Regards,
>>> Donald
>>>
>>> On Tue, Jul 4, 2017 at 8:34 AM, Thomas POCREAU <
>>> thomas.pocreau@iadvize.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks a lot for this awesome project.
>>>>
>>>> I have a question regarding Kafka and its possible integration as an
>>>> Event Store.
>>>> Do you have any plans on this matter?
>>>> Are you aware of anyone working on a similar subject?
>>>>
>>>> Regards,
>>>> Thomas.
>>>>
>>>>
>>>
>>>
>
>
