gearpump-dev mailing list archives

From Manu Zhang <owenzhang1...@gmail.com>
Subject Re: Sink/source tickets
Date Thu, 05 May 2016 22:41:57 GMT
Hi Kam and others,

Do you think it makes sense to use Kafka Connect
<http://docs.confluent.io/2.0.0/connect/connectors.html> for sources/sinks?
The topology would look like: source ~> KafkaSource ~> DAG ~> KafkaSink ~>
sink.
One benefit is that we would always get the at-least-once delivery provided
by the current KafkaSource.
Kafka Connect (via the Confluent Platform) provides HDFS and JDBC connectors
out of the box, and other connectors are being contributed by the community
<https://github.com/search?p=1&q=kafka-connect&type=Repositories&utf8=%E2%9C%93>.

On Thu, May 5, 2016 at 11:35 PM Kam Kasravi <kamkasravi@gmail.com> wrote:

> Hi Karol,
>
> Good feedback. I'm not sure whether GEARPUMP-116 would allow easy integration
> of Redis, JMS, and AMQP from the Beam and Akka Streams perspectives.
> Huafeng, Manu?
>
>
> On Wed, May 4, 2016 at 10:34 AM, Karol Brejna <karol.brejna@gmail.com>
> wrote:
>
> > We have a series of jira tickets regarding Gearpump sinks/sources:
> >
> > https://issues.apache.org/jira/browse/GEARPUMP-116 - Compatibility layer/adapter for Apache Storm
> > https://issues.apache.org/jira/browse/GEARPUMP-115 - Create MQTT source/sink
> > https://issues.apache.org/jira/browse/GEARPUMP-106 - Gearpump Redis Integration
> > https://issues.apache.org/jira/browse/GEARPUMP-105 - Provide non-persistent Sink Task so that examples like word count can materialize Sum results within the Client
> > https://issues.apache.org/jira/browse/GEARPUMP-100 - Source task that emits messages per a schedule (interval or otherwise) should be provided
> > https://issues.apache.org/jira/browse/GEARPUMP-95 - Add parquet datasource and datasink connectors
> > https://issues.apache.org/jira/browse/GEARPUMP-91 - Apache Cassandra Integration
> >
> > We also had a ticket for 'Add a HDFS Sink with security'
> > (https://github.com/gearpump/gearpump/issues/1547); I am not sure what the
> > outcome of that one was.
> >
> > Most of them concern the medium (MQTT, Redis, Cassandra, ...). Others are
> > about source mechanics (a scheduled/repetitive source).
> >
> > I'd like to discuss the order in which we plan to implement them.
> >
> > In my opinion, Redis and MQTT (GEARPUMP-106, GEARPUMP-115) are the most
> > important to have.
> > Redis is well known and widely used. MQTT is a de facto standard for IoT
> > communications.
> >
> > Then I would like to have the HDFS sink (if we haven't merged it already).
> >
> > A non-persistent data sink could be very useful for example/demo purposes.
> > (Imagine a capped collection that the application can send messages to, a
> > kind of application console. The dashboard could have a section that
> > presents the latest 'console' messages. This way a user could "watch" the
> > application's progress, especially if they don't have access to the
> > backend, as often happens in YARN mode. But this is a topic for a
> > dedicated discussion, I think.)
> >
> > On the other hand, if we start working on GEARPUMP-116, we'd probably
> > quickly have Redis, JMS, and AMQP sources (adapted from Storm).
> >
> > Please let me know what you think.
> >
> > Karol
> >
>
