flume-dev mailing list archives

From Mike Percy <mpe...@apache.org>
Subject Re: [Discuss graph source/sink design proposal]
Date Mon, 04 Jul 2016 19:03:06 GMT
Hi Saikat,
I recommend you use GitHub. Private branches in ASF repos are only available to committers.

Regarding forking Flume, you should not need to do that. Just depend on flume-ng-core in your
pom and extend AbstractSink. Maven will pull in your deps.
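
For illustration, a minimal sketch of what "extend AbstractSink" might look like, assuming the flume-ng-core artifact (groupId org.apache.flume) is declared as a dependency in the pom. The class name GraphSink, the graphUri property, its default value, and the writeToGraph helper are hypothetical placeholders, not part of Flume or of the proposal; only the org.apache.flume types come from flume-ng-core.

```java
import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

// Hypothetical sink: takes one event per call from the channel and hands it
// to a placeholder writer. Requires flume-ng-core on the classpath.
public class GraphSink extends AbstractSink implements Configurable {

  // Assumed setting name and default; not defined by Flume.
  private String graphUri;

  @Override
  public void configure(Context context) {
    graphUri = context.getString("graphUri", "bolt://localhost:7687");
  }

  @Override
  public Status process() throws EventDeliveryException {
    Channel channel = getChannel();
    Transaction txn = channel.getTransaction();
    txn.begin();
    try {
      Event event = channel.take();
      if (event == null) {
        // Nothing available: commit the empty transaction and back off.
        txn.commit();
        return Status.BACKOFF;
      }
      writeToGraph(event.getBody());
      txn.commit();
      return Status.READY;
    } catch (Throwable t) {
      // Roll back so the event stays in the channel for a retry.
      txn.rollback();
      if (t instanceof Error) {
        throw (Error) t;
      }
      throw new EventDeliveryException("Failed to deliver event to " + graphUri, t);
    } finally {
      txn.close();
    }
  }

  // Placeholder: the actual graph database write would go here.
  private void writeToGraph(byte[] body) {
  }
}
```

The take/commit/rollback shape is the usual channel transaction pattern for a sink; a real implementation would likely batch several events per transaction rather than one.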

I'm out of town for the next few days but I'll try to respond in more detail to your design
notes when I'm back in town.


Sent from my iPhone

> On Jul 4, 2016, at 6:59 AM, Saikat Kanjilal <sxk1969@hotmail.com> wrote:
> Hari/Mike et al,
> I need a place to put interim check-ins related to this work. Is it possible to get write
> privileges to a private branch so that I can commit my code at intermediate junctures? I
> can also put it on Bitbucket, but would rather not create yet another place for the
> code to live if it will eventually end up in the Flume repo.
> Thanks in advance
> ________________________________
> From: Saikat Kanjilal <sxk1969@hotmail.com>
> Sent: Thursday, June 30, 2016 10:16 PM
> To: dev@flume.apache.org
> Subject: RE: [Discuss graph source/sink design proposal]
> So I've started the coding effort on this. Here are some details:
> 1) I've cloned the HBase sink for now and am refactoring all of that code to work with
> Neo4j as a start.
> 2) I'm only focusing on creating a sink that will perform basic CRUD streaming
> operations into Neo4j.
> 3) I've sent an email to the Neo4j guys to figure out details around building a
> streaming architecture with the Neo4j kernel.
> 4) In the meantime, how would you guys like to review the code? I've cloned the Flume
> repo and created a branch called flume-2035 where I will work. Should I put all the
> code on Bitbucket and send out periodic reviews? This is going to be a sizeable effort.
> 5) How should we think about Cypher-related workflows as they relate to the streaming
> data coming in? For the full flavor of Cypher, see https://neo4j.com/developer/cypher-query-language/
> Would love to get some discussion going on 2-5.
> Thanks
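
As one possible way to think about point 5, here is a small sketch that turns a Flume-style header map into a parameterized Cypher upsert. Everything in it is an assumption for illustration: the class name CypherUpsert, the choice of MERGE plus SET, and the convention that one header carries the node id. It uses only the JDK and does not call any Neo4j or Flume API. The label is validated and inlined because Cypher does not allow labels to be passed as parameters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a Cypher statement and its parameter map
// from event headers. Not part of Flume or Neo4j.
final class CypherUpsert {
  final String statement;
  final Map<String, Object> parameters;

  private CypherUpsert(String statement, Map<String, Object> parameters) {
    this.statement = statement;
    this.parameters = parameters;
  }

  static CypherUpsert fromHeaders(String label, String idKey, Map<String, String> headers) {
    // Labels cannot be parameterized, so restrict them to a safe identifier.
    if (!label.matches("[A-Za-z_][A-Za-z0-9_]*")) {
      throw new IllegalArgumentException("Invalid label: " + label);
    }
    // Copy the headers so the caller's map is left untouched.
    Map<String, Object> props = new LinkedHashMap<>(headers);
    Object id = props.remove(idKey);
    if (id == null) {
      throw new IllegalArgumentException("Missing id header: " + idKey);
    }
    Map<String, Object> params = new LinkedHashMap<>();
    params.put("id", id);
    params.put("props", props);
    // MERGE matches or creates the node; SET += copies the remaining headers.
    String statement = "MERGE (n:" + label + " {id: $id}) SET n += $props";
    return new CypherUpsert(statement, params);
  }
}
```

Calling CypherUpsert.fromHeaders("Person", "id", headers) with headers id=42 and name=Ada yields the statement MERGE (n:Person {id: $id}) SET n += $props, with the values carried separately in the parameter map. The $name parameter form is current Cypher syntax; older Neo4j releases wrote parameters as {name}, so the statement text may need adjusting for the target version.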
>> From: mpercy@apache.org
>> Date: Wed, 29 Jun 2016 17:24:16 -0700
>> Subject: Re: [Discuss graph source/sink design proposal]
>> To: dev@flume.apache.org
>> Hmm, maybe a different Kudu project? Not sure.
>> Anyway, this type of "changelog" thing would require support in the DB for
>> streaming its write-ahead log or something. For example, we don't support
>> that in Apache Kudu (incubating) -- maybe someday.
>> Regarding Flume, I usually think it's useful to distinguish between a
>> source and a sink. They are typically written as separate classes and they
>> represent different interfaces at the Flume Java API level.
>> So, how would one write a streaming database source? That really depends on
>> the database and the APIs it provides for that.
>> Mike
>> On Tue, Jun 28, 2016 at 8:30 AM, Saikat Kanjilal <sxk1969@hotmail.com>
>> wrote:
>>> :) I'm using Kudu at work at the moment to troubleshoot some Tomcat
>>> issues. Regarding where to keep the source code, I would say for now
>>> let's go with the plugin approach and revisit the "where does the code live"
>>> conversation later. One thing I do want to discuss is that the plugin will
>>> act as a source or a sink depending on configuration. If the plugin acts
>>> as a source, we need a mechanism (like a daemon in syslog) to stream changes
>>> in real time from a graph DB into Flume. I was wondering if there are any past
>>> approaches to this that I can follow. I may need to dig into the Neo4j
>>> kernel to see where we can inject something like this.
>>> Thoughts on that?
>>>> From: mpercy@apache.org
>>>> Date: Tue, 28 Jun 2016 00:27:45 -0700
>>>> Subject: Re: [Discuss graph source/sink design proposal]
>>>> To: dev@flume.apache.org
>>>> Hi Saikat,
>>>> Please see my thoughts inline. This is how I think about this stuff; others
>>>> may think about it differently.
>>>> On Mon, Jun 27, 2016 at 8:45 PM, Saikat Kanjilal <sxk1969@hotmail.com>
>>>> wrote:
>>>>> Exactly right, I'm proposing we create a graph sink for flume while
>>>>> keeping the flume core intact.
>>>> As you are probably aware, sources and sinks don't have to be part of the
>>>> main Apache Flume source tree to be used with Flume. The plugins.d
>>>> mechanism described in [1] makes building and integrating separate plugins
>>>> into Flume an easy thing to do at deployment time.
>>>> In another project I work on, Apache Kudu (incubating), we have a Flume
>>>> Kudu sink committed in the main source tree [2]. We may at some point
>>>> propose to move it into the Flume source tree, but for now (for testing and
>>>> API stability reasons) it's easier to keep it in the Kudu source tree.
>>>> Likewise, you could implement a Flume Neo4J sink and post it up on GitHub
>>>> (or maybe in the Neo4J tree?). Donating it to the Apache Flume project once
>>>> it's in decent shape may make sense at some point, especially if the
>>>> dependencies are easy to share and integrate into the Flume project.
>>>> However, I wouldn't say that it's a foregone conclusion that it really
>>>> needs to be part of the Flume source tree. Assuming you need the sink, and
>>>> are going to implement it anyway, then maybe we can defer the discussion of
>>>> whether to include it in the Flume source tree until later. One of the
>>>> things I try to keep in mind when integrating new plugin code is whether
>>>> the project will be able to support the maintenance burden of the new code.
>>>>> In reading from a graph db we need a mechanism to stream data from the
>>>>> graph store into flume.
>>>> Yes, I'd say it could potentially make sense to create a Flume Neo4J source
>>>> as well. I think the same logic as above would still apply.
>>>> Regards,
>>>> Mike
>>>> [1] https://flume.apache.org/FlumeUserGuide.html#installing-third-party-plugins
>>>> [2] https://github.com/apache/incubator-kudu/tree/master/java/kudu-flume-sink
