From: Venkatesh Seetharam
Date: Sat, 16 Jul 2016 05:50:37 +0000
Subject: Re: Request for advise, collaboration
To: general@incubator.apache.org, Srihari Srinivasan

Hi Hari,

I'm on the Apache Falcon PMC. Since Falcon is a data pipeline management
solution for Hadoop, there might be enough interest to explore whether we
can collaborate, either as part of Falcon or as a separate project. Can you
please elaborate on the scope, and on whether orchestration is part of it?

Falcon also integrates with Apache Atlas, a metadata solution that I'm part
of as well.

Thanks!
Venkatesh

On Fri, Jul 15, 2016 at 6:49 AM Srihari Srinivasan wrote:

> Hi Folks,
>
> I am Hari, a developer with a company called ThoughtWorks. We've been
> developing data pipelines on Hadoop, Spark, etc. for a while now. From
> our experiences with different customers, we've noticed a recurring need
> to carry out tasks such as data preparation and data anonymization on
> large datasets using Java MR and Spark. Based on this experience, we have
> been working on building a couple of libraries targeted at data
> preparation and data protection to begin with. They are hosted under an
> umbrella project called Data Commons at the moment (inspired by the
> Apache Commons project, which is organized around a similar theme).
>
> At the moment this is a fledgling project and its contributions are
> driven by our data team. However, we are very keen on making this part of
> the larger Apache collective and making it a community-driven effort.
>
> Hence, I am reaching out to you folks for advice on what could be the
> best way forward for this effort. We are also open to exploring
> collaborations with other existing projects that are already part of
> Apache. Please share your thoughts and advice.
>
> -- Hari