From: Chandni Singh
Date: Tue, 17 May 2016 17:10:33 -0700
Subject: Re: NFS Input Module
To: dev@apex.incubator.apache.org

Hi,

I see HDFSFileCopyModule and HDFSFileMerger in the library as well. Since we are so close to the release and I am not sure if these classes are just specific to HDFS, I am going to mark them Evolving so that we can address this afterwards and change the name if it's suitable.

Thanks,
Chandni

On Sat, May 7, 2016 at 2:17 PM, Chandni Singh wrote:

> I can help Dev.
>
> Thanks,
> Chandni
>
> On Sat, May 7, 2016 at 1:23 PM, Amol Kekre wrote:
>
>> We do have docs on apache.org. Love to see a very extensive and deep doc on this topic.
>>
>> Should we add "How to ..." sections?
>>
>> @dev, thks for volunteering. Any more volunteers?
>>
>> Thks,
>> Amol
>>
>> On Sat, May 7, 2016 at 12:20 PM, Devendra Tagare <devendrat@datatorrent.com> wrote:
>>
>> > @Thomas, @Amol I would like to contribute/collaborate on this.
>> >
>> > Will create a ticket for the same.
>> >
>> > Thanks,
>> > Dev
>> >
>> > On Sat, May 7, 2016 at 11:04 AM, Thomas Weise wrote:
>> >
>> > > The documentation is here and is indexed:
>> > >
>> > > http://apex.apache.org/docs/malhar/
>> > >
>> > > I think this is a matter of enhancing it.
>> > >
>> > > On Sat, May 7, 2016 at 9:18 AM, Amol Kekre wrote:
>> > >
>> > > > Thomas and I talked. Both of us agree that a white paper is due to get going. Google index clearly beats "find . | grep ..." in this day and age.
>> > > >
>> > > > The white paper would walk through and have data on HDFS, FTP, NFS, S3, maybe even example apps (could be app properties) accompanying this.
>> > > >
>> > > > So any volunteers?
>> > > > >
>> > > > > Thks
>> > > > > Amol
>> > > > >
>> > > > > On Thu, May 5, 2016 at 5:10 PM, Thomas Weise <thomas@datatorrent.com> wrote:
>> > > > >
>> > > > > > Do we have other projects that create dummy classes for every possible mounted file system just so that the user knows that's possible? The capability that matters here from app perspective is local file system and every developer in the Hadoop ecosystem should understand that.
>> > > > > >
>> > > > > > If the operator doesn't have anything specific to NFS then there is no place for it in the library (it would be confusing, not helpful).
>> > > > > >
>> > > > > > There should be a different approach for pre-configured operators that doesn't involve writing Java code.
>> > > > > >
>> > > > > > Thomas
>> > > > > >
>> > > > > > On Thu, May 5, 2016 at 3:10 PM, Amol Kekre wrote:
>> > > > > >
>> > > > > > > I am not suggesting duplicating code; extend the operators. Just add something (may not even be a function) that can be viewed as specific to a particular source. Say for NFS, it may be as simple as changing a default. A file with NFS in its name helps a great deal with adoption.
>> > > > > > >
>> > > > > > > Thks
>> > > > > > > Amol
>> > > > > > >
>> > > > > > > On Thu, May 5, 2016 at 11:45 AM, Chandni Singh <singh.chandni@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > IMO this is not a good idea.
>> > > > > > > >
>> > > > > > > > We are proposing to add additional Java code which is generic (works with HDFS, NFS, local FS) but just calling it something specific - NFS. IMO this is much more confusing to users.
>> > > > > > > >
>> > > > > > > > If we want to make it easier for users to find out that the FS Module supports writing to NFS then maybe we need to improve documentation or highlight it somewhere else.
>> > > > > > > >
>> > > > > > > > Adding Java classes means more maintenance overhead and here these classes are not doing anything additional.
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > Chandni
>> > > > > > > >
>> > > > > > > > On Thu, May 5, 2016 at 11:24 AM, Mohit Jotwani <mohit@datatorrent.com> wrote:
>> > > > > > > >
>> > > > > > > > > +1 on Sandeep's suggestion. This would make an end user's life a lot easier!
>> > > > > > > > >
>> > > > > > > > > Regards,
>> > > > > > > > > Mohit
>> > > > > > > > >
>> > > > > > > > > On Thu, May 5, 2016 at 11:51 PM, Sandeep Deshmukh <sandeep@datatorrent.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > I do agree with Amol on having clear and explicit modules. This is more from an end user perspective. For someone who is new to Apex, having separate NFS, HDFS, FTP, etc. would make a lot more sense than one generic FS module. However small a change these modules may have, like just a couple of small functions, I would like to have them separate for the end user.
>> > > > > > > > > >
>> > > > > > > > > > It is finally about the perspective and the user experience :)
>> > > > > > > > > >
>> > > > > > > > > > Regards,
>> > > > > > > > > > Sandeep
>> > > > > > > > > >
>> > > > > > > > > > On Thu, May 5, 2016 at 8:48 PM, Thomas Weise <thomas@datatorrent.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > I don't think we should name something NFS* when it isn't specific to NFS. It is just like any other local FS for this purpose and that's already covered by the Hadoop file system abstraction.
>> > > > > > > > > > >
>> > > > > > > > > > > Why can't a single FS Input module accommodate all of this? Once you know the FS URL, you can automatically optimize the configuration, if appropriate.
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks,
>> > > > > > > > > > > Thomas
>> > > > > > > > > > >
>> > > > > > > > > > > On Thu, May 5, 2016 at 12:08 AM, Chaitanya Chebolu <chaitanya@datatorrent.com> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi Chandni,
>> > > > > > > > > > > >
>> > > > > > > > > > > > It's a good point. I created the hierarchy based on user perspective and especially for non-Java users. If I return FileSplitter and BlockReader from FS Input Module, then this module works for NFS. But from a user's perspective it would be difficult to tell whether this module works for NFS or any other file system.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Regards,
>> > > > > > > > > > > > Chaitanya
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Thu, May 5, 2016 at 11:05 AM, Chandni Singh <chandni@datatorrent.com> wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > I am sorry Chaitanya but I have more questions about this
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > 1. why is the FS Input Module abstract when by default it can return FileSplitter & BlockReader in com.datatorrent.lib.io.fs? These implementations are not specific to NFS.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > 2. In the NFS module that you have suggested to create, what is specific to NFS?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Please note: I have created a ticket APEXMALHAR-2081 to remove FSFileSplitter from library and move its feature to the base operator.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > Chandni
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Wed, May 4, 2016 at 10:29 PM, Chaitanya Chebolu <chaitanya@datatorrent.com> wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > FSFileSplitter & BlockReader are available in com.datatorrent.lib.io.fs package.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Thu, May 5, 2016 at 10:47 AM, Chandni Singh <singh.chandni@gmail.com> wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Ok.
>> > > > > > > > > > > > > > > What is specific about the fileSplitter and blockReader returned by this implementation?
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On May 4, 2016 9:43 PM, "Chaitanya Chebolu" <chaitanya@datatorrent.com> wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Hi Chandni,
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Properties wise nothing specific. FS Input Module is an abstract Module and NFS Module implements the abstract methods - createFileSplitter() and createBlockReader().
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Regards,
>> > > > > > > > > > > > > > > > Chaitanya
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > On Wed, May 4, 2016 at 9:45 PM, Chandni Singh <singh.chandni@gmail.com> wrote:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Hi Chaitanya,
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > What will be specific in NFS Input Module that is not provided by FS Input Module?
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > > Chandni
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On Wed, May 4, 2016 at 7:12 AM, Amol Kekre <amol@datatorrent.com> wrote:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > +1
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Thks
>> > > > > > > > > > > > > > > > > > Amol
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > On Tue, May 3, 2016 at 10:06 PM, Sandeep Deshmukh <sandeep@datatorrent.com> wrote:
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > +1
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Regards,
>> > > > > > > > > > > > > > > > > > > Sandeep
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > On Fri, Apr 29, 2016 at 3:26 PM, Mohit Jotwani <mohit@datatorrent.com> wrote:
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > +1
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Regards,
>> > > > > > > > > > > > > > > > > > > > Mohit
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > On Fri, Apr 29, 2016 at 2:09 PM, Chaitanya Chebolu <chaitanya@datatorrent.com> wrote:
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Hi All,
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > I am proposing NFS Input Module.
>> > > > > > > > > > > > > > > > > > > > > Use case is to read large files from NFS in parallel.
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Design of NFS input module:
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > There is a common interface "FSInputModule" in Malhar for the input Modules. NFS input Module extends from FSInputModule and can be achieved by using FSFileSplitter and BlockReader operators.
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Please share your thoughts on this.
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Regards,
>> > > > > > > > > > > > > > > > > > > > > Chaitanya
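
The design question running through this thread (a per-file-system module that overrides createFileSplitter() and createBlockReader(), versus one generic FS input module configured by its URL) can be illustrated with a small, self-contained Java sketch. Nothing below is Malhar code: Splitter, Reader, GenericFsInputModule and the block-size numbers are hypothetical stand-ins chosen only for illustration. The sketch assumes an NFS mount is addressed like any local path (file:///...), as the thread suggests.

```java
import java.net.URI;

// Hypothetical stand-ins for the FileSplitter and BlockReader operators
// mentioned in the thread. These are NOT the Malhar classes.
interface Splitter { String describe(); }
interface Reader { String describe(); }

// Hypothetical generic module. The factory methods have default
// implementations, so no subclass per file system is needed.
class GenericFsInputModule {
    private final URI location;

    GenericFsInputModule(String url) {
        this.location = URI.create(url);
    }

    // Scheme of the configured URL ("hdfs", "file", ...).
    // A bare path such as /mnt/nfs/data has no scheme and is treated as "file".
    String scheme() {
        String s = location.getScheme();
        return s == null ? "file" : s;
    }

    // Defaults that work for any file system reachable through the URL.
    protected Splitter createFileSplitter() {
        return () -> "generic splitter for " + scheme();
    }

    protected Reader createBlockReader() {
        return () -> "generic reader for " + scheme();
    }

    // Illustrative only: choose a default from the URL, not from the class name.
    // The numbers are arbitrary placeholders, not recommended settings.
    long defaultBlockSizeBytes() {
        return "hdfs".equals(scheme()) ? 128L * 1024 * 1024 : 1L * 1024 * 1024;
    }
}

class FsModuleSketch {
    public static void main(String[] args) {
        GenericFsInputModule nfs = new GenericFsInputModule("file:///mnt/nfs/data");
        GenericFsInputModule hdfs = new GenericFsInputModule("hdfs://namenode:8020/data");
        System.out.println(nfs.createFileSplitter().describe());
        System.out.println(hdfs.createBlockReader().describe());
    }
}
```

In this sketch the only thing that distinguishes the NFS case from the HDFS case is the URL passed to the constructor, which is the argument made against a separate NFS class. A subclass would still be possible by overriding the two protected factory methods if a file system ever needed a specialised splitter or reader.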