From: Sourigna Phetsarath
To: user@flink.apache.org
Date: Wed, 23 Mar 2016 09:47:49 -0400
Subject: Re: Flink 1.0.0 reading files from multiple directory with wildcards

Great! I will, once I clear it with the legal team here.

On Wed, Mar 23, 2016 at 6:19 AM, Ufuk Celebi wrote:

> Nice! Would you like to contribute this to Flink via a pull request? Some
> resources about the contribution process can be found here:
>
> http://flink.apache.org/contribute-code.html
> http://flink.apache.org/how-to-contribute.html
>
> On Wed, Mar 23, 2016 at 12:00 AM, Fabian Hueske wrote:
>
>> Hi Gna,
>>
>> thanks for sharing the good news and opening the JIRA!
>>
>> Cheers, Fabian
>>
>> 2016-03-22 23:30 GMT+01:00 Sourigna Phetsarath <gna.phetsarath@teamaol.com>:
>>
>>> Ufuk & Fabian,
>>>
>>> FYI, I was able to extend FileInputFormat and override createInputSplits()
>>> to handle multiple Paths. This reduced resource usage and improved the
>>> performance of the job.
>>>
>>> I also added this ticket: https://issues.apache.org/jira/browse/FLINK-3655
>>>
>>> -Gna
>>>
>>> On Mon, Mar 21, 2016 at 10:04 AM, Sourigna Phetsarath
>>> <gna.phetsarath@teamaol.com> wrote:
>>>
>>>> Fabian,
>>>>
>>>> I'll try extending InputFormat as you suggested and will create a JIRA
>>>> issue as well.
>>>>
>>>> I also have an AvroGenericRecordInput format class that I would like to
>>>> contribute once I have time to clean it up and get it into your code base.
>>>>
>>>> -Gna
>>>>
>>>> On Mon, Mar 21, 2016 at 6:35 AM, Fabian Hueske wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> no, this is currently not supported. However, I agree this would be a
>>>>> very valuable addition to the FileInputFormat.
>>>>> Would you mind opening a JIRA issue with your suggestions?
>>>>>
>>>>> Until this is added to Flink, it can be implemented as a custom
>>>>> InputFormat based on FileInputFormat by overriding the createInputSplits()
>>>>> method.
>>>>>
>>>>> Best, Fabian
>>>>>
>>>>> 2016-03-21 0:11 GMT+01:00 Sourigna Phetsarath <gna.phetsarath@teamaol.com>:
>>>>>
>>>>>> All,
>>>>>>
>>>>>> Do any of the Flink Data Sources support comma-separated directories
>>>>>> with wildcards?
>>>>>>
>>>>>> For example:
>>>>>>
>>>>>> env.readFile("/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/*")
>>>>>>
>>>>>> Thanks in advance for any help that you can provide.
>>>>>> --
>>>>>>
>>>>>> Gna Phetsarath
>>>>>> System Architect // AOL Platforms // Data Services // Applied Research Chapter
>>>>>> 770 Broadway, 5th Floor, New York, NY 10003
>>>>>> o: 212.402.4871 // m: 917.373.7363
>>>>>> vvmr: 8890237 aim: sphetsarath20 t: @sourigna
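Fabian's suggestion in the thread (a FileInputFormat subclass that overrides createInputSplits()) needs, at its core, a way to turn a comma-separated list of wildcard paths into a concrete list of files. What follows is only a rough sketch of that one step, written against java.nio rather than Flink's API. The class name MultiPathGlob and the methods splitPatterns and expand are hypothetical, the code assumes a local file system with Unix-style paths, and wiring the resulting files into createInputSplits() is deliberately left out.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink: expands comma-separated glob patterns.
public class MultiPathGlob {

    /** Splits "a/*, b/*" into trimmed, non-empty glob patterns. */
    public static List<String> splitPatterns(String commaSeparated) {
        return Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /**
     * Walks the tree under root and returns, in sorted order, every regular
     * file whose path matches at least one of the given glob patterns.
     */
    public static List<Path> expand(Path root, String commaSeparated) throws IOException {
        List<PathMatcher> matchers = new ArrayList<>();
        for (String pattern : splitPatterns(commaSeparated)) {
            // In java.nio glob syntax, "*" does not cross a directory boundary.
            matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + pattern));
        }
        try (Stream<Path> files = Files.walk(root)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(f -> matchers.stream().anyMatch(m -> m.matches(f)))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: MultiPathGlob <root> <pattern[,pattern...]>");
            return;
        }
        // Print one matching file per line.
        for (Path p : expand(Paths.get(args[0]), args[1])) {
            System.out.println(p);
        }
    }
}
```

With the paths from the original question, the call would look like expand(Paths.get("/data"), "/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/*"). One known limitation of this sketch: a pattern that itself contains a comma, such as a {a,b} group, would be split in the wrong place.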