From: Stephan Ewen
To: user@flink.apache.org
Date: Fri, 26 Jun 2015 12:21:57 +0200
Subject: Re: open multiple file from list of uri
Sure, just override the "createInputSplits()" method. For each of your file paths, call "super.createInputSplits()", then combine the results into one array and return it.

That should do it...
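A minimal, self-contained model of that pattern follows. It is not Flink code: `Split`, `SinglePathFormat` and `MultiPathFormat` are hypothetical stand-ins for FileInputSplit and a FileInputFormat/DelimitedInputFormat subclass, so the example compiles without Flink on the classpath. In a real subclass, the loop would assume a `setFilePath(...)` call and a `super.createInputSplits(minNumSplits)` call per path; check the exact signatures against your Flink version. The approach works because each split carries the path of the file it belongs to, so the reader opens the right file for every split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for FileInputSplit: a split number plus the file
// and chunk it covers (real splits carry byte offsets and host info).
class Split {
    final int splitNumber;
    final String path;
    final int chunk;

    Split(int splitNumber, String path, int chunk) {
        this.splitNumber = splitNumber;
        this.path = path;
        this.chunk = chunk;
    }
}

// Hypothetical stand-in for the inherited input format: its split logic
// only knows about the single path stored in filePath.
class SinglePathFormat {
    protected String filePath;

    public void setFilePath(String filePath) {
        this.filePath = filePath;
    }

    // Toy splitting: cut the current file into minNumSplits chunks.
    public Split[] createInputSplits(int minNumSplits) {
        Split[] splits = new Split[minNumSplits];
        for (int i = 0; i < minNumSplits; i++) {
            splits[i] = new Split(i, filePath, i);
        }
        return splits;
    }
}

// The suggested pattern: override createInputSplits, call super once per
// path, and return all resulting splits as one array.
class MultiPathFormat extends SinglePathFormat {
    private final List<String> paths;

    MultiPathFormat(List<String> paths) {
        this.paths = new ArrayList<>(paths);
    }

    @Override
    public Split[] createInputSplits(int minNumSplits) {
        List<Split> all = new ArrayList<>();
        for (String p : paths) {
            setFilePath(p); // point the inherited logic at the next file
            for (Split s : super.createInputSplits(minNumSplits)) {
                // Renumber so split numbers stay unique across all files.
                all.add(new Split(all.size(), s.path, s.chunk));
            }
        }
        return all.toArray(new Split[0]);
    }
}
```

Renumbering is a precaution, on the assumption that downstream code may expect unique split numbers; each per-path call would otherwise start counting at 0.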

On Fri, Jun 26, 2015 at 12:19 PM, Michele Bertoni <michele1.bertoni@mail.polimi.it> wrote:
Hi Stephan, thanks for answering,
Right now I am using an extension of the DelimitedInputFormat. Is there a way to combine it with option 2?



On 26 Jun 2015, at 12:17, Stephan Ewen <sewen@apache.org> wrote:

There are two ways you can realize that:

1) Create multiple sources and union them. This is easy, but probably a bit less efficient.

2) Override the FileInputFormat's createInputSplits method to take a union of the paths, creating a list of all files and file splits that will be read.

Stephan
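Option 1 can be pictured with the in-memory model below. In Flink's DataSet API it would roughly mean calling `env.readTextFile(path)` (or `env.readFile(format, path)` for a custom format) once per selected path and chaining `.union(...)`. The class `UnionOfSources` and the map-based "file system" are hypothetical stand-ins so the sketch runs without Flink; it only illustrates that the union contains every record of every listed file and nothing from unlisted ones.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical in-memory model of option 1: one source per path, then a union.
class UnionOfSources {
    // Stand-in for a per-file source; the "file system" is a map from
    // path to lines. A missing path fails, as reading a missing file would.
    static Stream<String> source(Map<String, List<String>> fs, String path) {
        List<String> lines = fs.get(path);
        if (lines == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return lines.stream();
    }

    // Union of the selected sources: all records of all listed paths.
    // This model keeps list order; a real parallel union makes no
    // ordering guarantee.
    static List<String> readAll(Map<String, List<String>> fs, List<String> paths) {
        return paths.stream()
                .flatMap(p -> source(fs, p))
                .collect(Collectors.toList());
    }
}
```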


On Fri, Jun 26, 2015 at 12:12 PM, Michele Bertoni <michele1.bertoni@mail.polimi.it> wrote:
Hi everybody,
is there a way to specify a list of URIs (“hdfs://file1”, “hdfs://file2”, …) and open them as different files?
I know I could open the entire directory, but I want to be able to select a subset of the files in it.

thanks


