Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 20B21200BF7 for ; Mon, 9 Jan 2017 15:04:22 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 1F466160B3E; Mon, 9 Jan 2017 14:04:22 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 1E9FA160B3B for ; Mon, 9 Jan 2017 15:04:20 +0100 (CET) Received: (qmail 7579 invoked by uid 500); 9 Jan 2017 14:04:20 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@flink.apache.org Delivered-To: mailing list user@flink.apache.org Received: (qmail 7568 invoked by uid 99); 9 Jan 2017 14:04:20 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 09 Jan 2017 14:04:20 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id BF01B1A8B79 for ; Mon, 9 Jan 2017 14:04:19 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.98 X-Spam-Level: * X-Spam-Status: No, score=1.98 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=data-artisans-com.20150623.gappssmtp.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id L0CnggmMI_Ob for ; Mon, 9 Jan 2017 14:04:18 +0000 (UTC) Received: from mail-wj0-f182.google.com (mail-wj0-f182.google.com [209.85.210.182]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 77A865F4AE for ; Mon, 9 Jan 2017 14:04:17 +0000 (UTC) Received: by mail-wj0-f182.google.com with SMTP id ew7so52054513wjc.3 for ; Mon, 09 Jan 2017 06:04:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=data-artisans-com.20150623.gappssmtp.com; s=20150623; h=from:message-id:mime-version:subject:date:in-reply-to:cc:to :references; bh=4KMa2Kyn5noEXu57d3nS77Sfmxnce2lZR4I9ramBqYg=; b=BdZgXiPQDCqI97MFCTbjLxWBC164cuWYMtN8EuXBQkNygfivUoc/c5xldYXeXwz8GM Q9yflQmlrq/JK3q14/HJ3v1ldBGZ9qobXiZb7lWfk0cvySQBfEKlej3IhUoCNSDkDWsl j7aWwGNsqBbundZxj2cF/dKDnS7yCb6xIwvCJDMlXy29RWmdqxvfTy3+5eVdM0ocmnzk iNCDFisiJkDAOLItk4GSpBvg5Ynz+GYH3uQaaVA9fTsa1UwX8LcbHCOh7PBhAwEoj1hn Cwxu5bsKcIkA7XrA9oYZDVM+d0+a4zQCAscCbTUnTEfrnJLezox3jBJyPygW11jbMJlT 3vrw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=4KMa2Kyn5noEXu57d3nS77Sfmxnce2lZR4I9ramBqYg=; b=O8SvD4CjrZwAmsJppTB+BOCj0IM+U8vQMf+iPhCN/V/f1O5lKu62RYMODP+3XgToJX RVKTPM5Pg8/nuw1TK21WI3b3O7im8z2pWUpW1UWgaGlcCkpQQuI1pfMAkKODzkz4qqNC VfGn9eePbP3T44GF+Zb0RUQGpjBqKiqZEe4bQ/lu+iJt6PQf/0xShUCcgHMmUMq93TEk 6aMrwQFCn8D1soCifYMCYuuserFjBnJEzwd0h69LfWd4oDbqj5hLOgB29AMdFHHP+I3c r6JcIMyMovzgPvvHyFnnrShMTB3HC9SplUtnwaNBUcb3TjGSfXpGZfzGKA6ykWsuo+JW qYrA== X-Gm-Message-State: AIkVDXJhaV/BbaVNCguI/Fjxn/czuveEe+ovOH25sH8Ds9q/jTE/r46GwrzpSbFOw2qKLktq X-Received: by 10.194.173.228 with SMTP id bn4mr63553822wjc.161.1483970648421; Mon, 09 Jan 2017 06:04:08 -0800 (PST) Received: from [192.168.178.65] (dslb-088-072-229-097.088.072.pools.vodafone-ip.de. [88.72.229.97]) by smtp.gmail.com with ESMTPSA id l6sm19094499wmd.5.2017.01.09.06.04.07 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 09 Jan 2017 06:04:07 -0800 (PST) From: Kostas Kloudas Message-Id: Content-Type: multipart/alternative; boundary="Apple-Mail=_812534D9-4ED4-457D-B38C-29A9A9063F17" Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: Continuous File monitoring not reading nested files Date: Mon, 9 Jan 2017 15:04:06 +0100 In-Reply-To: Cc: user@flink.apache.org To: Yassine MARZOUGUI References: X-Mailer: Apple Mail (2.3259) archived-at: Mon, 09 Jan 2017 14:04:22 -0000 --Apple-Mail=_812534D9-4ED4-457D-B38C-29A9A9063F17 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Hi Yassine, I suspect that the problem is in the way the input format (and not the = reader) scans nested files,=20 but could you see if in the code that is executed by the tasks, the = nestedFileEnumeration parameter is still true? I am asking in order to pin down if the problem is in the way we ship = the code to the tasks or in reading the=20 nested files. Thanks, Kostas > On Jan 9, 2017, at 12:56 PM, Yassine MARZOUGUI = wrote: >=20 > Hi, >=20 > Any updates on this issue? Thank you. >=20 > Best, > Yassine >=20 >=20 > On Dec 20, 2016 6:15 PM, "Aljoscha Krettek" > wrote: > +kostas, who probably has the most experience with this by now. Do you = have an idea what might be going on? >=20 > On Fri, 16 Dec 2016 at 15:45 Yassine MARZOUGUI = > wrote: > Looks like this is not specific to the continuous file monitoring, I'm = having the same issue (files in nested directories are not read) when = using: >=20 > env.readFile(fileInputFormat, "hdfs:///shared/mydir", = FileProcessingMode.PROCESS_ONCE, -1L) >=20 > 2016-12-16 11:12 GMT+01:00 Yassine MARZOUGUI = >: > Hi all, >=20 > I'm using the following code to continuously process files from a = directory "mydir". >=20 > final StreamExecutionEnvironment env =3D = StreamExecutionEnvironment.getExecutionEnvironment(); >=20 > FileInputFormat fileInputFormat =3D new TextInputFormat(new = Path("hdfs:///shared/mydir")); > fileInputFormat.setNestedFileEnumeration(true); >=20 > env.readFile(fileInputFormat, > "hdfs:///shared/mydir", > FileProcessingMode.PROCESS_CONTINUOUSLY, 10000L) > .print(); >=20 > env.execute(); >=20 > If I add directory under mydir, say "2016-12-16", and then add a file = "2016-12-16/file.txt", its contents are not printed. If I add the same = file directly under "mydir", its contents are correctly printed. After = that the logs will show the following : >=20 > 10:55:44,928 DEBUG = org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFu= nction - Ignoring hdfs://mlxbackoffice/shared/mydir/2016-12-16, with = mod time=3D 1481882041587 and global mod time=3D 1481882126122 > 10:55:44,928 DEBUG = org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFu= nction - Ignoring hdfs://mlxbackoffice/shared/mydir/file.txt, with mod = time=3D 1481881788704 and global mod time=3D 1481882126122 >=20 > Looks like the ContinuousFileMonitoringFunction considered it already = read 2016-12-16 as a file and then excludes it, but its contents were = not processed. Any Idea why this happens? > Thank you. >=20 > Best, > Yassine >=20 >=20 --Apple-Mail=_812534D9-4ED4-457D-B38C-29A9A9063F17 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=us-ascii Hi Yassine,

I suspect that the problem is in the way the input format = (and not the reader) scans nested files, 
but = could you see if in the code that is executed by the tasks, the = nestedFileEnumeration parameter is still true?

I am asking in order to pin down if the = problem is in the way we ship the code to the tasks or in reading = the 
nested files.

Thanks,
Kostas

On Jan 9, 2017, at 12:56 PM, Yassine MARZOUGUI <y.marzougui@mindlytix.com> wrote:

Hi,

Any updates on this issue? = Thank you.

Best,
Yassine


On Dec 20, 2016 6:15 PM, "Aljoscha = Krettek" <aljoscha@apache.org> wrote:
+kostas, who probably has the most experience with this by = now. Do you have an idea what might be going on?

On Fri, 16 Dec 2016 at = 15:45 Yassine MARZOUGUI <y.marzougui@mindlytix.com> wrote:
Looks like this is not specific = to the continuous file monitoring, I'm having the same issue (files in = nested directories are not read) when using:

env.readFile(fileInputFormat, = ;"hdfs:///shared/mydir", FileProcessingMode.PROCESS_ONCE, -1L)

2016-12-16 11:12 GMT+01:00 Yassine = MARZOUGUI <y.marzougui@mindlytix.com>:
Hi all,

I'm using the following code to = continuously process files from a directory "mydir".

final = StreamExecutionEnvironment env =3D StreamExecutionEnvironment.getExecutionEnvironment();

FileInputFormat = fileInputFormat =3D new TextInputFormat(new Path("hdfs:///shared/mydir"));
fileInputFormat.setNestedFileEnumeration(true);

env.readFile(fileInputFormat,
        =         "hdfs:///shared/mydir",
      =           FileProcessingMode.PROCESS_CONTINUOUSLY, 10000L)
      =           .print();

env.execute();

If I add directory = under mydir, say "2016-12-16", and then add a file "2016-12-16/file.txt", its = contents are not printed. If I add the same file directly under "mydir",  its contents = are correctly printed. After that the logs will show the following = :

10:55:44,928 DEBUG = org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction  - = Ignoring hdfs://mlxbackoffice/shared/mydir/2016-12-16, with mod time=3D 1481882041587 and global = mod time=3D 1481882126122
10:55:44,928 DEBUG = org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction  - = Ignoring hdfs://mlxbackoffice/shared/mydir/file.txt,= with mod time=3D 1481881788704 and global mod time=3D = 1481882126122

Looks like = the ContinuousFileMonitoringFunction =  considered it already read 2016-12-16 as a file and then = excludes it, but its contents were not processed. Any Idea why this = happens?
Thank you.

Best,
Yassine



= --Apple-Mail=_812534D9-4ED4-457D-B38C-29A9A9063F17--