From: pwendell
To: reviews@spark.apache.org
Subject: [GitHub] spark pull request: [SPARK-4518][SPARK-4519][Streaming] Refactored...
Date: Mon, 24 Nov 2014 05:36:32 +0000 (UTC)

Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3419#discussion_r20773558

    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala ---
    @@ -193,57 +306,25 @@ class FileInputDStream[K: ClassTag, V: ClassTag, F <: NewInputFormat[K,V] : Clas
           hadoopFiles.map(p => (p._1, p._2.mkString(", "))).mkString("\n") + "\n]"
         }
       }
    +}
    +
    +private[streaming]
    +object FileInputDStream {
     
       /**
    -   * Custom PathFilter class to find new files that
    -   * ... have modification time more than ignore time
    -   * ... have not been seen in the last interval
    -   * ... have modification time less than maxModTime
    +   * Minimum duration of remembering the information of selected files. Files with mod times
    +   * older than this "window" of remembering will be ignored. So if new files are visible
    +   * within this window, then the file will get selected in the next batch.
        */
    -  private[streaming]
    -  class CustomPathFilter(maxModTime: Long) extends PathFilter {
    +  private val REMEMBER_DURATION = Minutes(1)
    --- End diff --

    Should this be called `MINIMUM_REMEMBER_DURATION`? There is also a variable called `durationToRemember` whose value can differ from this one. I found it a bit confusing to have one thing called `durationToRemember` and another called `REMEMBER_DURATION` when they can hold different values.
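To make the naming concern concrete, here is a minimal, dependency-free Scala sketch of how a fixed minimum and an effective remember window can diverge. It assumes (the diff does not confirm this) that `durationToRemember` is obtained by rounding the one-minute minimum up to a whole number of batch intervals. `RememberWindowSketch` and all of its members are hypothetical names used only for illustration, and durations are plain milliseconds rather than Spark's `Duration` type.

```scala
// Hypothetical sketch: not Spark's code. Durations are plain milliseconds.
object RememberWindowSketch {
  // Assumed fixed lower bound, playing the role of REMEMBER_DURATION = Minutes(1) in the diff.
  val MinimumRememberDurationMs: Long = 60 * 1000L

  // Assumption: remember enough whole batches to cover at least the minimum duration.
  def numBatchesToRemember(batchDurationMs: Long): Int = {
    require(batchDurationMs > 0, "batch duration must be positive")
    math.ceil(MinimumRememberDurationMs.toDouble / batchDurationMs).toInt
  }

  // Effective window (the role durationToRemember would play): never shorter than
  // the minimum, and longer whenever the batch duration does not divide it evenly.
  def durationToRememberMs(batchDurationMs: Long): Long =
    numBatchesToRemember(batchDurationMs) * batchDurationMs

  // Hypothetical mod-time check: files older than the effective window are ignored.
  def isWithinWindow(modTimeMs: Long, currentTimeMs: Long, batchDurationMs: Long): Boolean =
    modTimeMs >= currentTimeMs - durationToRememberMs(batchDurationMs)
}
```

Under these assumptions, 30-second batches give an effective window of exactly 60 s, but 25-second batches round up to 3 batches, i.e. 75 s. That gap between the constant and the derived value is why a name such as `MINIMUM_REMEMBER_DURATION` would read less ambiguously next to `durationToRemember`.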