Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id D4C4C200BB6 for ; Thu, 20 Oct 2016 22:57:13 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id D3A78160ACC; Thu, 20 Oct 2016 20:57:13 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 30A7A160AE0 for ; Thu, 20 Oct 2016 22:57:13 +0200 (CEST) Received: (qmail 76476 invoked by uid 500); 20 Oct 2016 20:57:12 -0000 Mailing-List: contact reviews-help@spark.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@spark.apache.org Received: (qmail 76412 invoked by uid 99); 20 Oct 2016 20:57:12 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Oct 2016 20:57:12 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 000B6DFAED; Thu, 20 Oct 2016 20:57:11 +0000 (UTC) From: vanzin To: reviews@spark.apache.org Reply-To: reviews@spark.apache.org References: In-Reply-To: Subject: [GitHub] spark pull request #15556: [SPARK-18010][Core] Reduce work performed for bui... Content-Type: text/plain Message-Id: <20161020205712.000B6DFAED@git1-us-west.apache.org> Date: Thu, 20 Oct 2016 20:57:11 +0000 (UTC) archived-at: Thu, 20 Oct 2016 20:57:14 -0000 Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/15556#discussion_r84373054 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala --- @@ -43,38 +43,56 @@ private[spark] class ReplayListenerBus extends SparkListenerBus with Logging { * @param sourceName Filename (or other source identifier) from whence @logData is being read * @param maybeTruncated Indicate whether log file might be truncated (some abnormal situations * encountered, log file might not finished writing) or not + * @param eventsFilter Filter function to select JSON event strings in the log data stream that + * should be parsed and replayed. When not specified, all event strings in the log data + * are parsed and replayed. */ def replay( logData: InputStream, sourceName: String, - maybeTruncated: Boolean = false): Unit = { - var currentLine: String = null - var lineNumber: Int = 1 + maybeTruncated: Boolean = false, + eventsFilter: (String) => Boolean = ReplayListenerBus.SELECT_ALL_FILTER): Unit = { try { - val lines = Source.fromInputStream(logData).getLines() - while (lines.hasNext) { - currentLine = lines.next() + val lineEntries = Source.fromInputStream(logData) + .getLines() + .zipWithIndex + .filter(entry => eventsFilter(entry._1)) + + var entry: (String, Int) = ("", 0) + + while (lineEntries.hasNext) { try { - postToAll(JsonProtocol.sparkEventFromJson(parse(currentLine))) + entry = lineEntries.next() --- End diff -- nit: `val (line, lineno) = lineEntries.next()` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastructure@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org For additional commands, e-mail: reviews-help@spark.apache.org