From: cloud-fan
To: reviews@spark.apache.org
Subject: [GitHub] spark pull request #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work f...
Date: Thu, 12 Jan 2017 12:01:24 +0000 (UTC)

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16474#discussion_r95777107

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala ---
@@ -135,11 +135,21 @@ class FileScanRDD(
         try {
           if (ignoreCorruptFiles) {
             currentIterator = new NextIterator[Object] {
-              private val internalIter = readFunction(currentFile)
+              private val internalIter = {
+                try {
+                  // The readFunction may read files before consuming the iterator.
+                  // E.g., vectorized Parquet reader.
+                  readFunction(currentFile)
+                } catch {
+                  case e @(_: RuntimeException | _: IOException) =>
+                    logWarning(s"Skipped the rest content in the corrupted file: $currentFile", e)
+                    null
--- End diff --

return `Iterator.empty`
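A minimal standalone Scala sketch of the suggested pattern follows. It is not Spark's actual code: `openFile` and `safeOpen` are hypothetical names, and `openFile` only simulates a reader that does I/O eagerly and fails up front on a corrupt file, which is what the diff's comment says the vectorized Parquet reader may do. The eager read is wrapped in `try`/`catch`, and a failure yields `Iterator.empty` rather than `null`.

```scala
import java.io.IOException

// Hypothetical stand-in for readFunction: it fails eagerly (before any row is
// consumed) when given a "corrupt" file, otherwise returns an iterator of rows.
def openFile(path: String): Iterator[String] = {
  if (path.endsWith(".corrupt")) throw new IOException(s"cannot read footer of $path")
  Iterator(s"$path:row1", s"$path:row2")
}

// Wrap the eager read. On RuntimeException or IOException, log and fall back to
// an empty iterator instead of null, so callers never need a null check.
def safeOpen(path: String, read: String => Iterator[String]): Iterator[String] =
  try read(path)
  catch {
    case e @ (_: RuntimeException | _: IOException) =>
      System.err.println(s"Skipped the corrupted file: $path (${e.getMessage})")
      Iterator.empty
  }

// The corrupt file contributes no rows; the other files are read normally.
val files = Seq("a.parquet", "b.corrupt", "c.parquet")
val rows = files.iterator.flatMap(f => safeOpen(f, openFile)).toList
println(rows)
```

With `Iterator.empty`, code that later calls `hasNext`/`next` on `internalIter` simply sees no elements; with `null`, the first such call would throw a `NullPointerException`.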