Date: Fri, 17 Feb 2017 09:44:42 +0000 (UTC)
From: "Sean Owen (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-19645) structured streaming job restart bug

    [ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871542#comment-15871542 ]

Sean Owen commented on SPARK-19645:
-----------------------------------

Note that master requires Hadoop 2.6+ now.

> structured streaming job restart bug
> ------------------------------------
>
>                 Key: SPARK-19645
>                 URL: https://issues.apache.org/jira/browse/SPARK-19645
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.1.0
>            Reporter: guifeng
>            Priority: Critical
>
> We are trying to use Structured Streaming in production, but there is currently a bug related to the process of restarting a streaming job.
> The following is the concrete error message:
> {quote}
> Caused by: java.lang.IllegalStateException: Error committing version 2 into HDFSStateStore[id = (op=0, part=136), dir = /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136]
>     at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:162)
>     at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:173)
>     at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:123)
>     at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:64)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>     at org.apache.spark.scheduler.Task.run(Task.scala:99)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Failed to rename /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/temp--5345709896617324284 to /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/2.delta
>     at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$commitUpdates(HDFSBackedStateStoreProvider.scala:259)
>     at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:156)
>     ... 14 more
> {quote}
> The bug is easy to reproduce by restarting a previous streaming job. The main cause is that, on restart, Spark recomputes the WAL offsets and regenerates the same HDFS delta file (the latest delta file generated before the restart, named "currentBatchId.delta"), so renaming the temp file fails because that delta file already exists. In my opinion, this is a bug. If you also consider it a bug, I can fix it.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
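[Editor's note] For context on the failure mode reported above: the rename fails because the delta file for the recomputed batch already exists from the run before the restart. Below is a minimal, hypothetical Scala sketch (not Spark's actual HDFSBackedStateStoreProvider code and not its eventual patch; the commitDelta helper and the exists-check guard are assumptions) that reproduces the idea against the Hadoop FileSystem API and shows one possible mitigation: treat an already-present delta file for the same version as an already-committed batch instead of failing the rename.

{code:scala}
// Hypothetical sketch only: illustrates the failing rename described in the
// report and one possible guard. The helper name and the guard are assumptions,
// not Spark's actual fix.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object DeltaCommitSketch {
  def commitDelta(fs: FileSystem, tempDeltaFile: Path, finalDeltaFile: Path): Unit = {
    if (fs.exists(finalDeltaFile)) {
      // After a restart, the delta file for this version was already written by
      // the previous run, so renaming the freshly written temp file would fail.
      // Discard the temp file and treat the batch as committed.
      fs.delete(tempDeltaFile, false)
    } else if (!fs.rename(tempDeltaFile, finalDeltaFile)) {
      throw new java.io.IOException(
        s"Failed to rename $tempDeltaFile to $finalDeltaFile")
    }
  }

  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())   // local FS unless HDFS is configured
    val dir = new Path("/tmp/state-store-sketch/0/136")
    fs.mkdirs(dir)
    val temp = new Path(dir, "temp-delta")
    val target = new Path(dir, "2.delta")
    fs.create(temp).close()
    fs.create(target).close()       // simulate the delta left behind before the restart
    commitDelta(fs, temp, target)   // succeeds instead of throwing the IOException above
    println(s"committed: ${fs.exists(target)}")
  }
}
{code}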