spark-user mailing list archives

From Tathagata Das <t...@databricks.com>
Subject Re: Spark checkpoint restore failure due to s3 consistency issue
Date Fri, 09 Oct 2015 21:47:26 GMT
That won't really help. What we need to see is the lifecycle of the file
before the failure, so we need the log4j logs.
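For example, something along these lines raises the S3 filesystem logger to
DEBUG (the equivalent log4j.properties entry works too), which surfaces
getFileStatus events like the one you pasted. The logger name below is an
assumption; match it to whichever S3NativeFileSystem package your
distribution ships.

import org.apache.log4j.{Level, Logger}

// Assumed package: EMR's S3 filesystem classes live here; for stock
// Hadoop s3n, use "org.apache.hadoop.fs.s3native" instead.
Logger.getLogger("com.amazon.ws.emr.hadoop.fs").setLevel(Level.DEBUG)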

On Fri, Oct 9, 2015 at 2:34 PM, Spark Newbie <sparknewbie1234@gmail.com>
wrote:

> Unfortunately I don't have the before-stop logs anymore, since the log was
> overwritten by my next run.
>
> I created an rdd-<xxx>_$folder$ file in S3 that was missing compared to
> the other checkpointed rdd-<yyy> directories, and the app then started
> without the IllegalArgumentException. Do you still need the after-restart
> log4j logs? I can send them if that will help dig into the root cause.
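>
> For reference, a marker object like that can be created with the AWS SDK
> along these lines (bucket and key names are placeholders, not my real
> ones):
>
> import java.io.ByteArrayInputStream
> import com.amazonaws.services.s3.AmazonS3Client
> import com.amazonaws.services.s3.model.ObjectMetadata
>
> // A zero-byte object whose key ends in _$folder$ is the s3n convention
> // for "this directory exists".
> val s3 = new AmazonS3Client() // credentials from the default provider chain
> val meta = new ObjectMetadata()
> meta.setContentLength(0)
> s3.putObject("my-bucket", "checkpoints/rdd-42_$folder$",
>   new ByteArrayInputStream(new Array[Byte](0)), meta)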
>
> On Fri, Oct 9, 2015 at 2:18 PM, Tathagata Das <tdas@databricks.com> wrote:
>
>> Can you provide the before-stop and after-restart log4j logs for this?
>>
>> On Fri, Oct 9, 2015 at 2:13 PM, Spark Newbie <sparknewbie1234@gmail.com>
>> wrote:
>>
>>> Hi Spark Users,
>>>
>>> I'm seeing checkpoint restore failures that cause application startup to
>>> fail with the exception below. When I "ls" the S3 path, the key is
>>> sometimes listed and sometimes not. There are no part files (checkpoint
>>> files) under the specified S3 path. This is possible because I killed the
>>> app and restarted it as part of my testing, to see whether the
>>> kinesis-asl library's lossless Kinesis receivers work.
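>>>
>>> For context, the app recovers through the standard getOrCreate pattern,
>>> roughly like the sketch below (app/stream names, paths, and intervals
>>> are placeholders, not my actual values):
>>>
>>> import org.apache.spark.SparkConf
>>> import org.apache.spark.storage.StorageLevel
>>> import org.apache.spark.streaming.{Seconds, StreamingContext}
>>> import org.apache.spark.streaming.kinesis.KinesisUtils
>>> import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
>>>
>>> val checkpointDir = "s3n://my-bucket/checkpoints" // placeholder path
>>>
>>> def createContext(): StreamingContext = {
>>>   val ssc = new StreamingContext(
>>>     new SparkConf().setAppName("my-app"), Seconds(10))
>>>   ssc.checkpoint(checkpointDir) // metadata and RDDs checkpointed to S3
>>>   val stream = KinesisUtils.createStream(
>>>     ssc, "my-app", "my-stream", "kinesis.us-east-1.amazonaws.com",
>>>     "us-east-1", InitialPositionInStream.LATEST, Seconds(10),
>>>     StorageLevel.MEMORY_AND_DISK_2)
>>>   // ... transformations and output operations on `stream` elided ...
>>>   ssc
>>> }
>>>
>>> // On restart this reads the checkpoint back from S3, which is where
>>> // the IllegalArgumentException below is thrown.
>>> val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)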
>>>
>>> Has anyone seen the exception below before? If so, is there a
>>> recommended way to handle this case?
>>>
>>> 15/10/09 21:02:09 DEBUG S3NativeFileSystem: getFileStatus could not find key '<insert s3 key to the checkpointed rdd>'
>>> Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Checkpoint directory does not exist: <insert full s3 path to the checkpointed rdd>
>>>         at scala.Predef$.require(Predef.scala:233)
>>>         at org.apache.spark.rdd.ReliableCheckpointRDD.<init>(ReliableCheckpointRDD.scala:45)
>>>         at org.apache.spark.SparkContext$$anonfun$checkpointFile$1.apply(SparkContext.scala:1218)
>>>         at org.apache.spark.SparkContext$$anonfun$checkpointFile$1.apply(SparkContext.scala:1218)
>>>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>>>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>>>         at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
>>>         at org.apache.spark.SparkContext.checkpointFile(SparkContext.scala:1217)
>>>         at org.apache.spark.streaming.dstream.DStreamCheckpointData$$anonfun$restore$1.apply(DStreamCheckpointData.scala:112)
>>>         at org.apache.spark.streaming.dstream.DStreamCheckpointData$$anonfun$restore$1.apply(DStreamCheckpointData.scala:109)
>>>         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>>>         at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>>>         at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>>>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>>>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>>>         at org.apache.spark.streaming.dstream.DStreamCheckpointData.restore(DStreamCheckpointData.scala:109)
>>>         at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:487)
>>>         at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
>>>         at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
>>>         at scala.collection.immutable.List.foreach(List.scala:318)
>>>         at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:488)
>>>         at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
>>>         at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
>>>         at scala.collection.immutable.List.foreach(List.scala:318)
>>>         at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:488)
>>>         at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
>>>         at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
>>>         at scala.collection.immutable.List.foreach(List.scala:318)
>>>         at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:488)
>>>         at org.apache.spark.streaming.DStreamGraph$$anonfun$restoreCheckpointData$2.apply(DStreamGraph.scala:153)
>>>         at org.apache.spark.streaming.DStreamGraph$$anonfun$restoreCheckpointData$2.apply(DStreamGraph.scala:153)
>>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>         at org.apache.spark.streaming.DStreamGraph.restoreCheckpointData(DStreamGraph.scala:153)
>>>         at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:158)
>>>         at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:837)
>>>         at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:837)
>>>         at scala.Option.map(Option.scala:145)
>>>         at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:837)
>>>         at foo$.getStreamingContext(foo.scala:72)
>>>
>>> Thanks,
>>> Bharath
>>>
>>
>>
>
