spark-reviews mailing list archives

From jose-torres <>
Subject [GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...
Date Thu, 22 Feb 2018 21:56:34 GMT
Github user jose-torres commented on a diff in the pull request:
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
    @@ -415,12 +418,14 @@ class MicroBatchExecution(
                 case v1: SerializedOffset => reader.deserializeOffset(v1.json)
                 case v2: OffsetV2 => v2
    -          reader.setOffsetRange(
    -            toJava(current),
    -            Optional.of(availableV2))
    +          reader.setOffsetRange(toJava(current), Optional.of(availableV2))
               logDebug(s"Retrieving data from $reader: $current -> $availableV2")
    -          Some(reader ->
    -            new StreamingDataSourceV2Relation(reader.readSchema().toAttributes, reader))
    +          Some(reader -> StreamingDataSourceV2Relation(
    --- End diff --
    When it comes to the streaming execution code, the basic problem is that it was more evolved
than designed. For example, there's no particular reason to use a logical plan here; the map is
only ever used to construct another map of source -> physical plan stats. Untangling
StreamExecution is definitely something we need to do, but that's going to be annoying, and
I think it's sufficiently orthogonal to the V2 migration to put off.
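    To make the point concrete, here's a minimal sketch of what I mean (all names here are invented for illustration, not the actual StreamExecution types): the logical-plan values in the map are consumed immediately to pull out stats, so the map could just hold the stats directly.

```scala
// Stand-ins for the real types; invented for this sketch.
case class PlanStats(numRows: Long)            // stand-in for physical plan stats
case class LogicalPlanish(stats: PlanStats)    // stand-in for a logical plan node

object MapSketch {
  // The logical plans are only ever mapped away to their stats,
  // which is why keeping a logical plan in the map isn't strictly needed.
  def statsBySource[S](plans: Map[S, LogicalPlanish]): Map[S, PlanStats] =
    plans.map { case (source, plan) => source -> plan.stats }
}
```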
    There's currently no design doc for the streaming aspects of DataSourceV2. We kinda rushed
an experimental version out the door, because it was coupled with the experimental ContinuousExecution
streaming mode. I'm working on going back and cleaning things up; I'll send docs to the dev
list and make sure to @ you on the changes.


