spark-reviews mailing list archives

From jose-torres <...@git.apache.org>
Subject [GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.
Date Tue, 06 Mar 2018 01:35:46 GMT
Github user jose-torres commented on the issue:

    https://github.com/apache/spark/pull/20710
  
    As you say, there's no strict semantic need for createDataWriter() to take arguments.
We could simply have each DataWriter identify itself by a random UUID, and require upstream
components to keep track of which UUIDs map to which of the writers they care about. But the
current API is designed to let each data writer identify its logical place in the query,
and the epoch ID is an important part of that. (I expect it would be infeasible to migrate existing
sources to an API that didn't provide things like the partition ID or attempt number.)
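
    To make the trade-off concrete, here is a minimal sketch of a factory that hands each
writer its logical coordinates. The names SketchDataWriter, SketchDataWriterFactory, and
LoggingWriterFactory are hypothetical stand-ins invented for this illustration, not Spark's
actual interfaces, and the parameter list (partition ID, attempt number, epoch ID) is assumed
from the discussion above rather than copied from the codebase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration only; not Spark's real interfaces.
interface SketchDataWriter {
    void write(String record);
    String commit();
}

interface SketchDataWriterFactory {
    // The writer is told its logical place in the query when it is created.
    SketchDataWriter createDataWriter(int partitionId, int attemptNumber, long epochId);
}

final class LoggingWriterFactory implements SketchDataWriterFactory {
    @Override
    public SketchDataWriter createDataWriter(int partitionId, int attemptNumber, long epochId) {
        final List<String> buffer = new ArrayList<>();
        return new SketchDataWriter() {
            @Override
            public void write(String record) {
                buffer.add(record);
            }

            @Override
            public String commit() {
                // The commit message can name the epoch and partition directly,
                // so no external UUID-to-writer mapping is needed.
                return "epoch=" + epochId + " partition=" + partitionId
                        + " attempt=" + attemptNumber + " records=" + buffer.size();
            }
        };
    }
}

final class WriterIdentityDemo {
    public static void main(String[] args) {
        SketchDataWriter writer = new LoggingWriterFactory().createDataWriter(3, 0, 42L);
        writer.write("a");
        writer.write("b");
        System.out.println(writer.commit()); // prints "epoch=42 partition=3 attempt=0 records=2"
    }
}
```

    With the UUID alternative, the commit message would carry only an opaque identifier, and
whatever component receives it would have to maintain its own table from UUID to partition and epoch.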
    
    StreamWriter is the separate streaming interface, and DataWriterFactory implementations
in streaming queries will always come from a StreamWriter.



