spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] jhsb25 opened a new pull request #25916: [SPARK-29228][SS] Adding the ability to add headers to the KafkaRecords published to Kafka from spark structured streaming
Date Tue, 24 Sep 2019 12:40:25 GMT
jhsb25 opened a new pull request #25916: [SPARK-29228][SS] Adding the ability to add headers
to the KafkaRecords published to Kafka from spark structured streaming
URL: https://github.com/apache/spark/pull/25916
 
 
   ### What changes were proposed in this pull request?
   The changes in this pull request allow a developer to provide a column titled `header`,
of type `map<string, string>`, in their structured streaming DataFrame. When the DataFrame
is written to Kafka, the entries of that map are applied as headers on the producer record.
This has many applications, but our main reason for introducing it was tracing.
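
   To make the mapping concrete: a Kafka record header is a string key paired with a byte-array value, so each entry of the `map<string, string>` column has to be turned into such a pair. The following is only a minimal Python sketch of that idea, not the code in this pull request. The function name `map_to_kafka_headers` is hypothetical, and UTF-8 encoding of the values is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def map_to_kafka_headers(
    header_map: Optional[Dict[str, str]],
) -> List[Tuple[str, bytes]]:
    """Turn a string-to-string map into Kafka-style (key, value-bytes) pairs.

    Hypothetical helper for illustration only.
    """
    # A missing map (for example a null column value) yields no headers.
    if header_map is None:
        return []
    # Kafka header values are raw bytes, so each string value is encoded.
    # UTF-8 is an assumption; the pull request text does not name a charset.
    return [(key, value.encode("utf-8")) for key, value in header_map.items()]


headers = map_to_kafka_headers(
    {"custom-header-1": "foo", "custom-header-2": "bar"}
)
print(headers)
```

   Returning an empty list for a missing map keeps the column optional, which matches the intent that records without headers are still published as before.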
   
   ### Why are the changes needed?
   Sometimes it is necessary to add custom headers to the Kafka producer record, but the
current spark-kafka sink does not support this. These changes make it possible to optionally
add headers to the Kafka records that are produced.
   
   
   ### Does this PR introduce any user-facing change?
   If users want to apply headers to the records published to Kafka, they can provide
a new column of type `map<string, string>` titled `header`:
   
   `df.selectExpr("topic", "CAST(key AS STRING)", "CAST(value AS STRING)", "map('custom-header-1', 'foo', 'custom-header-2', 'bar') as headers")
     .write
     .format("kafka")
     .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
     .save()`
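
   When the header values come from application code rather than a literal, the `map(...)` expression passed to `selectExpr` can be assembled from a dictionary. This is a rough Python sketch under stated assumptions: `headers_expr` is a hypothetical helper, the column name is passed in by the caller, and keys or values containing quotes or backslashes are rejected rather than escaped.

```python
from typing import Dict


def headers_expr(headers: Dict[str, str], column_name: str) -> str:
    """Build a Spark SQL `map(...) as <column_name>` expression from a dict.

    Hypothetical helper for illustration only.
    """
    if not headers:
        raise ValueError("at least one header is required")
    parts = []
    for key, value in headers.items():
        for text in (key, value):
            # Quotes and backslashes would need escaping inside a SQL string
            # literal; this sketch rejects them instead of escaping them.
            if "'" in text or "\\" in text:
                raise ValueError(f"unsupported character in {text!r}")
        parts.append(f"'{key}', '{value}'")
    return f"map({', '.join(parts)}) as {column_name}"


expr = headers_expr(
    {"custom-header-1": "foo", "custom-header-2": "bar"}, "headers"
)
print(expr)
# prints: map('custom-header-1', 'foo', 'custom-header-2', 'bar') as headers
```

   The resulting string can be passed as the last argument of `selectExpr`, in place of the literal expression used in the example.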
   
   
   ### How was this patch tested?
   Unit tests have been added, and the change has been tested thoroughly on a cluster.
   


