spark-dev mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject spark kafka batch integration
Date Sun, 14 Dec 2014 20:41:58 GMT
hello all,
we at tresata wrote a library that provides batch integration between
spark and kafka (distributed write of an rdd to kafka, distributed read of an
rdd from kafka). our main use cases are (in lambda architecture jargon):
* periodic appends from kafka to the immutable master dataset on hdfs, using
spark
* making non-streaming data available in kafka through periodic data drops
from hdfs, using spark. this makes it easier to merge the speed and batch
layers in spark-streaming
* distributed writes to kafka from spark-streaming
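the periodic-append use case comes down to reading a bounded slice of every kafka partition on each run. below is a minimal sketch of how such a batch read could be planned, assuming offsets are tracked per topic partition between runs. `OffsetRange`, `plan_batch_read` and `advance_committed` are hypothetical names for illustration only, not the api of the tresata spark-kafka library. in a spark job, each planned range would become one partition of the rdd and be read by one task.

```python
from typing import Dict, List, NamedTuple


class OffsetRange(NamedTuple):
    """Half-open slice [from_offset, until_offset) of one kafka topic partition (hypothetical type)."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int

    @property
    def size(self) -> int:
        # number of messages covered by this slice
        return self.until_offset - self.from_offset


def plan_batch_read(
    topic: str,
    committed: Dict[int, int],
    latest: Dict[int, int],
    earliest: Dict[int, int],
) -> List[OffsetRange]:
    """Plan one read slice per kafka partition for a periodic batch append (hypothetical helper).

    committed: next offset to read per partition, saved by the previous successful run
    latest:    current log-end offset per partition (one past the newest message)
    earliest:  oldest offset still retained per partition
    """
    ranges = []
    for partition in sorted(latest):
        end = latest[partition]
        # a partition never read before starts at its oldest retained offset;
        # offsets below `earliest` were already deleted by retention and cannot be read
        start = max(committed.get(partition, earliest[partition]), earliest[partition])
        if start < end:  # skip partitions with no new messages
            ranges.append(OffsetRange(topic, partition, start, end))
    return ranges


def advance_committed(committed: Dict[int, int], ranges: List[OffsetRange]) -> Dict[int, int]:
    """Offsets to save, only after the batch has been written to hdfs successfully (hypothetical helper)."""
    updated = dict(committed)
    for r in ranges:
        updated[r.partition] = r.until_offset
    return updated


if __name__ == "__main__":
    committed = {0: 100, 1: 250}
    latest = {0: 180, 1: 250, 2: 40}
    earliest = {0: 0, 1: 0, 2: 0}
    plan = plan_batch_read("events", committed, latest, earliest)
    for r in plan:
        print(r, r.size)
    print(advance_committed(committed, plan))
```

saving the new offsets only after the append to hdfs has succeeded means a failed run simply re-reads the same slices on the next attempt, which fits an append-only master dataset as long as the write step itself is retried as a whole.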

see here:
https://github.com/tresata/spark-kafka

best,
koert
