incubator-s4-user mailing list archives

From Matthieu Morel <mmo...@apache.org>
Subject Re: Injecting stream from mysql db
Date Mon, 23 Apr 2012 10:56:32 GMT
On 4/22/12 4:40 PM, Samir Madhavan wrote:
> Hi,
>
> If there is a MySQL db where the log data of a website is getting
> recorded, then how does one inject the data into an S4 module from the
> MySQL db in real time?
>
If you have a lot of data arriving at high frequency, you probably don't
want to pull it from the database. So a "tee" architecture (as opposed to
a serial pipeline) would be suitable: website logs should go both to the
database and to the stream processing system.

This way you are able to process logs in real time in S4 (for computing
clickthrough rates etc.) and you also keep all the logs for deeper mining.
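
To make the "tee" idea a bit more concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions, not S4 or
Kafka code: an in-memory SQLite table stands in for the MySQL database, a
queue.Queue stands in for the input of the stream processing system, and
the names tee, make_db_sink and make_stream_sink are hypothetical.

```python
import queue
import sqlite3


def make_db_sink(conn):
    """Return a sink that stores each log line in a table.

    Assumption: SQLite stands in for the MySQL database here.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS logs (line TEXT)")

    def sink(line):
        conn.execute("INSERT INTO logs (line) VALUES (?)", (line,))
        conn.commit()

    return sink


def make_stream_sink(q):
    """Return a sink that enqueues each log line.

    Assumption: the queue stands in for the stream processor's input.
    """
    def sink(line):
        q.put(line)

    return sink


def tee(lines, *sinks):
    """Send every log line to every sink; return the number of lines seen."""
    count = 0
    for line in lines:
        for sink in sinks:
            sink(line)
        count += 1
    return count


# Hypothetical sample log lines, for illustration only.
conn = sqlite3.connect(":memory:")
stream = queue.Queue()
sample = ["GET /index.html 200", "GET /ad/click?id=7 302"]
forwarded = tee(sample, make_db_sink(conn), make_stream_sink(stream))
```

The point of the design is that the stream side never queries the
database: both sinks receive the same line at the moment it is produced.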

One project you might want to have a look at for the "tee"
implementation is Apache Kafka. It may help you or give you some ideas.


Hope this helps,

Matthieu
