flink-user mailing list archives

From Lohith Samaga M <Lohith.Sam...@mphasis.com>
Subject RE: Where to put live model and business logic in Hadoop/Flink BigData system
Date Fri, 06 May 2016 11:22:12 GMT
Hi Palle,
	I am a beginner in Flink.

	However, I can say something about your other questions:
	1. Spark is generally a better fit than MapReduce for creating aggregate views, and it is usually much faster.
You could use either batch or streaming mode in Spark, depending on your needs.
	2. If your aggregate data is in tabular format, you could store it in Hive.
	3. For your live model, you could use either Spark Streaming (micro-batches) or Storm
(which processes individual tuples). It is easy to put business logic in Storm bolts that work on each
tuple.
	4. Please take care with latency (and other issues) when the live model accesses the aggregate data.
Your model should tolerate the latency of those lookups and must not build up a backlog of streaming data,
which may lead to Storm failing the tuple.
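	A minimal Java sketch of point 4, assuming Java 9+ (for CompletableFuture.completeOnTimeout): the read
against the aggregate store is bounded by a timeout and falls back to an empty result, so a single slow
read does not hold up processing of the stream. BoundedLookup and the Function standing in for the
Hive/HBase client are hypothetical names used only for illustration, not part of any library.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical wrapper: bounds how long the live model waits on an aggregate-view lookup.
class BoundedLookup {
    private final Function<String, Long> store; // stand-in for a Hive/HBase read (assumption)
    private final long timeoutMillis;

    BoundedLookup(Function<String, Long> store, long timeoutMillis) {
        this.store = store;
        this.timeoutMillis = timeoutMillis;
    }

    // Returns the aggregate for `key`, or Optional.empty() if the store did not answer in time.
    // Note: a timed-out read keeps running in the background; only the caller stops waiting.
    Optional<Long> lookup(String key) {
        return CompletableFuture
                .supplyAsync(() -> Optional.ofNullable(store.apply(key)))
                .completeOnTimeout(Optional.empty(), timeoutMillis, TimeUnit.MILLISECONDS)
                .join();
    }
}
```

	Falling back to an empty result is only one option; depending on the business logic, the caller could
instead skip the analysis for that record or queue it for a later retry.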

	Hope this helps.

Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga
-----Original Message-----
From: palle@sport.dk [mailto:palle@sport.dk]
Sent: Friday, May 06, 2016 16.23
To: user@flink.apache.org
Subject: Where to put live model and business logic in Hadoop/Flink BigData system

Hi there.

We are putting together some BigData components for handling a large amount of incoming data
from different log files and performing some analysis on the data.

All data fed into the system will go into HDFS. We plan on using Logstash, Kafka and
Flink to bring data from the log files into HDFS. We will designate all data located in HDFS
as our historic data, and we will use MapReduce-style batch jobs (probably Flink, but possibly
Hadoop) to create some aggregate views of the historic data. We will probably store these views
in HBase or MongoDB.

The live model in the system will use these views of the historic data (also called batch views
in the Lambda Architecture, if any of you are familiar with it). The live model is also fed the
same data (through Kafka), and when it detects a certain value in the incoming data, it will
perform some analysis using the views of the historic data in HBase/MongoDB.
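A minimal plain-Java sketch of the live model described above, not tied to Flink, Storm or any
other framework. It assumes the "certain value" is a numeric threshold; LiveModel, Event, the
threshold and the output string are hypothetical. A Map stands in for the HBase/MongoDB batch
view, and liveCounts stands in for the in-memory Java collections mentioned below.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical live model: keeps in-memory state and consults a batch view on trigger values.
class LiveModel {
    // Hypothetical incoming log record.
    static final class Event {
        final String key;
        final long value;

        Event(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    private final long triggerThreshold;                           // assumed trigger condition
    private final Map<String, Long> batchView;                     // stand-in for HBase/MongoDB view
    private final Map<String, Long> liveCounts = new HashMap<>();  // in-memory live state

    LiveModel(long triggerThreshold, Map<String, Long> batchView) {
        this.triggerThreshold = triggerThreshold;
        this.batchView = batchView;
    }

    // Updates live state for every event; only trigger values cause a batch-view lookup.
    Optional<String> process(Event e) {
        liveCounts.merge(e.key, 1L, Long::sum);
        if (e.value < triggerThreshold) {
            return Optional.empty();
        }
        long historic = batchView.getOrDefault(e.key, 0L);
        long live = liveCounts.get(e.key);
        // Placeholder "analysis": report live vs. historic figures for the key.
        return Optional.of(e.key + ": live=" + live + ", historic=" + historic);
    }
}
```

Keeping the lookup behind the trigger check means the external store is only queried for the
records that need analysis, not for every incoming event.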

Could anyone share some knowledge about where such a live model could be implemented, given
the components we plan to use? Apart from the business logic that will perform the analysis,
our live model will at all times also hold a Java object structure of maybe 5-10 Java
collections (maps, lists) containing approximately 5 million objects.

So, where is it possible to implement our live model? Can we do this in Flink? Can we do this
with another component within the Hadoop Big Data ecosystem?

