hadoop-user mailing list archives

From Gagan Brahmi <gaganbra...@gmail.com>
Subject Re: Kafka or Flume
Date Sat, 01 Jul 2017 16:16:31 GMT
I'd say the data flow should be kept simple, since you may need some basic
verification of the data. You may want to include NiFi in the mix, which
should do the job.

It can look something like this:

For ingestion

NiFi -> Kafka

For data verification

Kafka -> NiFi -> HDFS/Hive/HBase
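A minimal sketch of what the verification stage (Kafka -> verification -> HDFS/Hive/HBase) has to decide for each message, in Python with only the standard library. It makes several assumptions. Each Kafka message is taken to be a JSON transaction with hypothetical fields `txn_id`, `account_id` and `amount`. The historical data in Hadoop is replaced by a hypothetical in-memory set `KNOWN_ACCOUNTS`. No Kafka, NiFi or HDFS client is used. Records that pass go to the "store" list, and everything else goes to a reject list so nothing is silently dropped.

```python
import json
from typing import Iterable, List, Set, Tuple

# Hypothetical stand-in for historical data already in Hadoop,
# e.g. account IDs loaded from a Hive or HBase table.
KNOWN_ACCOUNTS: Set[str] = {"ACC-1", "ACC-2"}

# Hypothetical record schema assumed for this sketch.
REQUIRED_FIELDS = ("txn_id", "account_id", "amount")


def validate(record: dict, known_accounts: Set[str]) -> bool:
    """Hypothetical rules: required fields present, a positive numeric
    amount, and an account that already exists in the historical data."""
    if any(field not in record for field in REQUIRED_FIELDS):
        return False
    amount = record["amount"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return False
    if amount <= 0:
        return False
    account = record["account_id"]
    return isinstance(account, str) and account in known_accounts


def route(messages: Iterable[str],
          known_accounts: Set[str]) -> Tuple[List[dict], List[str]]:
    """Split raw message payloads into records to store and rejects."""
    accepted: List[dict] = []
    rejected: List[str] = []
    for raw in messages:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            rejected.append(raw)  # unparseable payloads are kept for review
            continue
        if isinstance(record, dict) and validate(record, known_accounts):
            accepted.append(record)  # would be written to HDFS/Hive/HBase
        else:
            rejected.append(raw)
    return accepted, rejected


if __name__ == "__main__":
    batch = [
        '{"txn_id": 1, "account_id": "ACC-1", "amount": 250.0}',
        '{"txn_id": 2, "account_id": "ACC-9", "amount": 10}',  # unknown account
        "not json",
    ]
    ok, bad = route(batch, KNOWN_ACCOUNTS)
    print(len(ok), len(bad))  # 1 2
```

In a real flow the rejected payloads would go somewhere durable (for example a separate topic or directory) rather than a Python list.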


Regards,
Gagan Brahmi

On Sat, Jul 1, 2017 at 7:26 AM, Sidharth Kumar <sidharthkumar2707@gmail.com>
wrote:

> Thanks for your suggestions. I feel Kafka will be better, but it needs
> something extra, either Kafka with Flume or Kafka with Spark Streaming.
> Could you suggest which will be better, and in which situations each
> combination performs best?
>
> Thanks in advance for your help.
>
> Warm Regards
>
> Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367
> | LinkedIn: www.linkedin.com/in/sidharthkumar2792
>
> On 30-Jun-2017 11:18 AM, "daemeon reiydelle" <daemeonr@gmail.com> wrote:
>
>> For fairly simple transformations, Flume is great, and works fine
>> subscribing to some pretty high volumes of messages from Kafka (I think
>> we hit 50M/second at one point). If you need to do complex
>> transformations, e.g. database lookups for the Kafka to Hadoop ETL, then
>> you will start having complexity issues which will exceed the capability
>> of Flume.
>> There are git repos that have everything you need, including the Kafka
>> adapter, HDFS writer, etc. A lot of this is built into Flume.
>> I assume this might be a bit off topic, so googling Flume & Kafka should
>> help you.
>>
>> On Thu, Jun 29, 2017 at 10:14 PM, Mallanagouda Patil <
>> mallanagouda.c.patil@gmail.com> wrote:
>>
>>> Kafka is capable of processing billions of events per second. You can
>>> scale it horizontally by adding Kafka broker servers.
>>>
>>> You can try out these steps:
>>>
>>> 1. Create a topic in Kafka to receive all your data. Use a Kafka
>>> producer to ingest the data into it.
>>> 2. If you are going to write your own HDFS client to put data into HDFS,
>>> you can read data from the topic in step 1, validate it, and store it in HDFS.
>>> 3. If you want to use an open-source tool (Gobblin or the Confluent Kafka
>>> HDFS connector) to put data into HDFS, write a tool that reads data from
>>> the topic, validates it, and stores it in another topic for the connector
>>> to load into HDFS.
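Step 3 above (read from the topic, validate, store in another topic) can be sketched as a small relay. This is a toy model under stated assumptions. `InMemoryBroker` is a hypothetical stand-in for Kafka topics, not the Kafka client API. The topic names `transactions.raw`, `transactions.validated` and `transactions.rejected` are made up. Keeping rejected messages in their own topic instead of discarding them is an assumption added here; the connector would then load only the validated topic into HDFS.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class InMemoryBroker:
    """Toy stand-in for Kafka topics (hypothetical; not the Kafka client API)."""

    def __init__(self) -> None:
        self.topics: Dict[str, Deque[str]] = defaultdict(deque)

    def produce(self, topic: str, value: str) -> None:
        self.topics[topic].append(value)

    def poll(self, topic: str) -> Optional[str]:
        """Return the next message in the topic, or None if it is drained."""
        queue = self.topics[topic]
        return queue.popleft() if queue else None


def relay(broker: InMemoryBroker, source: str, valid_topic: str,
          reject_topic: str, is_valid: Callable[[str], bool]) -> int:
    """Drain `source`, forwarding each message to `valid_topic` or
    `reject_topic`. Returns how many went to `valid_topic`."""
    forwarded = 0
    while (msg := broker.poll(source)) is not None:
        if is_valid(msg):
            broker.produce(valid_topic, msg)  # picked up by the HDFS connector
            forwarded += 1
        else:
            broker.produce(reject_topic, msg)  # kept, not dropped
    return forwarded


if __name__ == "__main__":
    broker = InMemoryBroker()
    for value in ["ok:1", "bad", "ok:2"]:
        broker.produce("transactions.raw", value)
    n = relay(broker, "transactions.raw", "transactions.validated",
              "transactions.rejected", lambda m: m.startswith("ok:"))
    print(n, list(broker.topics["transactions.validated"]),
          list(broker.topics["transactions.rejected"]))
    # 2 ['ok:1', 'ok:2'] ['bad']
```

Splitting validation into its own topic-to-topic step keeps the loading side generic: the connector only ever sees records that already passed.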
>>>
>>> We are using a combination of these steps to process over 10 million
>>> events/second.
>>>
>>> I hope it helps.
>>>
>>> Thanks
>>> Mallan
>>>
>>> On Jun 30, 2017 10:31 AM, "Sidharth Kumar" <sidharthkumar2707@gmail.com>
>>> wrote:
>>>
>>>> Thanks! What about Kafka with Flume? I should also mention that the
>>>> daily data intake is in the millions and we can't afford to lose even a
>>>> single piece of data, which makes high availability a requirement.
>>>>
>>>> Warm Regards
>>>>
>>>> Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367
>>>> | LinkedIn: www.linkedin.com/in/sidharthkumar2792
>>>>
>>>> On 30-Jun-2017 10:04 AM, "JP gupta" <JP.Gupta@altruistindia.com> wrote:
>>>>
>>>>> The ideal sequence should be:
>>>>>
>>>>> Ingress using Kafka -> validation and processing using Spark
>>>>> -> write into any NoSQL DB or Hive.
>>>>>
>>>>> From my recent experience, writing directly to HDFS can be slow,
>>>>> depending on the data format.
>>>>>
>>>>> Thanks
>>>>>
>>>>> JP
>>>>>
>>>>> *From:* Sudeep Singh Thakur [mailto:sudeepthakur90@gmail.com]
>>>>> *Sent:* 30 June 2017 09:26
>>>>> *To:* Sidharth Kumar
>>>>> *Cc:* Maggy; common-user@hadoop.apache.org
>>>>> *Subject:* Re: Kafka or Flume
>>>>>
>>>>> In your use case, Kafka would be better because you want some
>>>>> transformations and validations.
>>>>>
>>>>> Kind regards,
>>>>> Sudeep Singh Thakur
>>>>>
>>>>> On Jun 30, 2017 8:57 AM, "Sidharth Kumar" <sidharthkumar2707@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have a requirement where all transactional data is ingested into
>>>>> Hadoop in real time, and before the data is stored in Hadoop it must be
>>>>> processed to validate it. If the data fails the validation process, it
>>>>> will not be stored in Hadoop. The validation process also makes use of
>>>>> historical data that is stored in Hadoop. So, my question is: which
>>>>> ingestion tool will be best for this, Kafka or Flume?
>>>>>
>>>>> Any suggestions will be a great help for me.
>>>>>
>>>>>
>>>>> Warm Regards
>>>>>
>>>>> Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367
>>>>> | LinkedIn: www.linkedin.com/in/sidharthkumar2792
>>>>>
>>>>
>>
